[ https://issues.apache.org/jira/browse/BEAM-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-3681:
-------------------------------
    Description: 
When executing a simple write to S3 with the direct runner, the pipeline 
sometimes fails when it ends up trying to write 'empty' shards to S3.
{code:java}
Pipeline pipeline = Pipeline.create(options);
pipeline
 .apply("CreateSomeData", Create.of("1", "2", "3"))
 .apply("WriteToFS", TextIO.write().to(options.getOutput()));
pipeline.run();{code}
The related exception is:
{code:java}
Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The XML you provided was not well-formed or did not validate against our published schema (Service: Amazon S3; Status Code: 400; Error Code: MalformedXML; Request ID: 402E99C2F602AD09; S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE=), S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE=
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:342)
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:312)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:206)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:62)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
    at org.apache.beam.samples.ingest.amazon.IngestToS3.main(IngestToS3.java:82)
Caused by: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The XML you provided was not well-formed or did not validate against our published schema (Service: Amazon S3; Status Code: 400; Error Code: MalformedXML; Request ID: 402E99C2F602AD09; S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE=), S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE=
    at org.apache.beam.sdk.io.aws.s3.S3FileSystem.copy(S3FileSystem.java:563)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystem.lambda$copy$4(S3FileSystem.java:495)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystem.lambda$callTasks$8(S3FileSystem.java:642)
    at org.apache.beam.sdk.util.MoreFutures.lambda$supplyAsync$0(MoreFutures.java:100)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The XML you provided was not well-formed or did not validate against our published schema (Service: Amazon S3; Status Code: 400; Error Code: MalformedXML; Request ID: 402E99C2F602AD09; S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE=), S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE=
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
    at com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3065)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystem.copy(S3FileSystem.java:561)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystem.lambda$copy$4(S3FileSystem.java:495)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystem.lambda$callTasks$8(S3FileSystem.java:642)
    at org.apache.beam.sdk.util.MoreFutures.lambda$supplyAsync$0(MoreFutures.java:100)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748){code}
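
The trace ends in completeMultipartUpload, called from S3FileSystem.copy. One possible explanation, offered as an assumption and not as a confirmed diagnosis: if the copy derives its parts from the object size, a zero-byte shard yields no parts at all, and S3 is known to reject a request that completes a multipart upload with an empty part list. The sketch below is not Beam code. The class MultipartCopySketch and its methods partRanges and needsSingleRequestCopy are hypothetical names used only to illustrate how an empty shard could lead to an empty part list, and what a guard might look like.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not the S3FileSystem implementation. */
public class MultipartCopySketch {

  /** Splits an object of objectSize bytes into inclusive [start, end] byte ranges, one per part. */
  static List<long[]> partRanges(long objectSize, long partSize) {
    if (partSize <= 0) {
      throw new IllegalArgumentException("partSize must be positive");
    }
    List<long[]> ranges = new ArrayList<>();
    // For objectSize == 0 the loop body never runs, so the list stays empty.
    for (long start = 0; start < objectSize; start += partSize) {
      long end = Math.min(start + partSize, objectSize) - 1;
      ranges.add(new long[] {start, end});
    }
    return ranges;
  }

  /** Assumed guard: with no parts to upload, a plain single-request copy would be needed instead. */
  static boolean needsSingleRequestCopy(long objectSize, long partSize) {
    return partRanges(objectSize, partSize).isEmpty();
  }

  public static void main(String[] args) {
    long partSize = 5L * 1024 * 1024; // 5 MiB, an assumed part size
    System.out.println("parts for a 12 MiB object: " + partRanges(12L * 1024 * 1024, partSize).size());
    System.out.println("parts for an empty shard: " + partRanges(0, partSize).size());
    System.out.println("empty shard needs single-request copy: " + needsSingleRequestCopy(0, partSize));
  }
}
```

Under this assumption the failure would depend on whether the runner happens to produce an empty shard, which would match the intermittent behaviour described above.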
 

 

 

  was:
When executing a simple write on S3 with the direct runner. It seems to break 
when trying to write an 'empty' shard to S3.
{code:java}
Pipeline pipeline = Pipeline.create(options);
 pipeline
 .apply("CreateSomeData", Create.of("1", "2", "3"))
.apply("WriteToFS", TextIO.write().to(options.getOutput()));
 pipeline.run();{code}

> Amazon S3 write breaks randomly
> -------------------------------
>
>                 Key: BEAM-3681
>                 URL: https://issues.apache.org/jira/browse/BEAM-3681
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>    Affects Versions: 2.3.0, 2.4.0
>            Reporter: Ismaël Mejía
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
