[jira] [Commented] (BEAM-10419) Flaky FhirIOWriteIT integraion test in java postcommit

Jacob Ferriero (Jira) Tue, 07 Jul 2020 18:08:50 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153159#comment-17153159
 ]


Jacob Ferriero commented on BEAM-10419:
---------------------------------------

Seems that the logic that moves files to "claim" them for a batch import in 
FhirIO.Import sometimes is trying to copy a file that has already been deleted.

For sake of explanation (and refreshing my own memory) the FhirIO.Import works 
as follows:
# Buffer elements in a bundle to a new line delimited JSON file in the tempDir
# Group these files into batches (of 10,000 files) this is to avoid many load 
jobs if batches were small
# Copy each batch of files to a unique sub temp dir lets call it a batchDir 
(this is used to specify a prefix + wildcard for each load job) in 
[ImportFn|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L1033-L1040]
# Start load job and block til completion (deletes the files in the batchDir 
[here|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L1051]
 and the corresponding files in the orignal dir 
[here|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L1065])
# Delete all the (original) files in the tempDir once the window is closed 
(based on this [usage of 
Wait.on|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L840-L870])

Background on why this logic was so complicated in this review 
[thread|https://github.com/apache/beam/pull/11339#discussion_r423357007]

The flakiness seems to occur when attempting the copy in step 3 on a file that 
does not exists. My hunch is that the deleting in step 5 may not be properly 
waiting on the ImportFn to be complete.

Could this happen if the ImportFn task is rescheduled for a particular batch?

FYI I do not have bandwidth to dig deeper into this / fix this right now and 
[~lastomato] has taken over maintenance of these IO connectors as I am not 
currently engaged w/ Healthcare customers.

{noformat}
Caused by: java.io.IOException: Error executing batch GCS request
        at 
org.apache.beam.sdk.extensions.gcp.util.GcsUtil.executeBatches(GcsUtil.java:617)
        at 
org.apache.beam.sdk.extensions.gcp.util.GcsUtil.copy(GcsUtil.java:718)
        at 
org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.copy(GcsFileSystem.java:168)
        at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:289)
        at 
org.apache.beam.sdk.io.gcp.healthcare.FhirIO$Import$ImportFn.importBatch(FhirIO.java:1040)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Error 
trying to rewrite 
gs://temp-storage-for-healthcare-io-tests/fhirImportBatch-60962352-ef3c-4d42-8079-c17ac430ebd6.ndjson
 to 
gs://temp-storage-for-healthcare-io-tests/tmp-22b3fd94-1455-42b0-b24a-55439eff5647/fhirImportBatch-60962352-ef3c-4d42-8079-c17ac430ebd6.ndjson:
 {"code":404,"errors":[{"domain":"global","message":"No such object: 
temp-storage-for-healthcare-io-tests/fhirImportBatch-60962352-ef3c-4d42-8079-c17ac430ebd6.ndjson","reason":"notFound"}],"message":"No
 such object: 
temp-storage-for-healthcare-io-tests/fhirImportBatch-60962352-ef3c-4d42-8079-c17ac430ebd6.ndjson"}
{noformat}


> Flaky FhirIOWriteIT integraion test in java postcommit
> ------------------------------------------------------
>
>                 Key: BEAM-10419
>                 URL: https://issues.apache.org/jira/browse/BEAM-10419
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: Yichi Zhang
>            Assignee: Jacob Ferriero
>            Priority: P1
>              Labels: flake
>
> See history 
> https://ci-beam.apache.org/job/beam_PostCommit_Java/6320/testReport/junit/org.apache.beam.sdk.io.gcp.healthcare/FhirIOWriteIT/history/?start=25



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (BEAM-10419) Flaky FhirIOWriteIT integraion test in java postcommit

Reply via email to