[
https://issues.apache.org/jira/browse/BEAM-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153159#comment-17153159
]
Jacob Ferriero commented on BEAM-10419:
---------------------------------------
Seems that the logic that moves files to "claim" them for a batch import in
FhirIO.Import sometimes is trying to copy a file that has already been deleted.
For sake of explanation (and refreshing my own memory) the FhirIO.Import works
as follows:
# Buffer elements in a bundle to a new line delimited JSON file in the tempDir
# Group these files into batches (of 10,000 files) this is to avoid many load
jobs if batches were small
# Copy each batch of files to a unique sub temp dir lets call it a batchDir
(this is used to specify a prefix + wildcard for each load job) in
[ImportFn|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L1033-L1040]
# Start load job and block til completion (deletes the files in the batchDir
[here|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L1051]
and the corresponding files in the orignal dir
[here|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L1065])
# Delete all the (original) files in the tempDir once the window is closed
(based on this [usage of
Wait.on|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L840-L870])
Background on why this logic was so complicated in this review
[thread|https://github.com/apache/beam/pull/11339#discussion_r423357007]
The flakiness seems to occur when attempting the copy in step 3 on a file that
does not exists. My hunch is that the deleting in step 5 may not be properly
waiting on the ImportFn to be complete.
Could this happen if the ImportFn task is rescheduled for a particular batch?
FYI I do not have bandwidth to dig deeper into this / fix this right now and
[~lastomato] has taken over maintenance of these IO connectors as I am not
currently engaged w/ Healthcare customers.
{noformat}
Caused by: java.io.IOException: Error executing batch GCS request
at
org.apache.beam.sdk.extensions.gcp.util.GcsUtil.executeBatches(GcsUtil.java:617)
at
org.apache.beam.sdk.extensions.gcp.util.GcsUtil.copy(GcsUtil.java:718)
at
org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.copy(GcsFileSystem.java:168)
at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:289)
at
org.apache.beam.sdk.io.gcp.healthcare.FhirIO$Import$ImportFn.importBatch(FhirIO.java:1040)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Error
trying to rewrite
gs://temp-storage-for-healthcare-io-tests/fhirImportBatch-60962352-ef3c-4d42-8079-c17ac430ebd6.ndjson
to
gs://temp-storage-for-healthcare-io-tests/tmp-22b3fd94-1455-42b0-b24a-55439eff5647/fhirImportBatch-60962352-ef3c-4d42-8079-c17ac430ebd6.ndjson:
{"code":404,"errors":[{"domain":"global","message":"No such object:
temp-storage-for-healthcare-io-tests/fhirImportBatch-60962352-ef3c-4d42-8079-c17ac430ebd6.ndjson","reason":"notFound"}],"message":"No
such object:
temp-storage-for-healthcare-io-tests/fhirImportBatch-60962352-ef3c-4d42-8079-c17ac430ebd6.ndjson"}
{noformat}
> Flaky FhirIOWriteIT integraion test in java postcommit
> ------------------------------------------------------
>
> Key: BEAM-10419
> URL: https://issues.apache.org/jira/browse/BEAM-10419
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Reporter: Yichi Zhang
> Assignee: Jacob Ferriero
> Priority: P1
> Labels: flake
>
> See history
> https://ci-beam.apache.org/job/beam_PostCommit_Java/6320/testReport/junit/org.apache.beam.sdk.io.gcp.healthcare/FhirIOWriteIT/history/?start=25
--
This message was sent by Atlassian Jira
(v8.3.4#803005)