[
https://issues.apache.org/jira/browse/BEAM-12101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17549247#comment-17549247
]
Danny McCormick commented on BEAM-12101:
----------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/20854
> Dataflow Jobs keep failing with FileNotFoundError: [Errno 2] Not found:
> gs://tmp.../beamapp..../tmp-27400e24c0c31bc1-00000-of-00001.avro
> ----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-12101
> URL: https://issues.apache.org/jira/browse/BEAM-12101
> Project: Beam
> Issue Type: Bug
> Components: io-py-avro
> Affects Versions: 2.28.0
> Environment: google cloud platform.
> Kicking off the job locally from WSL ubuntu 20.0
> python version 3.8.5
> Reporter: Patrick Linnane
> Priority: P3
> Fix For: Not applicable
>
> Attachments: downloaded-logs-20210406-112843.json
>
>
> I am processing up to 1000 files (.......xml.gz).
> When I run a sample of 128, 256, or 512 files it works, but not always.
> I have used between 8 and 512 workers. It seems that any time the job runs for
> longer than 30 minutes, it fails with a FileNotFoundError related to
> fastavro.
> {code:python}
> lines = (
>     p1
>     | "Get name" >> beam.Create(
>         names[(no_of_files * (i - 1)) // no_of_jobs:(no_of_files * i) // no_of_jobs])
>     | "Read from cloud" >> beam.ParDo(ReadGCS())
>     | "Parse into JSON" >> beam.ParDo(ParseXML())
>     | "Get Medline" >> beam.ParDo(GetMedline())
>     | "Build Json" >> beam.ParDo(JsonBuilder())
>     | "Write elements" >> beam.io.WriteToBigQuery(
>         table=table_ref,
>         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
>         schema="SCHEMA_AUTODETECT",
>         insert_retry_strategy=RetryStrategy.RETRY_ALWAYS,
>         ignore_insert_ids=True,
>         validate=False)
> )
> {code}
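
The slice in the "Get name" step appears to split the file list into contiguous per-job batches. A minimal sketch of that arithmetic, assuming `names` is a plain list, `i` is a 1-based job index, and `no_of_files` equals `len(names)`. The helper name `batch_for_job` and the sample file names are hypothetical and do not come from the report:

```python
def batch_for_job(names, i, no_of_jobs):
    """Return the slice of `names` assigned to the 1-based job index `i`.

    Hypothetical helper; it mirrors the slice expression in the report.
    """
    no_of_files = len(names)  # assumption: no_of_files == len(names)
    start = (no_of_files * (i - 1)) // no_of_jobs
    stop = (no_of_files * i) // no_of_jobs
    return names[start:stop]


# Illustrative input only: 10 made-up file names split across 3 jobs.
names = [f"file{n}.xml.gz" for n in range(10)]
batches = [batch_for_job(names, i, 3) for i in range(1, 4)]
# Each file lands in exactly one batch, and batch sizes differ by at most one.
```

With integer division on both ends of the slice, the end of one batch is the start of the next, so no file is skipped or duplicated across jobs.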