
Beam JIRA Bot (Jira) Sat, 05 Jun 2021 10:20:09 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-12101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357923#comment-17357923
 ]


Beam JIRA Bot commented on BEAM-12101:
--------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it 
has been labeled "stale-P2". If this issue is still affecting you, we care! 
Please comment and remove the label. Otherwise, in 14 days the issue will be 
moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed 
explanation of what these priorities mean.


> Dataflow Jobs keep failing with FileNotFoundError: [Errno 2] Not found: 
> gs://tmp.../beamapp..../tmp-27400e24c0c31bc1-00000-of-00001.avro
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-12101
>                 URL: https://issues.apache.org/jira/browse/BEAM-12101
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-avro
>    Affects Versions: 2.28.0
>         Environment: google cloud platform. 
> Kicking off the job locally from WSL ubuntu 20.0 
> python version 3.8.5 
>            Reporter: Patrick Linnane
>            Priority: P2
>              Labels: stale-P2
>             Fix For: Not applicable
>
>         Attachments: downloaded-logs-20210406-112843.json
>
>
> I am processing up to 1000 .......xml.gz files.
> When I run a sample of 128, 256, or 512 files, it works, but not always.
> I have used between 8 and 512 workers. It seems that any time the job runs 
> for longer than 30 minutes, it fails with a FileNotFoundError related to 
> fastavro. 
> {code:python}
>         lines = (
>                 p1
>                 | "Get name" >> beam.Create(
>                     names[(no_of_files * (i - 1)) // no_of_jobs:
>                           (no_of_files * i) // no_of_jobs])
>                 | "Read from cloud" >> beam.ParDo(ReadGCS())
>                 | "Parse into JSON" >> beam.ParDo(ParseXML())
>                 | "Get Medline" >> beam.ParDo(GetMedline())
>                 | "Build Json" >> beam.ParDo(JsonBuilder())
>                 | "Write elements" >> beam.io.WriteToBigQuery(
>                     table=table_ref,
>                     create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>                     write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
>                     schema="SCHEMA_AUTODETECT",
>                     insert_retry_strategy=RetryStrategy.RETRY_ALWAYS,
>                     ignore_insert_ids=True,
>                     validate=False)
>         )
> {code}
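
The slice passed to beam.Create in the snippet above splits the file list across several jobs. As a minimal sketch of that arithmetic only, assuming (the report does not say so) that i runs from 1 to no_of_jobs and that no_of_files equals len(names), the helper job_slice and the sample file names below are hypothetical and are not part of the reported pipeline:

```python
def job_slice(names, i, no_of_jobs):
    """Return the i-th batch (1-based) of names, using the slice bounds from the report.

    Assumption: no_of_files is len(names); the original report does not show
    how no_of_files is computed.
    """
    no_of_files = len(names)
    start = (no_of_files * (i - 1)) // no_of_jobs
    stop = (no_of_files * i) // no_of_jobs
    return names[start:stop]


# Hypothetical input: ten placeholder file names split across three jobs.
names = [f"file_{n:04d}.xml.gz" for n in range(10)]
no_of_jobs = 3
batches = [job_slice(names, i, no_of_jobs) for i in range(1, no_of_jobs + 1)]

for i, batch in enumerate(batches, start=1):
    print(i, len(batch), batch[0], batch[-1])
```

Under these assumptions the batches are contiguous, do not overlap, and together cover every file, so each job sees a disjoint share of the input even when the file count is not divisible by the job count.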



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (BEAM-12101) Dataflow Jobs keep failing with FileNotFoundError: [Errno 2] Not found: gs://tmp.../beamapp..../tmp-27400e24c0c31bc1-00000-of-00001.avro
