damccorm opened a new issue, #20854:
URL: https://github.com/apache/beam/issues/20854

   I am processing up to 1000 files (.......xml.gz).
   When I run a sample of 128, 256, or 512 files it works, but not always.
   I have used between 8 and 512 workers. It seems that any time the job runs for
longer than 30 minutes, it fails with a FileNotFoundError related to
fastavro.
   ```python
   lines = (
       p1
       | "Get name" >> beam.Create(
           names[(no_of_files * (i - 1)) // no_of_jobs : (no_of_files * i) // no_of_jobs])
       | "Read from cloud" >> beam.ParDo(ReadGCS())
       | "Parse into JSON" >> beam.ParDo(ParseXML())
       | "Get Medline" >> beam.ParDo(GetMedline())
       | "Build Json" >> beam.ParDo(JsonBuilder())
       | "Write elements" >> beam.io.WriteToBigQuery(
           table=table_ref,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           schema="SCHEMA_AUTODETECT",
           insert_retry_strategy=RetryStrategy.RETRY_ALWAYS,
           ignore_insert_ids=True,
           validate=False)
   )
   ```
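   For context, the implementation of the `ReadGCS` DoFn used above is not included in the report. A minimal sketch, assuming it reads one `.xml.gz` object from GCS per input element using Beam's `FileSystems` API, might look like this (the class body is an assumption for illustration, not the reporter's original code):
   ```python
   import apache_beam as beam
   from apache_beam.io.filesystems import FileSystems


   class ReadGCS(beam.DoFn):
       """Hypothetical sketch: read one gzipped XML object from GCS per element."""

       def process(self, file_name):
           # FileSystems.open defaults to CompressionTypes.AUTO, so a .gz
           # extension is decompressed transparently while reading.
           with FileSystems.open(file_name) as f:
               yield f.read().decode("utf-8")
   ```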
   
   
   
   Imported from Jira 
[BEAM-12101](https://issues.apache.org/jira/browse/BEAM-12101). Original Jira 
may contain additional context.
   Reported by: xct.

