[GitHub] [beam] damccorm opened a new issue, #20585: FileIO azfs Stream mark expired

GitBox Sat, 04 Jun 2022 11:15:06 -0700


damccorm opened a new issue, #20585:
URL: https://github.com/apache/beam/issues/20585


   I am attempting to parse a very large CSV file (65 million lines) from an
Azure Blob with Beam (version 2.25), and I have created a pipeline for this. I
am running the pipeline on Dataflow and testing with a smaller version of the
file (10,000 lines).
   
   I am using FileIO with the "azfs" filesystem prefix to read from Azure blobs.
   
   The pipeline works with the small test file, but when I run it on the full
file I get a "Stream Mark Expired" exception (pasted below). Reading the same
file from a GCP bucket works fine, including when running on Dataflow.
   
   The CSV file I am attempting to ingest is 54.2 GB and can be obtained here: 
https://obis.org/manual/access/
   
   Imported from Jira
[BEAM-11313](https://issues.apache.org/jira/browse/BEAM-11313). The original
Jira issue may contain additional context.
   Reported by: [email protected].


