Thomas Li Fredriksen created BEAM-11313:
-------------------------------------------

             Summary: FileIO azfs Stream mark expired
                 Key: BEAM-11313
                 URL: https://issues.apache.org/jira/browse/BEAM-11313
             Project: Beam
          Issue Type: Bug
          Components: io-java-azure, runner-dataflow
    Affects Versions: 2.25.0
         Environment: Beam v2.25
Google Dataflow runner v2.25
            Reporter: Thomas Li Fredriksen


I am attempting to parse a very large CSV file (65 million lines) with Beam 
(version 2.25) from an Azure Blob and have created a pipeline for this. I am 
running the pipeline on Dataflow and testing with a smaller version of the 
file (10,000 lines).

I am using FileIO and the filesystem prefix "azfs" to read from Azure blobs.

The pipeline works with the small test file, but when I run it on the bigger 
file I get the exception "Stream Mark Expired" (pasted below). Reading the 
same file from a GCP bucket works just fine, including when running on 
Dataflow.
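
For readers unfamiliar with mark/reset on Java streams, here is a minimal 
sketch of the general java.io contract that the exception message appears to 
refer to. It is an assumption that the failure above is of this kind; the 
report does not confirm the cause. The sketch uses only 
java.io.BufferedInputStream, not Beam or the Azure SDK, and the class name 
MarkResetDemo, the helper resetSucceeds, and the buffer size of 8 are made up 
for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkResetDemo {
    // Hypothetical value: a tiny internal buffer so the effect shows up
    // after reading only a few bytes.
    static final int BUFFER_SIZE = 8;

    /**
     * Marks the start of a 64-byte in-memory stream with the given read
     * limit, reads bytesToRead bytes, then reports whether reset() worked.
     */
    static boolean resetSucceeds(int readLimit, int bytesToRead) throws IOException {
        byte[] data = new byte[64];
        try (InputStream in =
                new BufferedInputStream(new ByteArrayInputStream(data), BUFFER_SIZE)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            try {
                in.reset();
                return true;
            } catch (IOException e) {
                // The mark was dropped because too much was read past it.
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("limit 8, read 4:   " + resetSucceeds(8, 4));
        System.out.println("limit 8, read 32:  " + resetSucceeds(8, 32));
        System.out.println("limit 64, read 32: " + resetSucceeds(64, 32));
    }
}
```

On OpenJDK, the second call returns false because more bytes were read than 
the read limit allowed, while the other two return true. Whether the azfs 
stream behaves the same way for large files is something the stack trace 
would have to show.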

The CSV file I am attempting to ingest is 54.2 GB and can be obtained here: 
https://obis.org/manual/access/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
