[ https://issues.apache.org/jira/browse/BEAM-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257056#comment-17257056 ]

Beam JIRA Bot commented on BEAM-11313:
--------------------------------------

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> FileIO azfs Stream mark expired
> -------------------------------
>
>                 Key: BEAM-11313
>                 URL: https://issues.apache.org/jira/browse/BEAM-11313
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-azure, runner-dataflow
>    Affects Versions: 2.25.0
>         Environment: Beam v2.25
> Google Dataflow runner v2.25
>            Reporter: Thomas Li Fredriksen
>            Assignee: Pablo Estrada
>            Priority: P2
>              Labels: stale-assigned
>
> I am attempting to parse a very large CSV file (65 million lines) with Beam 
> (version 2.25) from an Azure Blob and have created a pipeline for this. I am 
> running the pipeline on Dataflow and testing with a smaller version of the 
> file (10,000 lines).
> I am using FileIO with the filesystem prefix "azfs" to read from Azure Blob Storage.
> The pipeline works with the small test file, but when I run it on the 
> larger file, I get a "Stream mark expired" exception (pasted below). 
> Reading the same file from a GCP bucket works fine, including when 
> running on Dataflow. 
> The CSV file I am attempting to ingest is 54.2 GB and can be obtained here: 
> https://obis.org/manual/access/
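
For reference, the read pattern described in the report can be sketched as below. This is a minimal outline, not the reporter's actual pipeline: the account name, container, and blob path are placeholders, and it assumes the beam-sdks-java-io-azure module is on the classpath so the "azfs" scheme is registered (credentials would be supplied via the Azure pipeline options).

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class AzfsCsvRead {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    // Match the blob via the "azfs" filesystem prefix, then read it line by
    // line. "account", "container", and the file name are hypothetical.
    p.apply(FileIO.match().filepattern("azfs://account/container/occurrences.csv"))
     .apply(FileIO.readMatches())
     .apply(TextIO.readFiles());

    p.run().waitUntilFinish();
  }
}
```

With this shape the "Stream mark expired" failure would surface inside TextIO.readFiles() while the Azure input stream is being consumed, which is consistent with it appearing only on the large file.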



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
