[ 
https://issues.apache.org/jira/browse/BEAM-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous updated BEAM-12701:
-----------------------------
    Status: Triage Needed  (was: Resolved)

> Converting two deferred dataframes to csv in the same pipeline causes a 
> PCollection label collision
> ----------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-12701
>                 URL: https://issues.apache.org/jira/browse/BEAM-12701
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-dataframe
>    Affects Versions: 2.31.0, 2.32.0, 2.33.0
>            Reporter: Jérémie Bigras-Dunberry
>            Assignee: Eduardo Sánchez López
>            Priority: P2
>             Fix For: 2.34.0
>
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
>  
> If you call to_csv on a DeferredDataFrame twice in a single pipeline, 
> like this:
> {code:python}
> import apache_beam as beam
> import pandas as pd
> from apache_beam.dataframe.convert import to_dataframe, to_pcollection
>
> df1 = pd.DataFrame.from_records({"a":"b"}, index=[0])
> df2 = pd.DataFrame.from_records({"a":"b"}, index=[0])
> with beam.Pipeline() as p:
>     df1 = to_dataframe(to_pcollection(df1, pipeline=p), label="df1")
>     df2 = to_dataframe(to_pcollection(df2, pipeline=p), label="df2")
>     df1.to_csv("test.csv")
>     df2.to_csv("test2.csv")  # the RuntimeError is raised here
> {code}
> You get this error on the second to_csv call:
>  
> {code:java}
> RuntimeError: A transform with label "ToPCollection(df)" already exists in 
> the pipeline. To apply a transform with a specified label write pvalue | 
> "label" >> transform
> {code}
> I think this happens because to_csv calls to_pcollection without a label, 
> so the same label is inferred for both to_csv calls. 
>  
>  
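
The root cause suggested in the description can be illustrated with a toy model that does not use Beam at all. Everything below is hypothetical: `ToyPipeline`, `to_csv_fixed_label` and `to_csv_unique_label` are illustrative names, and deriving the label from the output path is only one possible way to avoid the collision, not necessarily what the actual fix does.

```python
class ToyPipeline:
    """Hypothetical stand-in for a pipeline that rejects duplicate labels."""

    def __init__(self):
        self.labels = set()

    def apply(self, label):
        # Like the pipeline in the report, refuse a label that is already used.
        if label in self.labels:
            raise RuntimeError(
                f'A transform with label "{label}" already exists in the pipeline.'
            )
        self.labels.add(label)
        return label


def to_csv_fixed_label(pipeline, path):
    # Models the reported behaviour: every call infers the same label.
    return pipeline.apply("ToPCollection(df)")


def to_csv_unique_label(pipeline, path):
    # Assumed workaround: make the label unique by including the output path.
    return pipeline.apply(f"ToPCollection(df) - {path}")


if __name__ == "__main__":
    p = ToyPipeline()
    to_csv_fixed_label(p, "test.csv")
    try:
        to_csv_fixed_label(p, "test2.csv")
    except RuntimeError as err:
        print("collision:", err)

    q = ToyPipeline()
    to_csv_unique_label(q, "test.csv")
    to_csv_unique_label(q, "test2.csv")
    print("labels:", sorted(q.labels))
```

In this sketch the second call with a fixed label fails in the same way as the reported error, while path-derived labels only collide if two calls write to the same path.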



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
