[ https://issues.apache.org/jira/browse/BEAM-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anonymous updated BEAM-12701:
-----------------------------
    Status: Triage Needed  (was: Resolved)

> Converting two deferred dataframes to csv in the same pipeline causes a PCollection label collision
> ----------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-12701
>                 URL: https://issues.apache.org/jira/browse/BEAM-12701
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-dataframe
>    Affects Versions: 2.31.0, 2.32.0, 2.33.0
>            Reporter: Jérémie Bigras-Dunberry
>            Assignee: Eduardo Sánchez López
>            Priority: P2
>             Fix For: 2.34.0
>
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> If you call to_csv on a DeferredDataFrame twice in a single pipeline, like this:
> {code:java}
> # Imports added so the snippet is self-contained.
> import apache_beam as beam
> import pandas as pd
> from apache_beam.dataframe.convert import to_dataframe, to_pcollection
>
> df1 = pd.DataFrame.from_records({"a":"b"}, index=[0])
> df2 = pd.DataFrame.from_records({"a":"b"}, index=[0])
> with beam.Pipeline() as p:
>     df1 = to_dataframe(to_pcollection(df1, pipeline=p), label="df1")
>     df2 = to_dataframe(to_pcollection(df2, pipeline=p), label="df2")
>     df1.to_csv("test.csv")
>     df2.to_csv("test2.csv"){code}
> you get this error on the second to_csv call:
>
> {code:java}
> RuntimeError: A transform with label "ToPCollection(df)" already exists in the pipeline. To apply a transform with a specified label write pvalue | "label" >> transform
> {code}
> I think this happens because to_csv calls to_pcollection without a label, so an identical label is inferred for both to_csv calls.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
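
The reporter's hypothesis can be illustrated with a toy model. The sketch below is not Beam's implementation: ToyPipeline and to_pcollection_toy are hypothetical names, and the default label "ToPCollection(df)" is simply copied from the error message in the report. It only shows why two unlabeled calls against the same pipeline collide while explicitly labeled calls do not.

```python
class ToyPipeline:
    """Hypothetical stand-in for a pipeline that requires unique labels."""

    def __init__(self):
        self.applied_labels = set()

    def apply(self, label):
        # Reject a label that has already been used in this pipeline.
        if label in self.applied_labels:
            raise RuntimeError(
                f'A transform with label "{label}" already exists in the pipeline.'
            )
        self.applied_labels.add(label)
        return label


def to_pcollection_toy(pipeline, label=None):
    # Assumption: with no explicit label, every call infers the same default.
    return pipeline.apply(label or "ToPCollection(df)")


def demo():
    # Two unlabeled calls: the second one collides with the first.
    p = ToyPipeline()
    to_pcollection_toy(p)
    try:
        to_pcollection_toy(p)
        collided = False
    except RuntimeError:
        collided = True

    # Two explicitly labeled calls: no collision.
    q = ToyPipeline()
    to_pcollection_toy(q, label="ToPCollection(df1)")
    to_pcollection_toy(q, label="ToPCollection(df2)")
    return collided, sorted(q.applied_labels)
```

Under this model, any fix has to make the inferred label distinct per call (or let the caller supply one); which approach the actual patch takes is not stated in this message.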