TheNeuralBit commented on a change in pull request #15450:
URL: https://github.com/apache/beam/pull/15450#discussion_r703642489



##########
File path: sdks/python/apache_beam/dataframe/io.py
##########
@@ -74,16 +74,17 @@ def read_csv(path, *args, splittable=False, **kwargs):
       splitter=_CsvSplitter(args, kwargs) if splittable else None)
 
 
-def _as_pc(df):
+def _as_pc(df, label=None):
   from apache_beam.dataframe import convert  # avoid circular import
   # TODO(roberwb): Amortize the computation for multiple writes?
-  return convert.to_pcollection(df, yield_elements='pandas')
+  return convert.to_pcollection(df, yield_elements='pandas', label=label)
 
 
 @frame_base.with_docs_from(pd.DataFrame)
-def to_csv(df, path, *args, **kwargs):
-
-  return _as_pc(df) | _WriteToPandas(
+def to_csv(df, path, transform_label=None, *args, **kwargs):
+  label_pc = f"{transform_label} - ToPCollection" if transform_label else 
"ToPCollection(df)"
+  label_pd = f"{transform_label} - ToPandasDataFrame" if transform_label else 
"ToPandasDataFrame(df)"

Review comment:
       I think it's good to give users the option to override the transform 
label here, but I think we should also make this have a sensible default that 
is likely to be unique, so that most users won't have to override it. Maybe we 
could make the default include the path?
   
   Something like: 
   
   ```suggestion
     label_pc = f"{transform_label} - ToPCollection" if transform_label else 
"{path} - ToPCollection(df)"
     label_pd = f"{transform_label} - WriteToPandas" if transform_label else 
"{path} - WriteToPandas(df)"
   ```
   
   also note the nit about `label_pd`, I think it should be WriteToPandas. What 
do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to