Edusanc95 commented on a change in pull request #15450:
URL: https://github.com/apache/beam/pull/15450#discussion_r706633103
##########
File path: sdks/python/apache_beam/dataframe/io.py
##########
@@ -74,16 +74,17 @@ def read_csv(path, *args, splittable=False, **kwargs):
splitter=_CsvSplitter(args, kwargs) if splittable else None)
-def _as_pc(df):
+def _as_pc(df, label=None):
   from apache_beam.dataframe import convert  # avoid circular import
   # TODO(roberwb): Amortize the computation for multiple writes?
-  return convert.to_pcollection(df, yield_elements='pandas')
+  return convert.to_pcollection(df, yield_elements='pandas', label=label)
 @frame_base.with_docs_from(pd.DataFrame)
-def to_csv(df, path, *args, **kwargs):
-
-  return _as_pc(df) | _WriteToPandas(
+def to_csv(df, path, transform_label=None, *args, **kwargs):
+  label_pc = f"{transform_label} - ToPCollection" if transform_label else "ToPCollection(df)"
+  label_pd = f"{transform_label} - ToPandasDataFrame" if transform_label else "ToPandasDataFrame(df)"
Review comment:
Hello! I agree with your remarks. I just pushed a commit that includes
this change as well as the linter fix.
The label is slightly different: `WriteToPandas(df) - {path}` instead of
`{path} - WriteToPandas(df)`. At a glance, I think it makes more sense to see
the operation being performed first, followed by the specific file it
targets.
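
To make the ordering concrete, here is a minimal sketch of how such a default label could be built. The helper name `default_write_label` and its `operation` parameter are hypothetical and used only for illustration; they are not part of Beam or of this commit.

```python
def default_write_label(path, operation="WriteToPandas(df)"):
    """Hypothetical helper (not Beam API): operation first, then the path."""
    return f"{operation} - {path}"


# Example: the operation reads first, the file second.
label = default_write_label("output.csv")
print(label)
```

Putting the operation first means labels for different outputs share a common prefix, which should make them easier to scan in a pipeline graph.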