[ https://issues.apache.org/jira/browse/BEAM-12533?focusedWorklogId=614783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614783 ]
ASF GitHub Bot logged work on BEAM-12533:
-----------------------------------------

            Author: ASF GitHub Bot
        Created on: 25/Jun/21 00:25
        Start Date: 25/Jun/21 00:25
Worklog Time Spent: 10m
  Work Description:

TheNeuralBit commented on a change in pull request #15072:
URL: https://github.com/apache/beam/pull/15072#discussion_r658370349

##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -1843,9 +1873,69 @@ def repeat(self, repeats, axis):
           f"DeferredSeries (encountered {type(repeats)}).")
 
 
+def _justify_str_column(objs, rjust=True):
+  strs = [str(o) for o in objs]
+  maxlen = max(len(s) for s in strs)
+  return [s.rjust(maxlen) if rjust else s.ljust(maxlen) for s in strs]
+
+
+def _ljustify_str_column(objs):
+  strs = [str(o) for o in objs]
+  maxlen = max(len(s) for s in strs)
+  return [s.ljust(maxlen) for s in strs]
+
+
+def _justify_columns_and_transpose(columns, rjust=True):
+  for row in zip(*[_justify_str_column(objs, rjust) for objs in columns]):
+    yield ' '.join(row)
+
+
 @populate_not_implemented(pd.DataFrame)
 @frame_base.DeferredFrame._register_for(pd.DataFrame)
 class DeferredDataFrame(DeferredDataFrameOrSeries):

Review comment:
DataFrames are just concatenated Series, but they also have a common,
shared index, so that logic will only need to happen once for the
DataFrame case. I tried to share as much code as possible by pulling out
the justification logic. It's probably possible to pull out some common
logic for rendering the index though, I'll see if I can come up with
something clean there.

WDYT about the general approach of using ":" in the columns and "??" for
the length to indicate this is a deferred object?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 614783)
    Time Spent: 1h 20m  (was: 1h 10m)

> DeferedSeries and DeferredDataFrame should have a useful repr
> -------------------------------------------------------------
>
>                 Key: BEAM-12533
>                 URL: https://issues.apache.org/jira/browse/BEAM-12533
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-dataframe
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>            Priority: P2
>             Fix For: 2.32.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> DeferredSeries and DeferredDataFrame just use the default __repr__
> implementation right now, which means outputting them in a notebook is not
> useful at all. Users will need to inspect columns, dtypes, index, name, etc..
> manually. We should include basic information about the frames in a simple
> __repr__ implementation.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
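
For illustration, the approach raised in the review comment (shared justification helpers, ":" standing in for column values, "??" for the unknown length) might look roughly like the sketch below. The two helper bodies mirror the functions in the diff, renamed without the leading underscore. `sketch_deferred_repr`, its parameters, and the exact output layout are hypothetical and only meant to show the idea; they are not the PR's implementation.

```python
def justify_str_column(objs, rjust=True):
    """Pad every entry to the width of the longest entry in the column."""
    strs = [str(o) for o in objs]
    maxlen = max(len(s) for s in strs)
    return [s.rjust(maxlen) if rjust else s.ljust(maxlen) for s in strs]


def justify_columns_and_transpose(columns, rjust=True):
    """Justify each column independently, then yield the table row by row."""
    justified = [justify_str_column(objs, rjust) for objs in columns]
    for row in zip(*justified):
        yield ' '.join(row)


def sketch_deferred_repr(index_name, column_names):
    """Hypothetical repr for a deferred frame (an assumption, not Beam's API).

    The values are not known until the pipeline runs, so each column shows
    only its name and a ':' placeholder, and the row count is shown as '??'.
    """
    # The shared index is rendered once, as the first column.
    columns = [[name, ':'] for name in [index_name, *column_names]]
    lines = list(justify_columns_and_transpose(columns))
    lines.append(f"[?? rows x {len(column_names)} columns]")
    return '\n'.join(lines)


if __name__ == '__main__':
    print(sketch_deferred_repr('id', ['name', 'score']))
```

Because the index is handled as just another justified column, the same helpers serve both the Series and the DataFrame case, which is the code sharing the comment describes.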