[ 
https://issues.apache.org/jira/browse/BEAM-12533?focusedWorklogId=614783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614783
 ]

ASF GitHub Bot logged work on BEAM-12533:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Jun/21 00:25
            Start Date: 25/Jun/21 00:25
    Worklog Time Spent: 10m 
      Work Description: TheNeuralBit commented on a change in pull request #15072:
URL: https://github.com/apache/beam/pull/15072#discussion_r658370349



##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -1843,9 +1873,69 @@ def repeat(self, repeats, axis):
           f"DeferredSeries (encountered {type(repeats)}).")
 
 
+def _justify_str_column(objs, rjust=True):
+  strs = [str(o) for o in objs]
+  maxlen = max(len(s) for s in strs)
+  return [s.rjust(maxlen) if rjust else s.ljust(maxlen) for s in strs]
+
+
+def _ljustify_str_column(objs):
+  strs = [str(o) for o in objs]
+  maxlen = max(len(s) for s in strs)
+  return [s.ljust(maxlen) for s in strs]
+
+
+def _justify_columns_and_transpose(columns, rjust=True):
+  for row in zip(*[_justify_str_column(objs, rjust) for objs in columns]):
+    yield ' '.join(row)
+
+
 @populate_not_implemented(pd.DataFrame)
 @frame_base.DeferredFrame._register_for(pd.DataFrame)
 class DeferredDataFrame(DeferredDataFrameOrSeries):

Review comment:
       DataFrames are just concatenated Series, but they also share a common 
index, so the index handling only needs to happen once in the DataFrame case.
   
   I tried to share as much code as possible by pulling out the justification 
logic. It's probably also possible to pull out some common logic for rendering 
the index; I'll see if I can come up with something clean there.
   
   WDYT about the general approach of using ":" in the columns and "??" for the 
length to indicate this is a deferred object?
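
To make the proposal concrete, here is a minimal sketch of how the justification helpers from the diff could feed a placeholder rendering. The two helpers mirror the diff (without the leading underscore). `render_deferred_frame`, its parameters, and the footer layout are assumptions made up for illustration and are not taken from the PR.

```python
# Sketch only: the helpers mirror the diff, but render_deferred_frame and its
# output layout are hypothetical, not Beam's actual implementation.

def justify_str_column(objs, rjust=True):
    """Pad every value in one column to the width of the longest value."""
    strs = [str(o) for o in objs]
    maxlen = max(len(s) for s in strs)
    return [s.rjust(maxlen) if rjust else s.ljust(maxlen) for s in strs]


def justify_columns_and_transpose(columns, rjust=True):
    """Justify each column, then yield one space-joined string per row."""
    for row in zip(*[justify_str_column(objs, rjust) for objs in columns]):
        yield ' '.join(row)


def render_deferred_frame(index_name, column_names, placeholder_rows=2):
    """Hypothetical renderer: ':' marks unknown values, '??' the unknown length."""
    # The shared index is built once and placed in front of the data columns.
    index_column = [index_name] + [':'] * placeholder_rows
    data_columns = [[name] + [':'] * placeholder_rows for name in column_names]
    lines = list(justify_columns_and_transpose([index_column] + data_columns))
    lines.append(f'[?? rows x {len(column_names)} columns]')
    return '\n'.join(lines)


print(render_deferred_frame('id', ['speed', 'animal']))
```

Because every column is padded before the transpose, each rendered row has the same width, and the index column is handled exactly once regardless of how many data columns there are.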
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 614783)
    Time Spent: 1h 20m  (was: 1h 10m)

> DeferredSeries and DeferredDataFrame should have a useful repr
> --------------------------------------------------------------
>
>                 Key: BEAM-12533
>                 URL: https://issues.apache.org/jira/browse/BEAM-12533
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-dataframe
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>            Priority: P2
>             Fix For: 2.32.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> DeferredSeries and DeferredDataFrame just use the default __repr__ 
> implementation right now, which means outputting them in a notebook is not 
> useful at all. Users will need to inspect columns, dtypes, index, name, etc. 
> manually. We should include basic information about the frames in a simple 
> __repr__ implementation.
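
As a rough illustration of the kind of `__repr__` the issue asks for, the sketch below surfaces column names, dtypes, and the index name without touching any data. `DeferredFrameSketch` and its fields are hypothetical stand-ins, not Beam classes, and the output format is an assumption.

```python
# Sketch only: DeferredFrameSketch is a hypothetical stand-in, not a Beam class.
# It shows a __repr__ that reports metadata while the data itself is unknown.

class DeferredFrameSketch:
    def __init__(self, columns, index_name=None):
        # columns maps column name -> dtype name, e.g. {'speed': 'float64'}
        self.columns = dict(columns)
        self.index_name = index_name

    def __repr__(self):
        cols = ', '.join(
            f'{name}: {dtype}' for name, dtype in self.columns.items())
        # The length is not known until the pipeline runs, so mark it as '??'.
        return (
            f'{type(self).__name__}(columns=[{cols}], '
            f'index={self.index_name!r}, length=??)')


frame = DeferredFrameSketch(
    {'speed': 'float64', 'animal': 'object'}, index_name='id')
print(repr(frame))
```

Compared with the default `object.__repr__`, which prints only the class name and a memory address, this gives a notebook user the schema at a glance.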



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
