> It seems that the schema changes to Arrow are a custom solution just for
> Perspective and it might be prudent to wait for Arrow 4 that will have a
> standard way of representing this information.

Arrow 4.0.0 is not going to have the pivot table structures you are
looking for. Speaking as the one who originally implemented the pandas
pivot table functionality, I haven't seen anyone building anything like
that here.

In principle I don't see a problem with implementing a pivot table
abstraction somewhere in Apache Arrow and arranging to transport the
data as an Arrow record batch along with the additional metadata that
would be required for a consumer to understand the pivot table
structure without expensive analysis. This wouldn't be something
formalized in the columnar format, of course, just a convenient add-on
application.

The query engine work we're starting up in C++ could certainly support
the manipulation of pivot tables and OLAP cubes at some point, but I
am not sure if it is a priority relative to other things right now.

On Fri, Mar 19, 2021 at 8:53 AM Michael Lavina
<michael.lav...@factset.com> wrote:
>
> Hey Tim,
>
> Maybe you can shed some light on this for me. Again, sorry if this is well
> known, but I just found out about Perspective and I have been playing around
> with it.
>
> Is the thought that the output of to_arrow() should not be used in a
> non-Perspective context? For my use case we are thinking of using Perspective
> in Jupyter notebooks to build up complex representations of tabular data
> involving pivoting and filtering. We would then take the output of to_arrow()
> and feed it into some of our own rendering engines. It did seem like the Arrow
> output changed when I did pivoting, but I could not figure out how or why,
> haha.
>
> It seems that the schema changes to Arrow are a custom solution just for
> Perspective and it might be prudent to wait for Arrow 4 that will have a
> standard way of representing this information.
>
> -Michael
>
> From: Tim Paine <t.paine...@gmail.com>
> Date: Friday, March 19, 2021 at 9:53 AM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: Re: [DISCUSS] How to encode table_pivot information state in Arrow
>
> Perspective uses Arrow across the wire but internally uses its own formats.
>
> Tim Paine
> tim.paine.nyc
> 908-721-1185
>
> > On Mar 19, 2021, at 09:46, Michael Lavina <michael.lav...@factset.com> 
> > wrote:
> >
> > Hey Benjamin,
> >
> > That sounds really awesome. Thank you.
> >
> > Sorry if this was already a well-known thing, as I am fairly new to the
> > Arrow ecosystem. Is there a way to track a roadmap for Arrow 4 and be
> > involved in that? Is there anywhere I can read more general information
> > about it?
> >
> > -Michael
> >
> > From: Benjamin Kietzman <bengil...@gmail.com>
> > Date: Friday, March 19, 2021 at 9:14 AM
> > To: dev <dev@arrow.apache.org>
> > Subject: Re: [DISCUSS] How to encode table_pivot information state in Arrow
> > Hi Michael,
> >
> > We are targeting grouped aggregation for 4.0 as part of a general query
> > engine buildout. We also intend to bring DataFrame functionality into core
> > Arrow (which would probably include an analog of pandas' pivot_table), but
> > the query engine work is a prerequisite.
> >
> > Ben Kietzman
> >
> >> On Fri, Mar 19, 2021, 08:19 Michael Lavina <michael.lav...@factset.com>
> >> wrote:
> >>
> >> Hey Team,
> >>
> >> Sorry if this is already answered somewhere. I tried searching emails and
> >> issues but couldn't find anything. I am wondering if there is a standard
> >> way to encode row or column pivots in Arrow?
> >>
> >> I know pandas already does this in some way
> >> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
> >> and there are libraries using Arrow, like Perspective, that may have their
> >> own internal solution for representing pivots
> >> https://perspective.finos.org/docs/md/view.html#row-pivots
> >>
> >> I am wondering if there is already a discussion, a best practice, or a
> >> standard for encoding this information. Or, alternatively, is this
> >> something that should not be associated with Arrow at all?
> >>
> >> -Michael
> >>
> >> P.S. If anyone on the Perspective team or anyone who might know is on this
> >> thread I would be interested in understanding more how Perspective,
> >> specifically, encodes pivot information in Arrow.
> >>
> >>
