Re: Spark DataFrame UNPIVOT feature

Mike Hynes Wed, 22 Aug 2018 05:59:00 -0700

Hi Reynold/Ivan,

People familiar with pandas and R dataframes will likely have used the
dataframe "melt" idiom, which is the functionality I believe you are
referring to:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html

I have had to write this function myself in my own work in Spark SQL, as it
is a common step in data wrangling when you do not control the structure of
the input dataframes you are working with in your pipelines.
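
For anyone unfamiliar with the idiom, here is a minimal sketch of what "melt"
does, written in plain Python over a list of dicts rather than against the
Spark API. The function name `melt`, its parameters, and the sample column
names are all illustrative assumptions modelled loosely on the pandas
signature; this is not the code I use in Spark SQL.

```python
from typing import Any, Dict, Iterable, List, Sequence


def melt(
    rows: Iterable[Dict[str, Any]],
    id_vars: Sequence[str],
    value_vars: Sequence[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> List[Dict[str, Any]]:
    """Reshape wide rows into long rows.

    Emits one output row per (input row, value column) pair. The id
    columns are copied through unchanged.
    """
    long_rows: List[Dict[str, Any]] = []
    for row in rows:
        # Columns that identify the record and are repeated on every output row.
        ids = {key: row[key] for key in id_vars}
        for col in value_vars:
            # The column name becomes data (var_name); its cell becomes value_name.
            long_rows.append({**ids, var_name: col, value_name: row[col]})
    return long_rows


# Hypothetical wide input: one row per id, one column per measurement.
wide = [
    {"id": 1, "x": 10, "y": 20},
    {"id": 2, "x": 30, "y": 40},
]
long = melt(wide, id_vars=["id"], value_vars=["x", "y"])
```

Note that this sketch emits rows grouped by input row, whereas pandas groups
the output by variable; the set of rows is the same.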

I therefore second Ivan's suggestion: a native DataFrame method for this
would be helpful (as would, for what it's worth, other concepts from the
pandas API, such as named indexing and multilevel indexing).

Cheers,
Mike

On Tue, Aug 21, 2018, 5:07 PM Reynold Xin, <r...@databricks.com> wrote:

> Probably just because it is not used that often and nobody has submitted a
> patch for it. I've used pivot probably on average once a week (primarily in
> spreadsheets), but I've never used unpivot ...
>
>
> On Tue, Aug 21, 2018 at 3:06 PM Ivan Gozali <i...@lecida.com> wrote:
>
>> Hi there,
>>
>> I was looking into why the UNPIVOT feature isn't implemented, given that
>> Spark already has PIVOT implemented natively in the DataFrame/Dataset API.
>>
>> Came across this JIRA <https://issues.apache.org/jira/browse/SPARK-8992>
>> which talks about implementing PIVOT in Spark 1.6, but no mention whatsoever
>> regarding UNPIVOT, even though the JIRA curiously references a blog post
>> that talks about both PIVOT and UNPIVOT :)
>>
>> Is this because UNPIVOT is just simply generating multiple slim tables by
>> selecting each column, and making a union out of all of them?
>>
>> Thank you!
>>
>> --
>> Regards,
>>
>>
>> Ivan Gozali
>> Lecida
>> Email: i...@lecida.com
>>
>
