[Spark Core]: Support for un-pivoting data ('melt')

Daniel Davies Sun, 02 Jan 2022 12:00:08 -0800

Level: Intermediate (I think?)
Scenario: Feature Request

Hello dev@,


(First time posting on this mailing list; apologies in advance if this
should have been routed elsewhere or is missing any information.)

Un-pivoting data is supported by numerous SQL engines and by pandas (via the
'melt' function), but it isn't directly available in Spark. It's easy
enough to derive this functionality using the 'stack' function or a
combination of struct, array, and explode (e.g. the reproduction of
the melt function in the pandas API on Spark here
<https://github.com/apache/spark/blob/c92bd5cafe62ca5226176446735171cc877e805a/python/pyspark/pandas/frame.py#L9651>),
but I was wondering whether a more native solution had been considered? It
would make end-user code more lightweight at the very least, and I wonder
whether it could be made more efficient than the stack
function or the struct-array-explode method.
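
For illustration, here is a minimal sketch of the 'stack' workaround as it
tends to appear in user code. The helper name melt_exprs and its parameters
are hypothetical (named after the pandas melt arguments), not an existing
Spark API. It only builds the expression strings, assuming Spark SQL's
default backslash escaping for string literals.

```python
from typing import List, Sequence


def melt_exprs(
    id_vars: Sequence[str],
    value_vars: Sequence[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> List[str]:
    """Build selectExpr() arguments that un-pivot value_vars into rows.

    Hypothetical helper for illustration only; not part of Spark.
    """
    if not value_vars:
        raise ValueError("value_vars must not be empty")

    def ident(name: str) -> str:
        # Backtick-quote a column name, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    def literal(text: str) -> str:
        # Single-quoted SQL string literal (assumes backslash escaping).
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

    # stack(n, label1, col1, label2, col2, ...) emits n rows per input row,
    # each holding one (label, value) pair.
    pairs = ", ".join(f"{literal(c)}, {ident(c)}" for c in value_vars)
    stacked = (
        f"stack({len(value_vars)}, {pairs}) "
        f"as ({ident(var_name)}, {ident(value_name)})"
    )
    return [ident(c) for c in id_vars] + [stacked]
```

With a PySpark DataFrame df, this would be used roughly as
df.selectExpr(*melt_exprs(["id"], ["a", "b"])). The value columns generally
need a common type for 'stack' to accept them, so a cast may be needed
first. This is the kind of boilerplate a native melt/unpivot would remove.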

I'm happy to try and make a PR if this is something that might be useful
within Spark. No worries if this is not something that you think should be
supported; the methods above work and are well documented on Stack Overflow.
I was personally just caught out by this, and thought it would be useful to
raise.

I did see a thread in the Pony Mail archive about this issue, but it looks
like it didn't go anywhere. Does anyone else have context on this
<https://lists.apache.org/[email protected]:lte=60M:unpivot>?

Kind Regards,

-- 
*Daniel Davies*
