[
https://issues.apache.org/jira/browse/SPARK-25225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592397#comment-16592397
]
Takeshi Yamamuro commented on SPARK-25225:
------------------------------------------
I don't exactly understand your scenario, though. Is `UserDefinedType` not
enough for you?
> Add support for "List"-Type columns
> -----------------------------------
>
> Key: SPARK-25225
> URL: https://issues.apache.org/jira/browse/SPARK-25225
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, Spark Core
> Affects Versions: 2.3.1
> Reporter: Yuriy Davygora
> Priority: Minor
>
> At the moment, Spark Dataframe ArrayType-columns require all elements of
> the array to be of the same data type.
> At our company, we are currently rewriting old MapReduce code with Spark. One
> of the frequent use-cases is aggregating data into timeseries:
> Example input:
> {noformat}
> ID date data
> 1 2017-01-01 data_1_1
> 1 2018-02-02 data_1_2
> 2 2017-03-03 data_2_1
> 2 2018-04-04 data_2_2
> ...
> {noformat}
> Expected output:
> {noformat}
> ID timeseries
> 1 [[2017-01-01, data_1_1],[2018-02-02, data_1_2]]
> 2 [[2017-03-03, data_2_1],[2018-04-04, data_2_2]]
> ...
> {noformat}
> Here, the values in the data column of the input are, in most cases, not
> primitive, but, for example, lists, dicts, nested lists, etc. Spark,
> however, does not support building an array column out of a string column
> and a non-string column.
> We would like to kindly ask you to implement one of the following:
> 1. Extend ArrayType to support elements of different data types
> 2. Introduce a new container type (ListType?) which would support elements
> of different types
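> A hedged sketch of a possible workaround for the primitive-valued case: an
> array element must be homogeneous, but it can be a struct, and struct
> fields may have different types. Collecting (date, data) pairs as structs
> per ID yields the timeseries shape above without any new container type.
> The session setup and column names below are illustrative assumptions,
> not part of the reporter's code.
> {noformat}
> # Workaround sketch: aggregate (date, data) pairs into an array of structs.
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as F
>
> spark = SparkSession.builder.master("local[1]").appName("ts").getOrCreate()
>
> rows = [
>     (1, "2017-01-01", "data_1_1"),
>     (1, "2018-02-02", "data_1_2"),
>     (2, "2017-03-03", "data_2_1"),
>     (2, "2018-04-04", "data_2_2"),
> ]
> df = spark.createDataFrame(rows, ["ID", "date", "data"])
>
> # Pack each (date, data) pair into a struct, collect per ID, and sort
> # the resulting array by date (structs sort field-by-field).
> timeseries = df.groupBy("ID").agg(
>     F.sort_array(F.collect_list(F.struct("date", "data"))).alias("timeseries")
> )
>
> result = {
>     r["ID"]: [(e["date"], e["data"]) for e in r["timeseries"]]
>     for r in timeseries.collect()
> }
> {noformat}
> This does not cover truly heterogeneous data values (lists vs. dicts in
> the same column); for those, a UserDefinedType or a serialized
> (e.g. JSON string) representation would still be needed.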
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]