[jira] [Updated] (SPARK-32334) Investigate commonizing Columnar and Row data transformations

Erik Krogen (Jira) Fri, 17 Jul 2020 08:31:17 -0700


     [ https://issues.apache.org/jira/browse/SPARK-32334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Erik Krogen updated SPARK-32334:
--------------------------------
    Description: 
We introduced more Columnar Support with SPARK-27396.

With that we recognized that there is code that is doing very similar 
transformations from ColumnarBatch or Arrow into InternalRow and vice versa.  
For instance: 
[https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L56-L58]

[https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L389]

We should investigate if we can commonize that code.

We are also looking at making the internal caching serialization pluggable to 
allow for different cache implementations. 
([https://github.com/apache/spark/pull/29067]). 

It was recently brought up that we should investigate if using the data source 
v2 api makes sense and is feasible for some of these transformations to allow 
it to be easily extended.

  was:
We introduced more Columnar Support with SPARK-27396.

With that we recognized that there is code that is doing very similar 
transformations from ColumnarBatch or Arrow into InternalRow and vice versa.  
For instance: 
[https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L56-L58]

[https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L389]

We should investigate if we can commonize that code.

We are also looking at making the internal caching serialization pluggable to 
allow for different cache implementations. 
([https://github.com/apache/spark/pull/29067).] 

It was recently brought up that we should investigate if using the data source 
v2 api makes sense and is feasible for some of these transformations to allow 
it to be easily extended.


> Investigate commonizing Columnar and Row data transformations 
> --------------------------------------------------------------
>
>                 Key: SPARK-32334
>                 URL: https://issues.apache.org/jira/browse/SPARK-32334
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Thomas Graves
>            Priority: Major
>
> We introduced more Columnar Support with SPARK-27396.
> With that we recognized that there is code that is doing very similar 
> transformations from ColumnarBatch or Arrow into InternalRow and vice versa.  
> For instance: 
> [https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L56-L58]
> [https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L389]
> We should investigate if we can commonize that code.
> We are also looking at making the internal caching serialization pluggable to 
> allow for different cache implementations. 
> ([https://github.com/apache/spark/pull/29067]). 
> It was recently brought up that we should investigate if using the data 
> source v2 api makes sense and is feasible for some of these transformations 
> to allow it to be easily extended.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

