Chendi.Xue created SPARK-37124:
----------------------------------

             Summary: Support Writable ArrowColumnarVector
                 Key: SPARK-37124
                 URL: https://issues.apache.org/jira/browse/SPARK-37124
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Chendi.Xue

This Jira aims to add the Arrow format as an alternative ColumnVector implementation. The current ArrowColumnVector is not fully equivalent to OnHeapColumnVector/OffHeapColumnVector in Spark, because it cannot be written to. Since the Arrow API is now more stable, and pandas UDFs (which exchange data in Arrow format) perform much better than plain Python UDFs, I am proposing to fully support the Arrow format as an alternative ColumnVector, just like the other two.

What I did in this PR is create a new class in the same package as OnHeapColumnVector/OffHeapColumnVector that extends WritableColumnVector and supports all put APIs.

Unit tests (UTs) cover all data types, testing both writing to and reading from the column vector. I also added 3 UTs for loading from an ArrowRecordBatch and for allocateColumns.
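
To illustrate what "supporting the put APIs" on an Arrow-style layout could look like, here is a minimal sketch. It is not the class from the PR and it does not use Spark's WritableColumnVector or any Arrow library call. The class name WritableIntVectorSketch and all of its methods are hypothetical, and it assumes only the general Arrow idea of a little-endian value buffer plus a validity bitmap where a set bit means "non-null".

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch only: NOT Spark's WritableColumnVector and NOT an Arrow class.
// It mimics an Arrow-style int column: an off-heap value buffer plus a validity bitmap.
final class WritableIntVectorSketch {
    private ByteBuffer data;   // 4 bytes per row, little-endian
    private byte[] validity;   // 1 bit per row; a set bit means the row is non-null
    private int capacity;      // number of rows currently allocated

    WritableIntVectorSketch(int capacity) {
        this.capacity = capacity;
        this.data = ByteBuffer.allocateDirect(capacity * 4).order(ByteOrder.LITTLE_ENDIAN);
        this.validity = new byte[(capacity + 7) / 8];
    }

    // "put" API: store a value and mark the row as valid.
    void putInt(int rowId, int value) {
        reserve(rowId + 1);
        data.putInt(rowId * 4, value);
        validity[rowId >> 3] |= (byte) (1 << (rowId & 7));
    }

    // "put" API: mark the row as null by clearing its validity bit.
    void putNull(int rowId) {
        reserve(rowId + 1);
        validity[rowId >> 3] &= (byte) ~(1 << (rowId & 7));
    }

    // Read APIs: rows that were never written read as null.
    boolean isNullAt(int rowId) {
        return (validity[rowId >> 3] & (1 << (rowId & 7))) == 0;
    }

    int getInt(int rowId) {
        return data.getInt(rowId * 4);
    }

    int capacity() {
        return capacity;
    }

    // Grow both buffers when a write lands beyond the current capacity.
    private void reserve(int requiredRows) {
        if (requiredRows <= capacity) {
            return;
        }
        int newCapacity = Math.max(requiredRows, capacity * 2);
        ByteBuffer grown = ByteBuffer.allocateDirect(newCapacity * 4).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer old = data.duplicate();
        old.clear();       // position 0, limit = old size in bytes
        grown.put(old);    // bulk byte copy; later accesses use absolute offsets only
        data = grown;
        validity = Arrays.copyOf(validity, (newCapacity + 7) / 8);
        capacity = newCapacity;
    }

    public static void main(String[] args) {
        WritableIntVectorSketch v = new WritableIntVectorSketch(2);
        v.putInt(0, 7);
        v.putNull(1);
        v.putInt(5, 42);   // triggers a resize
        System.out.println(v.getInt(0) + " " + v.isNullAt(1) + " " + v.getInt(5));
    }
}
```

A real implementation along the lines described above would presumably delegate storage to Arrow's own vectors and cover every data type, which this sketch does not attempt.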