JkSelf commented on issue #12263: URL: https://github.com/apache/gluten/issues/12263#issuecomment-4666660656
Lance-spark's write path is row-based primarily because Spark's DSv2 sink APIs are still bound to `InternalRow`. Internally, lance-spark buffers these rows and converts them into Arrow columnar batches before flushing them to Lance. Gluten already holds the data in columnar form and can export it as Arrow, so we can bypass the row-based lance-spark sink path entirely. A more viable direction would be:

```
Velox columnar -> Arrow C Data Interface / ArrowArrayStream -> lance-core (via Dataset.write().stream(...).execute())
```
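The following toy model in plain Python makes the cost concrete. It has no Arrow or Lance dependency. `RowBufferingSink`, `columnar_to_rows`, `write_via_row_sink` and `write_columnar_stream` are hypothetical names, and a dict of lists stands in for an Arrow record batch. This is not lance-spark's or Lance's actual code. It shows that the row-based detour (columnar -> rows -> re-buffered columnar) ends with the same batches the engine already had, but only after every cell has been converted twice.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, List, Sequence, Tuple

# Toy stand-in for an Arrow record batch: column name -> column values.
# (Illustrative assumption: real batches are Arrow buffers, not Python lists.)
ColumnarBatch = Dict[str, List[Any]]


@dataclass
class RowBufferingSink:
    """Hypothetical model of a row-based sink: it accepts one row at a time
    and transposes buffered rows into a columnar batch on each flush."""
    schema: Sequence[str]
    batch_size: int
    flushed: List[ColumnarBatch] = field(default_factory=list, init=False)
    transposed_cells: int = field(default=0, init=False)
    _buffer: List[Tuple[Any, ...]] = field(default_factory=list, init=False)

    def write_row(self, row: Tuple[Any, ...]) -> None:
        self._buffer.append(row)
        if len(self._buffer) >= self.batch_size:
            self._flush()

    def _flush(self) -> None:
        if not self._buffer:
            return
        # Row -> column transpose: every buffered cell is copied once more.
        batch = {
            name: [row[i] for row in self._buffer]
            for i, name in enumerate(self.schema)
        }
        self.transposed_cells += len(self._buffer) * len(self.schema)
        self.flushed.append(batch)
        self._buffer.clear()

    def close(self) -> None:
        # Flush any trailing partial batch.
        self._flush()


def columnar_to_rows(batch: ColumnarBatch, schema: Sequence[str]) -> Iterator[Tuple[Any, ...]]:
    """Column -> row conversion needed to feed a row-based sink."""
    num_rows = len(batch[schema[0]]) if schema else 0
    for r in range(num_rows):
        yield tuple(batch[name][r] for name in schema)


def write_via_row_sink(
    batches: Iterable[ColumnarBatch], schema: Sequence[str], batch_size: int
) -> Tuple[List[ColumnarBatch], int]:
    """Row-based detour: columnar -> rows -> re-buffered columnar."""
    sink = RowBufferingSink(schema, batch_size)
    for batch in batches:
        for row in columnar_to_rows(batch, schema):
            sink.write_row(row)
    sink.close()
    return sink.flushed, sink.transposed_cells


def write_columnar_stream(batches: Iterable[ColumnarBatch]) -> List[ColumnarBatch]:
    """Direct path: batches are handed through unchanged, as a stream reader would."""
    return list(batches)
```

Take two source batches of 2 and 1 rows (`{"id": [1, 2], "name": ["a", "b"]}` and `{"id": [3], "name": ["c"]}`) and use `batch_size=2`. The row sink gives back the same two batches, but only after all 6 cells have been turned into rows and then transposed back into columns. The direct stream path skips both conversions. That is the point of handing Velox output to lance-core as an Arrow stream.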
