JkSelf commented on issue #12263:
URL: https://github.com/apache/gluten/issues/12263#issuecomment-4666660656

   Lance-spark's write path is row-based primarily because Spark's DSv2 sink 
APIs are still bound to InternalRow. Internally, lance-spark converts and 
buffers these rows into Arrow columnar batches before flushing them to Lance.
   
   Since Gluten already operates on Arrow columnar data, we can bypass the 
standard row-based lance-spark sink path entirely. A more feasible direction 
would be:
   
   ```
   Velox columnar
   -> Arrow C Data Interface / ArrowArrayStream
   -> lance-core (via Dataset.write().stream(...).execute())
   ```
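
To make the contrast between the two paths concrete, here is a toy model in plain Python. It is only a sketch under stated assumptions: `Row`, `Batch`, `rows_to_batches`, and `write_stream` are hypothetical stand-ins for InternalRow, an Arrow record batch, lance-spark's internal buffering, and a stream-based Lance writer. None of them are real lance-spark, Gluten, or lance-core APIs.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-ins, not real Spark/Arrow/Lance types.
Row = Tuple[int, str]        # plays the role of Spark's InternalRow
Batch = Dict[str, List]      # plays the role of an Arrow record batch

COLUMNS = ("id", "name")


def _pivot(rows: List[Row]) -> Batch:
    """Turn a list of rows into one column-oriented batch."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def rows_to_batches(rows: Iterable[Row], batch_size: int) -> Iterator[Batch]:
    """Row-based path: buffer rows, pivot them to columns, flush when full."""
    buffer: List[Row] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) == batch_size:
            yield _pivot(buffer)
            buffer = []
    if buffer:
        # Flush the final, partially filled buffer.
        yield _pivot(buffer)


def write_stream(batches: Iterable[Batch], sink: List[Batch]) -> int:
    """Columnar sink: consume batches as-is and return the row count."""
    written = 0
    for batch in batches:
        sink.append(batch)
        written += len(batch[COLUMNS[0]])
    return written


# Row-based path: the extra row -> column conversion happens before the sink.
rows = [(1, "a"), (2, "b"), (3, "c")]
row_sink: List[Batch] = []
rows_written = write_stream(rows_to_batches(rows, batch_size=2), row_sink)

# Columnar path: data that is already columnar is streamed straight through.
columnar_input = [{"id": [1, 2], "name": ["a", "b"]}, {"id": [3], "name": ["c"]}]
columnar_sink: List[Batch] = []
columnar_written = write_stream(iter(columnar_input), columnar_sink)
```

Both paths hand the sink the same batches. The difference is that the columnar path skips the buffering and pivot step, which is the work a direct Velox-to-Lance stream would avoid.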


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


