Re: [I] Lance-spark support in Gluten [gluten]

via GitHub Tue, 09 Jun 2026 21:54:23 -0700


JkSelf commented on issue #12263:
URL: https://github.com/apache/gluten/issues/12263#issuecomment-4666660656


   Lance-spark's write path is row-based primarily because Spark's DSv2 sink 
API is still bound to InternalRow: each DataWriter receives one row at a time. 
Internally, lance-spark converts and buffers these rows into Arrow columnar 
batches before flushing them to Lance.
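To make the buffering step concrete, here is a minimal Python sketch of a row-to-columnar buffer. It rests on a simplified model: rows are tuples standing in for InternalRow, and a "batch" is a dict of column lists standing in for an Arrow record batch. `RowToColumnarBuffer`, `batch_size`, and `flush` are hypothetical names for illustration only. This is not lance-spark's actual code.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical model: a batch maps column name -> list of values,
# standing in for an Arrow record batch (not lance-spark's real types).
Batch = Dict[str, List[object]]


class RowToColumnarBuffer:
    """Transposes incoming rows into per-column buffers and flushes
    a batch every `batch_size` rows (illustrative sketch only)."""

    def __init__(self, schema: Sequence[str], batch_size: int,
                 flush: Callable[[Batch], None]) -> None:
        self.schema = list(schema)
        self.batch_size = batch_size
        self.flush = flush
        self._columns: Batch = {name: [] for name in self.schema}
        self._rows = 0

    def write(self, row: Sequence[object]) -> None:
        # One call per row, mirroring a row-at-a-time sink interface.
        if len(row) != len(self.schema):
            raise ValueError("row width does not match schema")
        for name, value in zip(self.schema, row):
            self._columns[name].append(value)
        self._rows += 1
        if self._rows >= self.batch_size:
            self._emit()

    def close(self) -> None:
        # Flush any partially filled batch when the task finishes.
        if self._rows:
            self._emit()

    def _emit(self) -> None:
        # Hand the filled batch off, then start fresh buffers so the
        # emitted batch is not mutated afterwards.
        self.flush(self._columns)
        self._columns = {name: [] for name in self.schema}
        self._rows = 0


flushed: List[Batch] = []
buf = RowToColumnarBuffer(["id", "name"], batch_size=2, flush=flushed.append)
for r in [(1, "a"), (2, "b"), (3, "c")]:
    buf.write(r)
buf.close()
# flushed == [{"id": [1, 2], "name": ["a", "b"]}, {"id": [3], "name": ["c"]}]
```

The per-row `write` call and the transpose into column buffers are the overhead that a columnar engine would pay on top of first converting its own batches back into rows.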
   
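   Since Gluten already operates on columnar data (Velox vectors, which can be 
exported through the Arrow C Data Interface), we could bypass the row-based 
lance-spark sink path entirely and avoid the columnar-to-row-to-columnar round 
trip. A more promising direction would be: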
   
   ```
   Velox columnar
   -> Arrow C Data Interface / ArrowArrayStream
   -> lance-core (via Dataset.write().stream(...).execute())
   ```
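The shape of that hand-off can be modeled in a few lines of Python. The sketch below is a pure-Python stand-in for the pull-based ArrowArrayStream protocol, not the C ABI itself. The real struct exposes `get_schema`, `get_next`, `get_last_error`, and `release` callbacks and signals end of stream by producing an ArrowArray whose `release` is NULL; here `get_next` simply returns `None`. `BatchStream` and `write_stream` are hypothetical names, and `write_stream` only stands in for whatever lance-core does after `Dataset.write().stream(...).execute()`. It does not reproduce Lance's API.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical columnar batch: column name -> list of values.
Batch = Dict[str, List[object]]


class BatchStream:
    """Pure-Python model of a pull-based batch stream (ArrowArrayStream-like).
    get_next() returns None at end of stream instead of a released array."""

    def __init__(self, schema: List[str], batches: Iterable[Batch]) -> None:
        self._schema = list(schema)
        self._it: Optional[Iterator[Batch]] = iter(batches)

    def get_schema(self) -> List[str]:
        return list(self._schema)

    def get_next(self) -> Optional[Batch]:
        if self._it is None:
            raise RuntimeError("stream already released")
        return next(self._it, None)

    def release(self) -> None:
        # The consumer releases the stream when done, as in the C protocol.
        self._it = None


def write_stream(stream: BatchStream) -> int:
    """Hypothetical writer: pulls whole columnar batches until end of stream,
    never materializing individual rows. Returns the number of rows seen."""
    schema = stream.get_schema()
    total = 0
    try:
        while (batch := stream.get_next()) is not None:
            lengths = {len(batch[name]) for name in schema}
            if len(lengths) != 1:
                raise ValueError("columns in a batch must have equal length")
            total += lengths.pop()
    finally:
        stream.release()
    return total
```

Compared with the row-buffer path, the consumer here only ever sees whole batches, so no per-row conversion happens between the producer and the writer.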


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

