zhztheplayer commented on issue #12263: URL: https://github.com/apache/gluten/issues/12263#issuecomment-4661646045
The Lance reader seems relatively easy to integrate, since it reads data into an Arrow-compatible format ([ref](https://github.com/lance-format/lance-spark/blob/main/lance-spark-base_2.12/src/main/java/org/lance/spark/internal/LanceFragmentColumnarBatchScanner.java)). The Lance writer ([ref](https://github.com/lance-format/lance-spark/blob/main/lance-spark-base_2.12/src/main/java/org/lance/spark/write/LanceDataWriter.java)) is a bit more complicated: the writer API is row-based, which is a constraint imposed by the Spark standard. This is very similar to Iceberg. For the time being, we can optimize read-side efficiency in Gluten by inserting an Arrow-to-Velox columnar transition between the Lance scan and Gluten to eliminate the row-to-columnar (r2c) conversion. @sezruby let me know if you have any comments on this. cc @malinjawi if you'd like to offer some help as well.
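
To make the read-side idea concrete, here is a toy sketch of how a direct Arrow-to-Velox transition shortens the operator chain. It is not Gluten or lance-spark code: the format labels, the transition names, and the `connect` helper are all hypothetical, and the fallback path through rows is an assumption about what happens when no direct columnar transition is registered.

```python
from dataclasses import dataclass

# Hypothetical batch format labels, used only in this sketch.
ROW = "row"
ARROW = "arrow"
VELOX = "velox"

# Hypothetical transition operators keyed by (producer format, consumer format).
TRANSITIONS = {
    (ARROW, ROW): "ArrowColumnarToRow",
    (ROW, VELOX): "RowToVeloxColumnar",      # the r2c step we want to avoid
    (ARROW, VELOX): "ArrowToVeloxColumnar",  # direct columnar-to-columnar
}


@dataclass
class Scan:
    name: str
    output_format: str


def connect(scan: Scan, consumer_format: str, allow_direct: bool) -> list:
    """Return the operator chain from the scan up to a consumer of consumer_format."""
    chain = [scan.name]
    produced = scan.output_format
    if produced == consumer_format:
        return chain  # formats already match, no transition needed
    if allow_direct and (produced, consumer_format) in TRANSITIONS:
        # One columnar-to-columnar hop, no row materialization.
        chain.append(TRANSITIONS[(produced, consumer_format)])
    else:
        # Assumed fallback: go through rows, which introduces r2c.
        chain.append(TRANSITIONS[(produced, ROW)])
        chain.append(TRANSITIONS[(ROW, consumer_format)])
    return chain


lance_scan = Scan("LanceScan", ARROW)
print(connect(lance_scan, VELOX, allow_direct=False))
print(connect(lance_scan, VELOX, allow_direct=True))
```

With the direct transition enabled, the chain contains a single conversion step and no `RowToVeloxColumnar` node, which is the effect described above.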
