mbutrovich opened a new issue, #1403: URL: https://github.com/apache/datafusion-comet/issues/1403
### What is the problem the feature request solves?

With the experimental native scans built on DataFusion's ParquetExec and our update to DataFusion 45, we have the opportunity to start adding support for StringView. I have started scoping out this work and would like to start aggregating findings here.

### Describe the potential solution

Project-level:
- Bump arrow-java version. We're currently on 16.0.0. I believe the view types were added in 17.0.0. I tested bumping to 18.2.0 and so far it doesn't seem too painful.

Java-side:
- Add support for decoding `Utf8View` and `BinaryView` to `CometVector`. I prototyped this [here](https://github.com/mbutrovich/datafusion-comet/blob/7c4eede25ba6672befa9beeb6f8c3e95dba7cc75/common/src/main/java/org/apache/comet/vector/CometPlainVector.java#L161) and [here](https://github.com/mbutrovich/datafusion-comet/blob/7c4eede25ba6672befa9beeb6f8c3e95dba7cc75/common/src/main/java/org/apache/comet/vector/CometPlainVector.java#L201) for Utf8View and BinaryView, respectively.

Native-side:
- Enable StringViewArray by default in query execution and Parquet reader. [We're a recent enough DataFusion version that this is done already](https://github.com/apache/datafusion/pull/13101).
- planner.rs and serde.rs should generate Utf8View and BinaryView types when possible.
- Shuffle:
  - Add support to hash_util.
  - Add support to shuffle_writer (slot_size, etc.)

I'm sure there's more than this, and will continue adding as I find stuff broken in my proof-of-concept branch.

### Additional context

Related DataFusion blogs:
https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/
https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-2/
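
For reference while working through the decoding, hashing, and slot-size items above, here is a minimal, std-only sketch of the 16-byte view layout that the Arrow columnar format defines for `Utf8View` and `BinaryView`: a 4-byte little-endian length, followed either by up to 12 inline bytes or by a 4-byte prefix, a 4-byte data-buffer index, and a 4-byte offset. The helper names (`encode_view`, `decode_view`, `read_u32`) are hypothetical and are not taken from Comet, arrow-rs, or arrow-java; this is an illustration of the layout only, not a proposed implementation.

```rust
/// Size in bytes of one view slot in a Utf8View/BinaryView array.
const VIEW_SIZE: usize = 16;
/// Values up to this many bytes are stored inline in the view itself.
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical helper: read a little-endian u32 from the first 4 bytes.
fn read_u32(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Hypothetical helper: build the view for `value`. Long values are appended
/// to `buffer`, which is assumed to be data buffer number `buffer_index`.
fn encode_view(value: &[u8], buffer: &mut Vec<u8>, buffer_index: u32) -> [u8; VIEW_SIZE] {
    let mut view = [0u8; VIEW_SIZE];
    // Bytes 0..4: value length.
    view[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= MAX_INLINE_LEN {
        // Bytes 4..16: the value itself, zero-padded.
        view[4..4 + value.len()].copy_from_slice(value);
    } else {
        let offset = buffer.len() as u32;
        buffer.extend_from_slice(value);
        // Bytes 4..8: first four bytes of the value (the prefix).
        view[4..8].copy_from_slice(&value[..4]);
        // Bytes 8..12: which data buffer holds the value.
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        // Bytes 12..16: offset of the value inside that buffer.
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Hypothetical helper: resolve a view back to the bytes it refers to.
fn decode_view<'a>(view: &'a [u8; VIEW_SIZE], buffers: &'a [Vec<u8>]) -> &'a [u8] {
    let len = read_u32(&view[0..4]) as usize;
    if len <= MAX_INLINE_LEN {
        &view[4..4 + len]
    } else {
        let buffer_index = read_u32(&view[8..12]) as usize;
        let offset = read_u32(&view[12..16]) as usize;
        &buffers[buffer_index][offset..offset + len]
    }
}
```

One consequence for the shuffle items: each row occupies a fixed 16-byte view, but any value longer than 12 bytes also consumes space in a separate data buffer, and the view bytes alone do not contain the full value, so both size estimation and hashing would need to resolve the view rather than read the slot directly.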
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org