mbutrovich opened a new issue, #1403:
URL: https://github.com/apache/datafusion-comet/issues/1403

   ### What is the problem the feature request solves?
   
   With the experimental native scans built on DataFusion's ParquetExec and our update to DataFusion 45, we have the opportunity to start adding support for StringView. I have started scoping out this work and would like to collect findings here.
   
   ### Describe the potential solution
   
   Project-level:
   - Bump the arrow-java version. We're currently on 16.0.0. I believe the view types were added in 17.0.0. I tested bumping to 18.2.0, and so far it doesn't seem too painful.
   
   Java-side: 
   - Add support for decoding `Utf8View` and `BinaryView` to `CometVector`. I prototyped this [here](https://github.com/mbutrovich/datafusion-comet/blob/7c4eede25ba6672befa9beeb6f8c3e95dba7cc75/common/src/main/java/org/apache/comet/vector/CometPlainVector.java#L161) and [here](https://github.com/mbutrovich/datafusion-comet/blob/7c4eede25ba6672befa9beeb6f8c3e95dba7cc75/common/src/main/java/org/apache/comet/vector/CometPlainVector.java#L201) for `Utf8View` and `BinaryView`, respectively.
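
   For readers unfamiliar with the view layout, here is a minimal sketch of what decoding a single view involves. It is not the prototype linked above and does not use Comet or arrow-java classes; `ViewDecodeSketch`, `decode`, `putView`, and `demo` are hypothetical names used only for illustration. It assumes the layout described in the Arrow columnar format: each view is 16 little-endian bytes, starting with a 4-byte length. Values of at most 12 bytes are stored inline after the length; longer values store a 4-byte prefix, a 4-byte data buffer index, and a 4-byte offset into that buffer.

   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;
   import java.nio.charset.StandardCharsets;

   public class ViewDecodeSketch {
       static final int VIEW_SIZE = 16;    // bytes per view
       static final int INLINE_LIMIT = 12; // longest value stored inline

       // Returns the bytes of the value at `index`, reading either the inline
       // bytes or the referenced range of a data buffer.
       static byte[] decode(ByteBuffer views, ByteBuffer[] dataBuffers, int index) {
           ByteBuffer v = views.duplicate().order(ByteOrder.LITTLE_ENDIAN);
           int base = index * VIEW_SIZE;
           int length = v.getInt(base);
           byte[] out = new byte[length];
           if (length <= INLINE_LIMIT) {
               // Short value: bytes follow the length directly.
               for (int i = 0; i < length; i++) {
                   out[i] = v.get(base + 4 + i);
               }
           } else {
               // Long value: skip the 4-byte prefix, then read index and offset.
               int bufferIndex = v.getInt(base + 8);
               int offset = v.getInt(base + 12);
               ByteBuffer data = dataBuffers[bufferIndex];
               for (int i = 0; i < length; i++) {
                   out[i] = data.get(offset + i);
               }
           }
           return out;
       }

       static String decodeUtf8(ByteBuffer views, ByteBuffer[] dataBuffers, int index) {
           return new String(decode(views, dataBuffers, index), StandardCharsets.UTF_8);
       }

       // Test helper (hypothetical): writes one view into the views buffer.
       static void putView(ByteBuffer views, int index, byte[] value, int bufferIndex, int offset) {
           ByteBuffer v = views.duplicate().order(ByteOrder.LITTLE_ENDIAN);
           int base = index * VIEW_SIZE;
           v.putInt(base, value.length);
           if (value.length <= INLINE_LIMIT) {
               for (int i = 0; i < value.length; i++) {
                   v.put(base + 4 + i, value[i]);
               }
           } else {
               for (int i = 0; i < 4; i++) {
                   v.put(base + 4 + i, value[i]); // prefix
               }
               v.putInt(base + 8, bufferIndex);
               v.putInt(base + 12, offset);
           }
       }

       // Builds one inline view and one out-of-line view, then decodes both.
       static String[] demo() {
           byte[] shortVal = "comet".getBytes(StandardCharsets.UTF_8);
           byte[] longVal = "a string longer than twelve bytes".getBytes(StandardCharsets.UTF_8);
           ByteBuffer views = ByteBuffer.allocate(2 * VIEW_SIZE);
           ByteBuffer[] dataBuffers = { ByteBuffer.wrap(longVal) };
           putView(views, 0, shortVal, 0, 0);
           putView(views, 1, longVal, 0, 0);
           return new String[] {
               decodeUtf8(views, dataBuffers, 0),
               decodeUtf8(views, dataBuffers, 1)
           };
       }

       public static void main(String[] args) {
           for (String s : demo()) {
               System.out.println(s);
           }
       }
   }
   ```

   The same decoding path serves both types: `BinaryView` would return the raw bytes, while `Utf8View` additionally interprets them as UTF-8. Null handling via the validity bitmap is left out of this sketch.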
   
   Native-side:
   - Enable StringViewArray by default in query execution and the Parquet reader. [We're on a recent enough DataFusion version that this is done already](https://github.com/apache/datafusion/pull/13101).
   - `planner.rs` and `serde.rs` should generate `Utf8View` and `BinaryView` types when possible.
   - Shuffle:
     - Add support to hash_util.
     - Add support to shuffle_writer (slot_size, etc.)
   
   I'm sure there's more than this, and I will keep adding to this list as I find things broken in my proof-of-concept branch.
   
   ### Additional context
   
   Related DataFusion blogs:
   - https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/
   - https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-2/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


