yjshen commented on PR #2593: URL: https://github.com/apache/arrow-rs/pull/2593#issuecomment-1241430450
Great to see this happening! I suggest we move the majority of the code in this PR to the DataFusion repo and keep only the API changes to the arrow sort compute kernel (the visibility changes) in arrow-rs. My suggestion comes from two considerations: we could ease development by iterating in a single repo, DataFusion, instead of depending on a separate arrow-rs release, and we could avoid the confusion of having two row modules in two repos.

After checking the usage of this comparable row format in https://github.com/apache/arrow-datafusion/pull/3386, I think it's still valid for us to have three variants of the row format to serve different purposes: one for storage efficiency, one for update efficiency, and one for sort efficiency. For example, if we used this comparable format for the aggregation buffer, we would need to repeatedly flip bytes back and forth for each cell update.
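To illustrate the byte-flipping cost mentioned above, here is a minimal sketch. It assumes a simplified comparable encoding for `i64` (big-endian bytes with the sign bit flipped), which may differ from what the PR actually does; all function names are hypothetical and exist only for this example.

```rust
/// Encode an i64 so that unsigned, byte-wise comparison matches numeric order.
/// Assumed scheme for illustration: big-endian bytes with the sign bit flipped.
fn encode_comparable(v: i64) -> [u8; 8] {
    let mut bytes = v.to_be_bytes();
    bytes[0] ^= 0x80; // flip the sign bit so negatives sort before positives
    bytes
}

/// Reverse of `encode_comparable`.
fn decode_comparable(mut bytes: [u8; 8]) -> i64 {
    bytes[0] ^= 0x80; // flip the sign bit back
    i64::from_be_bytes(bytes)
}

/// Hypothetical aggregation update on a comparable-encoded cell:
/// every update pays for a decode and a re-encode.
fn add_to_comparable_cell(cell: &mut [u8; 8], delta: i64) {
    let current = decode_comparable(*cell);
    *cell = encode_comparable(current.wrapping_add(delta));
}

/// The same update on a natively laid-out cell is a single add.
fn add_to_native_cell(cell: &mut i64, delta: i64) {
    *cell = cell.wrapping_add(delta);
}

fn main() {
    // Byte-wise ordering of encoded values follows numeric ordering.
    assert!(encode_comparable(-1) < encode_comparable(0));
    assert!(encode_comparable(0) < encode_comparable(1));

    // Updating a comparable-encoded cell round-trips through the encoding.
    let mut sortable = encode_comparable(-5);
    add_to_comparable_cell(&mut sortable, 7);
    assert_eq!(decode_comparable(sortable), 2);

    // Updating a native cell does not.
    let mut native = -5_i64;
    add_to_native_cell(&mut native, 7);
    assert_eq!(native, 2);
}
```

The encoded form is what makes a plain byte comparison sufficient for sorting, but the decode and re-encode on every update is the overhead that a separate update-oriented row format would avoid.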
