sunchao commented on PR #1:
URL: 
https://github.com/apache/arrow-datafusion-comet/pull/1#issuecomment-1911097541

   Thanks @alamb, really appreciated!
   
   > I wonder if you have a public roadmap about where you hope to take this 
project?
   
   We don't have one yet. Internally we do have a roadmap under `doc`, but it was 
removed in this PR. We can add it back after the initial PR.
   
   > As I understand it the next step is to perform the IP clearance process ...
   
   That's great! I'll check how it was done for other projects, and let you 
know if I need any help with it.
   
   > There appears to be another implementation of parquet in java as well as 
in rust.
   
   Yes, the Comet Parquet reader is a hybrid implementation: the IO part is 
done in Java, while decoding (to Arrow) and decompression are done in native code. 
This is based on the assumption that we won't gain much performance by 
moving the IO part to native code. By keeping it in Java, we can leverage 
various storage connectors such as S3 and HDFS, which are already pretty mature, 
as well as Parquet features that are missing on the native side, like 
[encryption support](https://github.com/apache/arrow-rs/issues/3511).
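   
   To make the split concrete, here is a minimal sketch of the idea, not the actual Comet code. Every name in it (`HybridReaderSketch`, `PageDecoder`, `readRange`, `readColumnChunk`) is hypothetical, the "file" is a flat run of little-endian int32 values rather than real Parquet, and the decoder is a pure-Java stand-in for what would be a JNI call into native code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class HybridReaderSketch {
    /** Hypothetical boundary: a real hybrid reader would cross into native code here. */
    interface PageDecoder {
        int[] decode(ByteBuffer rawBytes);
    }

    /** Stand-in decoder (assumption): treats the bytes as plain little-endian int32 values. */
    static final PageDecoder PLAIN_INT32 = raw -> {
        ByteBuffer le = raw.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int[] out = new int[le.remaining() / Integer.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = le.getInt();
        }
        return out;
    };

    /** IO side: fetch a byte range with JVM APIs (a local file here; S3/HDFS connectors in practice). */
    static ByteBuffer readRange(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A direct buffer can be handed to native code without an extra copy.
            ByteBuffer buf = ByteBuffer.allocateDirect(length);
            while (buf.hasRemaining()) {
                if (ch.read(buf, offset + buf.position()) < 0) {
                    throw new IOException("unexpected end of file");
                }
            }
            buf.flip();
            return buf;
        }
    }

    /** Glue: Java performs the IO, then hands the raw bytes to the decoder. */
    static int[] readColumnChunk(Path file, long offset, int length, PageDecoder decoder)
            throws IOException {
        return decoder.decode(readRange(file, offset, length));
    }

    /** Demo helper: writes the given ints as little-endian bytes to a temp file. */
    static Path writeDemoFile(int... values) throws IOException {
        Path file = Files.createTempFile("hybrid-sketch", ".bin");
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buf.putInt(v);
        }
        Files.write(file, buf.array());
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeDemoFile(1, 2, 3, 4);
        // Read only the middle two values: skip 4 bytes, read 8.
        int[] decoded = readColumnChunk(file, 4, 8, PLAIN_INT32);
        System.out.println(Arrays.toString(decoded));
        Files.delete(file);
    }
}
```

   The point of the interface boundary is that the IO side can stay on whatever JVM connector is already in use, while only raw bytes cross over to the decoder.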
   
   With that said, at some point we do want to switch to a fully native Parquet 
reader like the one in DF. This can potentially help to simplify a lot of the 
logic we currently have.
   
   > There is a set of kernels (e.g. core/src/execution/kernels/strings.rs that 
seems somewhat similar to what is in arrow-rs and datafusion)
   
   Yes, I think we should be able to switch to the ones in DF now. These were 
added a long time ago, when some of the string kernels in DF still didn't support 
dictionary arrays, which is no longer the case.
   
   > The 
[docs](https://github.com/apache/arrow-datafusion-comet/blob/comet-upstream/README.md)
 imply there is codegen for filters, but I didn't find any reference to that in 
the code
   
   This is something we want to do in Comet, but we haven't started yet :)
   
   


