Hi Alkis,

This is great. I can try to find some time to get it working in C++ if nobody else volunteers. One formality we should probably take care of before we iterate on it is changing the license at the top of the gist to the Apache 2.0 license (if I am reading it correctly, it currently appears to be marked as proprietary).

Thanks,
Micah

On Thu, Jun 6, 2024 at 1:22 PM Alkis Evlogimenos <alkis.evlogime...@databricks.com.invalid> wrote:

> Hey folks.
>
> I have been asked to share the latest flatbuffer prototype.
>
> I will put the latest in this gist
> <https://gist.github.com/alkis/b2c78af23cb224671d7a8a77ac5f60b7> left with
> TODOs if folks want to collaborate.
>
> I am iterating in our internal C++ codebase, it would be nice if someone
> more knowledgeable with parquet-cpp can integrate this there so that we can
> do benchmarking/experimentation. Once setup I would be happy to contribute
> the scaffolding that converts from thrift to flatbuffers and take it from
> there.
>
> Other than the TODOs in the file, the following items are still missing:
> - optimize Statistics: this is by far the biggest payload
> - encryption is completely untouched/unthought
> - column indexes
> - bloom filters
>
> Some of the above might have to stay as is.
>
> The biggest blocker for me right now is collecting "interesting" footers
> from real tables (I very much dislike generated ones) and building a good
> repository with them to drive more design decisions.
>