Hey folks.

I have been asked to share the latest flatbuffer prototype.

I will keep the latest version in this gist
<https://gist.github.com/alkis/b2c78af23cb224671d7a8a77ac5f60b7>, with TODOs
left in for folks who want to collaborate.

I am iterating in our internal C++ codebase. It would be nice if someone
more familiar with parquet-cpp could integrate this there so that we can
do benchmarking/experimentation. Once that is set up, I would be happy to
contribute the scaffolding that converts from thrift to flatbuffers and take
it from there.

Other than the TODOs in the file, the following items are still missing:
- optimize Statistics: this is by far the biggest payload
- encryption is completely untouched and not yet thought through
- column indexes
- bloom filters

Some of the above might have to stay as is.

The biggest blocker for me right now is collecting "interesting" footers
from real tables (I very much dislike generated ones) and building a good
repository with them to drive more design decisions.
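
For anyone who wants to contribute footers, here is a rough sketch of how the
raw footer bytes could be pulled out of an existing file. It relies only on
the standard Parquet layout (the file ends with a 4-byte little-endian footer
length followed by the magic "PAR1"). The function name read_footer is made
up for illustration and is not part of the prototype or of any library. It
rejects files with an encrypted footer (magic "PARE").

```python
import io
import os
import struct

PLAINTEXT_MAGIC = b"PAR1"  # files with an encrypted footer end in b"PARE"


def read_footer(f) -> bytes:
    """Return the raw Thrift-serialized FileMetaData from a binary file object.

    Hypothetical helper: reads only the tail of the file, never the data pages.
    """
    f.seek(0, os.SEEK_END)
    size = f.tell()
    # Smallest possible file: leading magic (4) + footer length (4) + trailing magic (4).
    if size < 12:
        raise ValueError("file too small to be Parquet")

    # Last 8 bytes: uint32 little-endian footer length, then the magic.
    f.seek(size - 8)
    footer_len, magic = struct.unpack("<I4s", f.read(8))
    if magic != PLAINTEXT_MAGIC:
        raise ValueError(f"unsupported trailing magic: {magic!r}")
    if footer_len > size - 12:
        raise ValueError("footer length exceeds file size")

    # The footer sits immediately before the 8-byte tail.
    f.seek(size - 8 - footer_len)
    return f.read(footer_len)
```

Usage would be something like `read_footer(open(path, "rb"))`, writing the
result to its own file. Keep in mind that the footer carries column names and
min/max Statistics, so footers from real tables may need scrubbing before
they can be shared.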
