Hi Alkis,
This is great. I can try to find some time to make it work in C++ if
nobody else volunteers. One formality that should probably be handled
before we iterate on it is changing the license at the top of the gist to
the Apache 2.0 license (if I am reading it correctly, it is currently
marked as proprietary).


Thanks,
Micah


On Thu, Jun 6, 2024 at 1:22 PM Alkis Evlogimenos
<alkis.evlogime...@databricks.com.invalid> wrote:

> Hey folks.
>
> I have been asked to share the latest flatbuffer prototype.
>
> I will put the latest in this gist
> <https://gist.github.com/alkis/b2c78af23cb224671d7a8a77ac5f60b7> left with
> TODOs if folks want to collaborate.
>
> I am iterating in our internal C++ codebase. It would be nice if someone
> more knowledgeable with parquet-cpp could integrate this there so that we
> can do benchmarking/experimentation. Once that is set up, I would be happy
> to contribute the scaffolding that converts from thrift to flatbuffers and
> take it from there.
>
> Other than the TODOs in the file, the following items are still missing:
> - optimize Statistics: this is by far the biggest payload
> - encryption is completely untouched/unthought
> - column indexes
> - bloom filters
>
> Some of the above might have to stay as is.
>
> The biggest blocker for me right now is collecting "interesting" footers
> from real tables (I very much dislike generated ones) and building a good
> repository with them to drive more design decisions.
>
