Thanks, everyone, for your perspectives.

As a concrete next step, I'll try to pull together a Google doc that
covers the topics raised here. I think that would be a more productive
way to continue the conversation (I don't want the threads to get split
too much).

On Tue, May 14, 2024 at 8:33 AM wish maple <maplewish...@gmail.com> wrote:

> I also think most of the proposed benefits of these new formats can be
> achieved using the current Parquet format and improved implementations.
>
> My concerns are:
> 1. For encoding, although many interesting encodings have been
>    introduced, most implementations currently implement and use only
>    PLAIN and Dictionary. We can make full use of the existing encodings
>    and introduce new encodings that allow skipping, compressing, and
>    reading data in specific scenarios.
> 2. We can start optimizing for semi-structured and ML data, with
>    specific optimizations for these cases, like [1]. Rep-Level and
>    Def-Level are feature-rich, but we can also optimize for cases where
>    reading them is not necessary. Besides, we can support types such
>    as geo within Parquet.
>
> [1] https://github.com/apache/arrow/issues/34510#issuecomment-2109768275
>
