I also think most of the proposed benefits from these new formats can be
achieved using the current Parquet format and improved implementations.

My concerns are:
1. For encoding, although many interesting encodings have been
   introduced, most implementations today only implement and use PLAIN
   and Dictionary. We can make fuller use of the existing encodings and
   introduce new ones that allow skipping, compressing, and reading
   data in specific scenarios.
2. We can start optimizing for semi-structured and ML data, with
   specific optimizations for these cases, like [1]. Rep-levels and
   def-levels are feature-rich, but we can also optimize for the cases
   where it is not necessary to read them. Besides, we can support some
   types, like geo, within Parquet.
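
To make the rep/def-level point concrete, here is a rough sketch (not
code from any existing Parquet implementation) of how a reader could
decide that the levels do not need to be decoded. It relies on the
usual Dremel-style rule that the max definition level is the number of
optional or repeated fields on the column path, and the max repetition
level is the number of repeated fields. The names `Field`, `max_levels`
and `can_skip_levels` are hypothetical, and the `null_count` shortcut
assumes the writer recorded an accurate null count for the page or
column chunk.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Field:
    """One step on a column path (hypothetical helper type)."""
    name: str
    repetition: str  # "required", "optional", or "repeated"


def max_levels(path: List[Field]) -> Tuple[int, int]:
    """Return (max_definition_level, max_repetition_level) for a leaf column."""
    max_def = sum(1 for f in path if f.repetition in ("optional", "repeated"))
    max_rep = sum(1 for f in path if f.repetition == "repeated")
    return max_def, max_rep


def can_skip_levels(path: List[Field], null_count: Optional[int] = None) -> bool:
    """Decide whether level decoding can be bypassed for this column.

    Assumption: null_count comes from trusted statistics; None means unknown.
    """
    max_def, max_rep = max_levels(path)
    if max_rep > 0:
        # Repeated data: levels are needed to rebuild list boundaries.
        return False
    if max_def == 0:
        # Required, non-nested column: no levels are stored at all.
        return True
    # Optional but flat: if nothing is null, every def-level equals max_def.
    return null_count == 0


if __name__ == "__main__":
    required_id = [Field("id", "required")]
    optional_leaf = [Field("a", "optional"), Field("b", "optional")]
    tag_list = [
        Field("tags", "optional"),
        Field("list", "repeated"),
        Field("element", "optional"),
    ]
    for path in (required_id, optional_leaf, tag_list):
        dotted = ".".join(f.name for f in path)
        print(dotted, max_levels(path), can_skip_levels(path, null_count=0))
```

In the skippable cases a reader could hand the value buffer straight to
the decoder instead of materializing and scanning the level arrays.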

[1] https://github.com/apache/arrow/issues/34510#issuecomment-2109768275
