Hi Gang,

Thanks a lot for getting back to me!
So the use case I have is relatively simple: I was experimenting with some data and wanted to benchmark different compression algorithms in an effort to speed up data retrieval in a simple Parquet-based database I am playing around with. Whilst doing so, I noticed a very large variance in the performance of the same compression algorithm across different row groups in my Parquet files. I therefore think the best compression configuration for my data would be to use a different algorithm for every column, for every row group in my files. In a real-world situation, I can see this being used by a database, either when new entries are inserted into it, or as a background 'optimizer' job that runs over existing data.

(To make the gap concrete, I've pasted a rough sketch of how compression is configured through the current parquet-cpp writer API at the bottom of this message, below the quoted thread.)

How do you feel about this?

Thank you,
Andrei

On Thu, 21 Mar 2024 at 02:11, Gang Wu <ust...@gmail.com> wrote:

> Hi Andrei,
>
> What is your use case? IMHO, exposing this kind of configuration
> will force users to know how the writer will split row groups, which
> does not look simple to me.
>
> Best,
> Gang
>
> On Thu, Mar 21, 2024 at 2:25 AM Andrei Lazăr <lazarandrei...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > I would like to propose adding support for writing a Parquet file with
> > different compression algorithms for every row group.
> >
> > In my understanding, the Parquet format allows this, but it seems to me
> > that there is no way to achieve this from the C++ implementation.
> >
> > Does anyone have any thoughts on this?
> >
> > Thank you,
> > Andrei
> >
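
P.S. Here is the rough, untested sketch I mentioned above, written against the Arrow parquet-cpp writer API as I understand it. It shows that WriterProperties already lets you pick a codec per column, but the same properties object governs every row group the writer produces, which is exactly the knob I would like to be able to vary per row group. The column name, codecs, and file path are placeholders I made up for illustration.

// Untested sketch: per-column compression is possible today, but it is
// fixed once for the whole file, not per row group.
#include <arrow/io/file.h>
#include <parquet/api/writer.h>
#include <parquet/exception.h>
#include <parquet/schema.h>

int main() {
  using parquet::schema::GroupNode;
  using parquet::schema::PrimitiveNode;

  // A single REQUIRED INT64 column, purely for illustration.
  parquet::schema::NodeVector fields;
  fields.push_back(PrimitiveNode::Make("sensor_reading",
                                       parquet::Repetition::REQUIRED,
                                       parquet::Type::INT64));
  auto schema = std::static_pointer_cast<GroupNode>(
      GroupNode::Make("schema", parquet::Repetition::REQUIRED, fields));

  // Compression is decided here, once, before any row group exists.
  parquet::WriterProperties::Builder builder;
  builder.compression(parquet::Compression::ZSTD);                      // file-wide default
  builder.compression("sensor_reading", parquet::Compression::SNAPPY);  // per-column override
  std::shared_ptr<parquet::WriterProperties> props = builder.build();

  std::shared_ptr<arrow::io::FileOutputStream> out;
  PARQUET_ASSIGN_OR_THROW(out, arrow::io::FileOutputStream::Open("example.parquet"));
  auto writer = parquet::ParquetFileWriter::Open(out, schema, props);

  // Every row group appended below reuses the codecs baked into `props`;
  // there is no hook to choose a different codec for one particular row group.
  parquet::RowGroupWriter* rg = writer->AppendRowGroup();
  auto* col = static_cast<parquet::Int64Writer*>(rg->NextColumn());
  int64_t value = 42;
  col->WriteBatch(1, nullptr, nullptr, &value);

  writer->Close();
  return 0;
}

One way a per-row-group variant could look, for example, is accepting an updated codec map (or a fresh WriterProperties) at AppendRowGroup() time, though I am not attached to any particular API shape.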