Sounds like a lot of great work!

>> Performance benchmarks for Write
I know write performance is critical for operations like storing a table in 
PARQUET format in some frameworks (like Impala). In addition to this, are 
there any cases where speeding up Parquet writes would help query execution?

Sorry if the question is stupid. Thanks in advance.

Regards,
Kai

-----Original Message-----
From: Uwe Korn [mailto:[email protected]] 
Sent: Thursday, April 21, 2016 8:32 PM
To: [email protected]
Subject: Re: Parquet sync up

Hello,

because I am in Europe, this is a very inconvenient time, so I would rather 
write a longer mail than join the call. As a bit of input, here is what I'm up 
to at the moment:

  * Write support in a basic form for parquet-cpp (no compression, fixed 
encodings, excessive memory usage, ..) is nearly done. I hope to open the final 
PR for discussion next week.
  * Remaining tasks before I open the PR:
    * a bit of code cleanup
    * Going through the API again to make it consistent
    * Metadata for RowGroups and ColumnChunks

Afterwards I would look into one of the following tasks w.r.t. parquet-cpp:
  * WriterProperties to specify compression, encoding, .. on a global and 
per-column basis.
  * Performance benchmarks for Write
  * Integration of Parquet support in Apache Arrow to use it with Python
  * Reduce the memory usage of the initial Writer implementation (we will 
probably need to extend the encoders a bit for this)

If anyone else also looks into this, I'm happy to collaborate ;)

Cheers
Uwe

On 21.04.16 00:51, Julien Le Dem wrote:
> It is happening at 4pm PT on google hangout 
> https://plus.google.com/hangouts/_/event/parquet_sync_up