Sounds like a lot of great work!

>> Performance benchmarks for Write

I know write performance is critical for operations like storing a table in PARQUET format in some frameworks (like Impala). Beyond that, are there other cases where speeding up the Parquet write path helps during query execution?
Sorry if the question is a stupid one. Thanks in advance.

Regards,
Kai

-----Original Message-----
From: Uwe Korn [mailto:[email protected]]
Sent: Thursday, April 21, 2016 8:32 PM
To: [email protected]
Subject: Re: Parquet sync up

Hello,

since I am in Europe, this is a very inconvenient time, so I will write a longer mail instead of joining. As a bit of input, here is what I'm up to at the moment:

* Write support in a basic form for parquet-cpp (no compression, fixed encodings, excessive memory usage, ...) is nearly done. I hope to open the final PR for discussion next week.
* Remaining tasks before I open the PR:
  * a bit of code cleanup
  * going through the API again to make it consistent
  * metadata for RowGroups and ColumnChunks

Afterwards I would look into one of the following tasks w.r.t. parquet-cpp:

* WriterProperties to specify compression, encoding, etc. on a global and per-column basis.
* Performance benchmarks for Write
* Integration of Parquet support in Apache Arrow to use it with Python
* Reduce the memory usage of the initial Writer implementation (for that we will probably need to extend the encoders a bit)

If anyone else is also looking into this, I'm happy to collaborate ;)

Cheers
Uwe

On 21.04.16 00:51, Julien Le Dem wrote:
> It is happening at 4pm PT on Google Hangout
> https://plus.google.com/hangouts/_/event/parquet_sync_up
