Hi all, I'd like to integrate Parquet with pandas, a popular Python library for in-memory data analysis. My plan is to build an efficient connector based on the parquet-cpp project -- is that the recommended way to do this? Somebody told me that Impala's Parquet reader is much more performant, but also tightly integrated into Impala and hard to extract (I haven't checked the licensing and whether that would be allowed at all). Is this still correct?
According to the README, parquet-cpp currently supports only reading Parquet files, not writing them. Do you plan to add write support to this C++ API in the future? Predicate pushdown would also be very important for my use case -- does parquet-cpp support that? I haven't seen anything in the codebase yet.

My current plan is to start from this example [1] and write a thin Cython wrapper to expose some of the column reader functionality. Any thoughts/remarks/concerns are highly appreciated.

Thanks,
Peter

[1] https://github.com/apache/incubator-parquet-cpp/blob/master/example/compute_stats.cc

--
Peter Prettenhofer
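P.S. To make the predicate-pushdown question concrete: Parquet stores min/max statistics per row group in the file metadata, so a reader can skip entire row groups that cannot match a predicate. Here is a minimal, self-contained Python sketch of that idea -- all names are hypothetical placeholders for illustration, not the parquet-cpp API:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RowGroup:
    """Hypothetical stand-in for one Parquet row group: the column values
    plus the min/max statistics Parquet keeps in the file metadata."""
    values: List[int]

    @property
    def stat_min(self) -> int:
        return min(self.values)

    @property
    def stat_max(self) -> int:
        return max(self.values)

def scan_with_pushdown(row_groups: List[RowGroup],
                       lo: int, hi: int) -> Tuple[List[int], int]:
    """Return all values v with lo <= v <= hi, pruning any row group whose
    min/max statistics prove it cannot contain a match. Returns the
    matching values and the number of row groups skipped."""
    matches: List[int] = []
    skipped = 0
    for rg in row_groups:
        # The statistics check: if the row group's value range does not
        # overlap [lo, hi], skip it without decoding the column data.
        if rg.stat_max < lo or rg.stat_min > hi:
            skipped += 1
            continue
        matches.extend(v for v in rg.values if lo <= v <= hi)
    return matches, skipped
```

For example, scanning three row groups with value ranges [1, 3], [10, 12], and [20, 21] for the predicate 9 <= v <= 13 only ever decodes the middle one; the other two are pruned from the statistics alone. That is roughly the saving I'd hope to get from pushdown in the pandas connector.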
