One thing I want to add: use_new_reader uses the reader from the parquet-mr library, whereas the default is Drill's native reader, which is supposed to perform better. However, the native reader does not support complex types, so we automatically switch to the parquet-mr reader when we have to read complex types.
Thanks,
Padma

On May 2, 2017, at 11:09 AM, Jinfeng Ni <j...@apache.org> wrote:

- What the two readers are (is one a special Drill thing, is the other a standard reader from the Parquet project?)
- What is the eventual goal here... to be able to use and switch between both? To provide the option? To have code parity with another project?

Both readers were for reading Parquet data into Drill's value vectors. The default one (when store.parquet.use_new_reader is false) was faster (based on measurements done by people who worked on the two readers), but it could not support complex types like map/array. The new reader would be used by Drill either if you change the option to true, or when the Parquet data you are querying contains complex types (even with the option left at its default of false). Therefore, both readers might be used by Drill code.

There was a Parquet hackathon some time ago, which aimed to get people from different projects that use Parquet to work together on standardizing a vectorized reader. I did not keep track of that effort. People with better knowledge of it may share their input.

- Do either of the readers work with Arrow?

For now, neither works with Arrow, since Drill has not integrated with Arrow yet. See DRILL-4455 for the latest discussion (https://issues.apache.org/jira/browse/DRILL-4455). I would expect Drill's Parquet reader to work with Arrow once the integration is done.
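
For anyone following along, the selection rule described in this thread can be sketched roughly as below. This is only an illustrative Python model, not Drill's actual code: the names ParquetReader, COMPLEX_TYPES and choose_reader are made up for the example, and the set of complex type names is an assumption based on the "map/array" mention above.

```python
from enum import Enum
from typing import Iterable


class ParquetReader(Enum):
    # Labels are illustrative; they are not identifiers from Drill itself.
    DRILL_NATIVE = "drill-native"  # default reader, reported as faster, flat types only
    PARQUET_MR = "parquet-mr"      # reader selected by store.parquet.use_new_reader


# Assumption: "complex" means map or array, as mentioned in the thread.
COMPLEX_TYPES = {"map", "array"}


def choose_reader(use_new_reader: bool, column_types: Iterable[str]) -> ParquetReader:
    """Model of the rule: the parquet-mr reader is used when the option is
    true OR when the data contains a complex type; otherwise the native one."""
    has_complex = any(t.lower() in COMPLEX_TYPES for t in column_types)
    if use_new_reader or has_complex:
        return ParquetReader.PARQUET_MR
    return ParquetReader.DRILL_NATIVE


if __name__ == "__main__":
    print(choose_reader(False, ["int32", "varchar"]).value)
    print(choose_reader(False, ["int32", "map"]).value)
    print(choose_reader(True, ["int32"]).value)
```

The point of the sketch is that the option only forces the parquet-mr reader on; leaving it false does not guarantee the native reader, because complex types trigger the switch anyway.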