Is this a large or private parquet file? Could you share it so that I can debug its read path?

On Fri, Nov 6, 2015 at 3:37 PM, Jason Altekruse <[email protected]> wrote:

> The changes to parquet were not supposed to be functional at all. We had
> been maintaining our fork of parquet-mr to have a ByteBuffer based read and
> write path to reduce heap memory usage. The work done was just getting
> these changes merged back into parquet-mr and making corresponding changes
> in Drill to accommodate any interface modifications introduced since we
> last rebased (there were mostly just package renames). There were a lot of
> comments on the PR, and a decent amount of refactoring that was done to
> consolidate and otherwise clean up the code, but there shouldn't have been
> any changes to the behavior of the reader or writer.
>
> Are you getting all of the same data out if you read the whole file, just
> in a different order?
>
> On Fri, Nov 6, 2015 at 3:31 PM, rahul challapalli <
> [email protected]> wrote:
>
>> parquet-meta command suggests that there is only one row group
>>
>> On Fri, Nov 6, 2015 at 3:23 PM, Jacques Nadeau <[email protected]>
>> wrote:
>>
>> > How many row groups?
>> >
>> > --
>> > Jacques Nadeau
>> > CTO and Co-Founder, Dremio
>> >
>> > On Fri, Nov 6, 2015 at 3:14 PM, rahul challapalli <
>> > [email protected]> wrote:
>> >
>> > > Drillers,
>> > >
>> > > With the new parquet library update, can someone throw some light on the
>> > > order in which the records are read from a single parquet file?
>> > >
>> > > With the older library, when I run the below query on a single parquet
>> > > file, I used to get a set of records. Now after the parquet library update,
>> > > I am seeing a different set of records. Just wanted to understand what
>> > > specifically has changed.
>> > >
>> > > select * from `file.parquet` limit 5;
>> > >
>> > > - Rahul
