Jason, I missed that. Let me check whether we are dropping any records. I would be surprised if our regression tests missed that :)

- Rahul

On Fri, Nov 6, 2015 at 4:19 PM, Jason Altekruse <[email protected]> wrote:

> Rahul,
>
> Thanks for working on a reproduction of the issue. You didn't actually
> answer my first question, are you getting the same data out of the file,
> just in a different order? It seems much more likely that we are dropping
> some records at the beginning than reordering them somehow, although I
> would have expected an error like this to be caught by the unit or
> regression tests.
>
> Thanks,
> Jason
>
> On Fri, Nov 6, 2015 at 4:13 PM, rahul challapalli <[email protected]>
> wrote:
>
> > Thanks for your replies. The file is private and I will try to
> > construct a file without sensitive data which can expose this behavior.
> >
> > - Rahul
> >
> > On Fri, Nov 6, 2015 at 3:45 PM, Jason Altekruse <[email protected]>
> > wrote:
> >
> > > Is this a large or private parquet file? Can you share it to allow
> > > me to debug the read path for it?
> > >
> > > On Fri, Nov 6, 2015 at 3:37 PM, Jason Altekruse <[email protected]>
> > > wrote:
> > >
> > > > The changes to parquet were not supposed to be functional at all.
> > > > We had been maintaining our fork of parquet-mr to have a ByteBuffer
> > > > based read and write path to reduce heap memory usage. The work
> > > > done was just getting these changes merged back into parquet-mr
> > > > and making corresponding changes in Drill to accommodate any
> > > > interface modifications introduced since we last rebased (there
> > > > were mostly just package renames). There were a lot of comments on
> > > > the PR, and a decent amount of refactoring that was done to
> > > > consolidate and otherwise clean up the code, but there shouldn't
> > > > have been any changes to the behavior of the reader or writer.
> > > >
> > > > Are you getting all of the same data out if you read the whole
> > > > file, just in a different order?
> > > >
> > > > On Fri, Nov 6, 2015 at 3:31 PM, rahul challapalli <[email protected]>
> > > > wrote:
> > > >
> > > >> parquet-meta command suggests that there is only one row group
> > > >>
> > > >> On Fri, Nov 6, 2015 at 3:23 PM, Jacques Nadeau <[email protected]>
> > > >> wrote:
> > > >>
> > > >> > How many row groups?
> > > >> >
> > > >> > --
> > > >> > Jacques Nadeau
> > > >> > CTO and Co-Founder, Dremio
> > > >> >
> > > >> > On Fri, Nov 6, 2015 at 3:14 PM, rahul challapalli <[email protected]>
> > > >> > wrote:
> > > >> >
> > > >> > > Drillers,
> > > >> > >
> > > >> > > With the new parquet library update, can someone throw some
> > > >> > > light on the order in which the records are read from a
> > > >> > > single parquet file?
> > > >> > >
> > > >> > > With the older library, when I run the below query on a
> > > >> > > single parquet file, I used to get a set of records. Now
> > > >> > > after the parquet library update, I am seeing a different set
> > > >> > > of records. Just wanted to understand what specifically has
> > > >> > > changed.
> > > >> > >
> > > >> > > select * from `file.parquet` limit 5;
> > > >> > >
> > > >> > > - Rahul
