Thanks for the timely fix, Jason. I will try it out sometime over the weekend.
- Rahul

On Fri, Nov 6, 2015 at 7:02 PM, Jason Altekruse <[email protected]> wrote:

> As I said on the JIRA, I fixed the reading issue for the file you posted.
> I'm working on a unit test to catch these things sooner in the future.
>
> On Fri, Nov 6, 2015 at 6:16 PM, Jacques Nadeau <[email protected]> wrote:
>
> > My question was the other way around. If the reader is corrupting things,
> > I'd like to do a ctas from parquet => json and look if the json is
> > corrupted. Jason is taking a look now.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Nov 6, 2015 at 6:08 PM, rahul challapalli <[email protected]> wrote:
> >
> > > I did try your suggestion and sqlline displayed the columns from the json
> > > file just fine. Raised the below jira to track this issue
> > > https://issues.apache.org/jira/browse/DRILL-4048
> > >
> > > On Fri, Nov 6, 2015 at 5:52 PM, Jacques Nadeau <[email protected]> wrote:
> > >
> > > > I wouldn't jump to that conclusion. Sqlline uses toString. If we changed
> > > > the toString behavior, it could be a problem. Maybe do a ctas to a json
> > > > file to confirm.
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Fri, Nov 6, 2015 at 5:40 PM, rahul challapalli <[email protected]> wrote:
> > > >
> > > > > From a previous build, I got the data for these columns just fine from
> > > > > sqlline. So I think we can eliminate any display issues unless I am
> > > > > missing something?
> > > > >
> > > > > - Rahul
> > > > >
> > > > > On Fri, Nov 6, 2015 at 5:34 PM, Jacques Nadeau <[email protected]> wrote:
> > > > >
> > > > > > Can you confirm if this is a display bug in sqlline or jdbc to string
> > > > > > versus an actual data return?
> > > > > >
> > > > > > --
> > > > > > Jacques Nadeau
> > > > > > CTO and Co-Founder, Dremio
> > > > > >
> > > > > > On Fri, Nov 6, 2015 at 5:31 PM, rahul challapalli <[email protected]> wrote:
> > > > > >
> > > > > > > Jason,
> > > > > > >
> > > > > > > You were partly correct. We are not dropping records however we are
> > > > > > > corrupting dictionary encoded binary columns. I got confused that we
> > > > > > > are returning different records, but we are trimming (or returning
> > > > > > > unreadable chars) some columns which are binary. I was able to
> > > > > > > reproduce with the lineitem data set. I will raise a jira and I think
> > > > > > > this should be treated critical. Thoughts?
> > > > > > >
> > > > > > > - Rahul
> > > > > > >
> > > > > > > On Fri, Nov 6, 2015 at 4:30 PM, rahul challapalli <[email protected]> wrote:
> > > > > > >
> > > > > > > > Jason,
> > > > > > > >
> > > > > > > > I missed that. Let me check whether we are dropping any records. I
> > > > > > > > would be surprised if our regression tests missed that :)
> > > > > > > >
> > > > > > > > - Rahul
> > > > > > > >
> > > > > > > > On Fri, Nov 6, 2015 at 4:19 PM, Jason Altekruse <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Rahul,
> > > > > > > > >
> > > > > > > > > Thanks for working on a reproduction of the issue. You didn't actually
> > > > > > > > > answer my first question, are you getting the same data out of the
> > > > > > > > > file, just in a different order? It seems much more likely that we
> > > > > > > > > are dropping some records at the beginning than reordering them
> > > > > > > > > somehow, although I would have expected an error like this to be
> > > > > > > > > caught by the unit or regression tests.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jason
> > > > > > > > >
> > > > > > > > > On Fri, Nov 6, 2015 at 4:13 PM, rahul challapalli <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for your replies. The file is private and I will try to
> > > > > > > > > > construct a file without sensitive data which can expose this
> > > > > > > > > > behavior.
> > > > > > > > > >
> > > > > > > > > > - Rahul
> > > > > > > > > >
> > > > > > > > > > On Fri, Nov 6, 2015 at 3:45 PM, Jason Altekruse <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Is this a large or private parquet file? Can you share it to allow
> > > > > > > > > > > me to debug the read path for it?
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Nov 6, 2015 at 3:37 PM, Jason Altekruse <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > The changes to parquet were not supposed to be functional at all.
> > > > > > > > > > > > We had been maintaining our fork of parquet-mr to have a
> > > > > > > > > > > > ByteBuffer based read and write path to reduce heap memory usage.
> > > > > > > > > > > > The work done was just getting these changes merged back into
> > > > > > > > > > > > parquet-mr and making corresponding changes in Drill to
> > > > > > > > > > > > accommodate any interface modifications introduced since we last
> > > > > > > > > > > > rebased (there were mostly just package renames). There were a
> > > > > > > > > > > > lot of comments on the PR, and a decent amount of refactoring
> > > > > > > > > > > > that was done to consolidate and otherwise clean up the code,
> > > > > > > > > > > > but there shouldn't have been any changes to the behavior of the
> > > > > > > > > > > > reader or writer.
> > > > > > > > > > > >
> > > > > > > > > > > > Are you getting all of the same data out if you read the whole
> > > > > > > > > > > > file, just in a different order?
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Nov 6, 2015 at 3:31 PM, rahul challapalli <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > parquet-meta command suggests that there is only one row group
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Nov 6, 2015 at 3:23 PM, Jacques Nadeau <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > How many row groups?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Jacques Nadeau
> > > > > > > > > > > > > > CTO and Co-Founder, Dremio
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Nov 6, 2015 at 3:14 PM, rahul challapalli <[email protected]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Drillers,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > With the new parquet library update, can someone throw some
> > > > > > > > > > > > > > > light on the order in which the records are read from a
> > > > > > > > > > > > > > > single parquet file?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > With the older library, when I run the below query on a
> > > > > > > > > > > > > > > single parquet file, I used to get a set of records. Now
> > > > > > > > > > > > > > > after the parquet library update, I am seeing a different
> > > > > > > > > > > > > > > set of records. Just wanted to understand what specifically
> > > > > > > > > > > > > > > has changed.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > select * from `file.parquet` limit 5;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Rahul
