Thanks for the timely fix, Jason. I will try it out sometime over the
weekend.

- Rahul

On Fri, Nov 6, 2015 at 7:02 PM, Jason Altekruse <[email protected]>
wrote:

> As I said on the JIRA, I fixed the reading issue for the file you posted.
> I'm working on a unit test to catch these things sooner in the future.
>
> On Fri, Nov 6, 2015 at 6:16 PM, Jacques Nadeau <[email protected]> wrote:
>
> > My question was the other way around. If the reader is corrupting things,
> > I'd like to do a ctas from parquet => json and look if the json is
> > corrupted. Jason is taking a look now.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Nov 6, 2015 at 6:08 PM, rahul challapalli <
> > [email protected]> wrote:
> >
> > > I did try your suggestion and sqlline displayed the columns from the
> > > json file just fine. I raised the JIRA below to track this issue:
> > > https://issues.apache.org/jira/browse/DRILL-4048
> > >
> > > On Fri, Nov 6, 2015 at 5:52 PM, Jacques Nadeau <[email protected]>
> > > wrote:
> > >
> > > > I wouldn't jump to that conclusion. Sqlline uses toString. If we
> > > > changed the toString behavior, it could be a problem. Maybe do a ctas
> > > > to a json file to confirm.
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Fri, Nov 6, 2015 at 5:40 PM, rahul challapalli <
> > > > [email protected]> wrote:
> > > >
> > > > > From a previous build, I got the data for these columns just fine
> > > > > from sqlline. So I think we can eliminate any display issues unless
> > > > > I am missing something?
> > > > >
> > > > > - Rahul
> > > > >
> > > > > On Fri, Nov 6, 2015 at 5:34 PM, Jacques Nadeau <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Can you confirm if this is a display bug in sqlline or jdbc to
> > > > > > string versus an actual data return?
> > > > > >
> > > > > > --
> > > > > > Jacques Nadeau
> > > > > > CTO and Co-Founder, Dremio
> > > > > >
> > > > > > On Fri, Nov 6, 2015 at 5:31 PM, rahul challapalli <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > Jason,
> > > > > > >
> > > > > > > You were partly correct. We are not dropping records; however, we
> > > > > > > are corrupting dictionary-encoded binary columns. I got confused
> > > > > > > and thought we were returning different records, but we are
> > > > > > > actually trimming (or returning unreadable chars for) some binary
> > > > > > > columns. I was able to reproduce this with the lineitem data set.
> > > > > > > I will raise a jira and I think this should be treated as
> > > > > > > critical. Thoughts?
> > > > > > >
> > > > > > > - Rahul
> > > > > > >
> > > > > > > On Fri, Nov 6, 2015 at 4:30 PM, rahul challapalli <
> > > > > > > [email protected]> wrote:
> > > > > > >
> > > > > > > > Jason,
> > > > > > > >
> > > > > > > > I missed that. Let me check whether we are dropping any
> > > > > > > > records. I would be surprised if our regression tests missed
> > > > > > > > that :)
> > > > > > > >
> > > > > > > > - Rahul
> > > > > > > >
> > > > > > > > On Fri, Nov 6, 2015 at 4:19 PM, Jason Altekruse <[email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Rahul,
> > > > > > > >>
> > > > > > > > >> Thanks for working on a reproduction of the issue. You didn't
> > > > > > > > >> actually answer my first question: are you getting the same data
> > > > > > > > >> out of the file, just in a different order? It seems much more
> > > > > > > > >> likely that we are dropping some records at the beginning than
> > > > > > > > >> reordering them somehow, although I would have expected an error
> > > > > > > > >> like this to be caught by the unit or regression tests.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Jason
> > > > > > > >>
> > > > > > > >> On Fri, Nov 6, 2015 at 4:13 PM, rahul challapalli <
> > > > > > > >> [email protected]> wrote:
> > > > > > > >>
> > > > > > > > >> > Thanks for your replies. The file is private and I will try to
> > > > > > > > >> > construct a file without sensitive data which can expose this
> > > > > > > > >> > behavior.
> > > > > > > >> >
> > > > > > > >> > - Rahul
> > > > > > > >> >
> > > > > > > > >> > On Fri, Nov 6, 2015 at 3:45 PM, Jason Altekruse <[email protected]>
> > > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > > >> > > Is this a large or private parquet file? Can you share it to
> > > > > > > > >> > > allow me to debug the read path for it?
> > > > > > > >> > >
> > > > > > > > >> > > On Fri, Nov 6, 2015 at 3:37 PM, Jason Altekruse <[email protected]>
> > > > > > > > >> > > wrote:
> > > > > > > >> > >
> > > > > > > > >> > > > The changes to parquet were not supposed to be functional at
> > > > > > > > >> > > > all. We had been maintaining our fork of parquet-mr to have a
> > > > > > > > >> > > > ByteBuffer based read and write path to reduce heap memory
> > > > > > > > >> > > > usage. The work done was just getting these changes merged
> > > > > > > > >> > > > back into parquet-mr and making corresponding changes in
> > > > > > > > >> > > > Drill to accommodate any interface modifications introduced
> > > > > > > > >> > > > since we last rebased (these were mostly just package
> > > > > > > > >> > > > renames). There were a lot of comments on the PR, and a
> > > > > > > > >> > > > decent amount of refactoring that was done to consolidate and
> > > > > > > > >> > > > otherwise clean up the code, but there shouldn't have been
> > > > > > > > >> > > > any changes to the behavior of the reader or writer.
> > > > > > > >> > > >
> > > > > > > > >> > > > Are you getting all of the same data out if you read the
> > > > > > > > >> > > > whole file, just in a different order?
> > > > > > > >> > > >
> > > > > > > >> > > > On Fri, Nov 6, 2015 at 3:31 PM, rahul challapalli <
> > > > > > > >> > > > [email protected]> wrote:
> > > > > > > >> > > >
> > > > > > > > >> > > >> parquet-meta command suggests that there is only one row
> > > > > > > > >> > > >> group
> > > > > > > >> > > >>
> > > > > > > > >> > > >> On Fri, Nov 6, 2015 at 3:23 PM, Jacques Nadeau <[email protected]>
> > > > > > > > >> > > >> wrote:
> > > > > > > >> > > >>
> > > > > > > >> > > >> > How many row groups?
> > > > > > > >> > > >> >
> > > > > > > >> > > >> > --
> > > > > > > >> > > >> > Jacques Nadeau
> > > > > > > >> > > >> > CTO and Co-Founder, Dremio
> > > > > > > >> > > >> >
> > > > > > > >> > > >> > On Fri, Nov 6, 2015 at 3:14 PM, rahul challapalli <
> > > > > > > >> > > >> > [email protected]> wrote:
> > > > > > > >> > > >> >
> > > > > > > >> > > >> > > Drillers,
> > > > > > > >> > > >> > >
> > > > > > > > >> > > >> > > With the new parquet library update, can someone throw
> > > > > > > > >> > > >> > > some light on the order in which the records are read
> > > > > > > > >> > > >> > > from a single parquet file?
> > > > > > > >> > > >> > >
> > > > > > > > >> > > >> > > With the older library, when I run the below query on a
> > > > > > > > >> > > >> > > single parquet file, I used to get a set of records.
> > > > > > > > >> > > >> > > Now after the parquet library update, I am seeing a
> > > > > > > > >> > > >> > > different set of records. Just wanted to understand
> > > > > > > > >> > > >> > > what specifically has changed.
> > > > > > > >> > > >> > >
> > > > > > > >> > > >> > > select * from `file.parquet` limit 5;
> > > > > > > >> > > >> > >
> > > > > > > >> > > >> > > - Rahul
> > > > > > > >> > > >> > >
> > > > > > > >> > > >> >
> > > > > > > >> > > >>
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
