Re: Can't read some parquet files after ByteBuffer Patch

Julien Le Dem Mon, 07 Dec 2015 14:26:22 -0800

In the meantime, if you have the stack trace for this error, that would help
too.


On Fri, Dec 4, 2015 at 1:59 PM, Jason Altekruse <[email protected]>
wrote:

> I assume that the buffer that we are giving to thrift doesn't have the
> header in it at the expected position. We hadn't seen this error in any of
> our regression tests in Drill with the final version of the patch, but I
> have debugged a few issues that produced this error in the past, including
> some that came up when we merged our changes into master.
>
> Can you try to generate data similar to the private dataset that produces
> the issue? If you are having trouble reproducing it, could you share the data
> types and encodings that are being used in the file so I can try to
> reproduce it?
>
> Thanks,
> Jason
>
> On Fri, Dec 4, 2015 at 1:32 PM, Daniel Weeks <[email protected]>
> wrote:
>
> > Jason or Julien,
> >
> > Just wanted to see if you or anyone else has run into problems reading
> > files after the ByteBuffer patch.  I've been running into issues and have
> > narrowed it down to the ByteBuffer commit using a small repro file (written
> > with 1.6.0, unfortunately can't share the data).
> >
> > It doesn't happen for every file, but those that fail give this error:
> >
> > can not read class org.apache.parquet.format.PageHeader: Required field
> > 'uncompressed_page_size' was not found in serialized data! Struct:
> > PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
> >
> > I assume that the real problem is somehow being trapped and suppressed by
> > thrift.
> >
> > Has anyone else seen this?
> >
> > Thanks,
> > Dan
> >
>



-- 
Julien
