Re: Can't read some parquet files after ByteBuffer Patch

Jason Altekruse Fri, 04 Dec 2015 13:59:54 -0800

I assume that the buffer that we are giving to thrift doesn't have the
header in it at the expected position. We hadn't seen this error in any of
our regression tests in Drill with the final version of the patch, but I
have debugged a few issues that produced this error in the past, including
some that came up when we merged our changes into master.


Can you try to generate data similar to the private dataset that produces
the issue? If you are having trouble reproducing it, could you share the
data types and encodings that are being used in the file so that I can try
to reproduce it?

Thanks,
Jason

On Fri, Dec 4, 2015 at 1:32 PM, Daniel Weeks <[email protected]>
wrote:

> Jason or Julien,
>
> Just wanted to see if you or anyone else has run into problems reading
> files after the ByteBuffer patch.  I've been running into issues and have
> narrowed it down to the ByteBuffer commit using a small repro file (written
> with 1.6.0, unfortunately can't share the data).
>
> It doesn't happen for every file, but those that fail give this error:
>
> can not read class org.apache.parquet.format.PageHeader: Required field
> 'uncompressed_page_size' was not found in serialized data! Struct:
> PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
>
> I assume that the real problem is somehow being trapped and suppressed by
> thrift.
>
> Has anyone else seen this?
>
> Thanks,
> Dan
>
