I assume that the buffer that we are giving to thrift doesn't have the header in it at the expected position. We hadn't seen this error in any of our regression tests in Drill with the final version of the patch, but I have debugged a few issues that produced this error in the past, including some that came up when we merged our changes into master.
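To illustrate the kind of problem I mean, here is a minimal sketch of how a header can end up at an unexpected position. This is not the Parquet or Drill code: the class name, the method names, and the byte values are all hypothetical placeholders. It only assumes that a reader is handed a ByteBuffer that is a view into a larger backing array.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not taken from Parquet or Drill.
public class BufferOffsetSketch {

    // Buggy pattern: reads from the start of the backing array and
    // ignores where this buffer's view actually begins.
    static byte firstByteIgnoringOffset(ByteBuffer buf) {
        return buf.array()[0];
    }

    // Safer pattern: accounts for both the array offset of the view
    // and the buffer's current position.
    static byte firstByteRespectingOffset(ByteBuffer buf) {
        return buf.array()[buf.arrayOffset() + buf.position()];
    }

    public static void main(String[] args) {
        // Placeholder bytes: three unrelated bytes, then a two-byte "header".
        byte[] file = {9, 9, 9, 0x15, 0x00};

        // A view that starts at the "header" (index 3 of the backing array).
        ByteBuffer page = ByteBuffer.wrap(file, 3, 2).slice();

        System.out.println("ignoring offset:   " + firstByteIgnoringOffset(page));
        System.out.println("respecting offset: " + firstByteRespectingOffset(page));
    }
}
```

If a deserializer is handed bytes read the first way, it starts decoding in the wrong place, which could plausibly surface as a missing required field rather than as an obvious offset error.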
Can you try to generate data similar to the private dataset that produces the issue? If you are having trouble reproducing it, could you share the data types and encodings that are being used in the file so I can try to reproduce it?

Thanks,
Jason

On Fri, Dec 4, 2015 at 1:32 PM, Daniel Weeks <[email protected]> wrote:

> Jason or Julien,
>
> Just wanted to see if you or anyone else has run into problems reading
> files after the ByteBuffer patch. I've been running into issues and have
> narrowed it down to the ByteBuffer commit using a small repro file (written
> with 1.6.0, unfortunately can't share the data).
>
> It doesn't happen for every file, but those that fail give this error:
>
> can not read class org.apache.parquet.format.PageHeader: Required field
> 'uncompressed_page_size' was not found in serialized data! Struct:
> PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
>
> I assume that the real problem is somehow being trapped and suppressed by
> thrift.
>
> Has anyone else seen this?
>
> Thanks,
> Dan
