I got the file; I should have time to look at it today.

On Mon, Dec 7, 2015 at 3:05 PM, Daniel Weeks <[email protected]> wrote:
> I sent Jason a file that can reproduce the issue with just 1K lines in it.
>
> If you want, I can open a JIRA and attach the file.
>
> 5a45ae3b1deb5117cb9e9a13141eeab1e9ad3d71 Can read the file without issue
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8 (bytebuffer) cannot read the file
>
> -Dan
>
> On Mon, Dec 7, 2015 at 2:19 PM, Julien Le Dem <[email protected]> wrote:
>
> > In the meantime if you have the stacktrace for this error that would help
> > too.
> >
> > On Fri, Dec 4, 2015 at 1:59 PM, Jason Altekruse <[email protected]>
> > wrote:
> >
> > > I assume that the buffer that we are giving to thrift doesn't have the
> > > header in it at the expected position. We hadn't seen this error in any
> > > of our regression tests in Drill with the final version of the patch,
> > > but I have debugged a few issues that produced this error in the past,
> > > including some that came up when we merged our changes into master.
> > >
> > > Can you try to generate data similar to the private dataset that
> > > produces the issue? If you are having trouble reproducing could you
> > > share the data types and encodings that are being used in the file and
> > > I can try to reproduce it.
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Fri, Dec 4, 2015 at 1:32 PM, Daniel Weeks <[email protected]>
> > > wrote:
> > >
> > > > Jason or Julien,
> > > >
> > > > Just wanted to see if you or anyone else has run into problems
> > > > reading files after the ByteBuffer patch. I've been running into
> > > > issues and have narrowed it down to the ByteBuffer commit using a
> > > > small repro file (written with 1.6.0, unfortunately can't share the
> > > > data).
> > > >
> > > > It doesn't happen for every file, but those that fail give this
> > > > error:
> > > >
> > > > can not read class org.apache.parquet.format.PageHeader: Required
> > > > field 'uncompressed_page_size' was not found in serialized data!
> > > > Struct: PageHeader(type:null, uncompressed_page_size:0,
> > > > compressed_page_size:0)
> > > >
> > > > I assume that the real problem is somehow being trapped and
> > > > suppressed by thrift.
> > > >
> > > > Has anyone else seen this?
> > > >
> > > > Thanks,
> > > > Dan
> >
> > --
> > Julien
