This is a guess, but it could be related to Drill reading Parquet data in
record batches of 4K records, which might point to a bug.
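
To make the batch-boundary idea concrete, here is a rough sketch of the
arithmetic in Python. It is not Drill code. It assumes a batch holds exactly
4096 records, which is only my guess and not something I have confirmed, and
batches_needed is a made-up helper name used for illustration.

```python
import math

# Assumption: the "4K records" batch size guessed at in this message.
BATCH_SIZE = 4096


def batches_needed(limit: int, batch_size: int = BATCH_SIZE) -> int:
    """Return how many full or partial record batches a LIMIT would span."""
    if limit <= 0:
        return 0
    return math.ceil(limit / batch_size)


if __name__ == "__main__":
    # 4096 fits in one batch; 4097 is the first limit that needs a second one.
    for limit in (4096, 4097, 6144, 8192):
        print(limit, batches_needed(limit))
```

If that assumption holds, LIMIT 4097 is the smallest limit that forces a second
batch to be read, which would fit the failure showing up exactly there.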

A few questions:
1. What are your JVM memory settings? For LIMIT 4096, the Drill profile (via
the Web UI) shows the memory usage. Does the math indicate that the memory
usage is high? If so, try increasing the memory and retrying.
2. Does the same crash occur when the limit is, say, 6K or 8K?
3. I'm presuming the data preserves record-by-record characteristics such as
the size of each record. If not, can your SDK application regenerate the
Parquet in shuffled order?

If there is a crash, there is evidence. Typically, there will be a stack trace
in the logs and possibly an HProf file, usually named drillbit_<pid>.hprof.

My gut feeling is that although you've specified a limit of about 4K, the
records (originally JSON) are probably large, so memory consumption is high
(which is why #1 needs to be answered first).



-----Original Message-----
From: Matthew Mucker [mailto:[email protected]] 
Sent: Wednesday, October 04, 2017 6:54 AM
To: [email protected]
Subject: Re: Newbie: Help debugging Drill

Charles,

 

I'm querying a Parquet file that I created by running a bunch of .json through 
Kite SDK. The data describes playback of video assets by mobile devices. The 
exact query that's causing the exception is:

 

select internalsessionid,
flatten(playbacksegments['array'])['playbackstarttimestamp'] b from MyTable 
limit 4097;

 

Where internalsessionid is a Guid represented as a string, and 
playbacksegments['array'] is an array of complex objects. When my limit is 
4096, the query returns successfully. When the limit is 4097 I get the crash. 

 

My data file contains company proprietary information, so I'm not able/willing 
to post it publicly (although I could probably share with the Drill devs). I 
attempted to create a dummy file with which to repro the problem last night, 
but the problem doesn't manifest with my dummy data.
(argh.) 

 

-M

 

 

Hi Matthew,

Can you describe the data you are querying and the query you are trying to 
execute?

- C

 

 

> On Oct 2, 2017, at 17:19, Matthew Mucker <[email protected]> wrote:

> 

> I became a new Drill user last week only to discover that Drill would crash
> with an IndexOutOfBounds exception on one of my queries. Some searching and
> testing later, my best guess is that I'm hitting DRILL-5451.

 
