paul-rogers commented on issue #1978: DRILL-7578: HDF5 Metadata Queries Fail with Large Files URL: https://github.com/apache/drill/pull/1978#issuecomment-585540050

@cgivre, thanks for the stack traces. One of the annoying aspects of Drill errors is that the first two chunks of the trace are always the same: they come from the client side.

The 16 MB error is basically saying that we are writing more than 16 MB for a single column in a single row. This is generally Not a Good Thing. I'm surprised you are able to hit that error, however. You said you are writing `INT` values. A batch can have at most 32K values, and 32K * 4 bytes = 128 KB. With `BIGINT` it is 256 KB, and the same is true if you write a `DOUBLE`.

It looks like your code may be writing an array, given the "doubleMatrixHelper" name. If so, you can reach 16 MB by writing a square matrix of doubles with more than sqrt(16 MB / 8) ~= 1400 elements per side. Are you?

Such large arrays cause several problems. First, clients can't consume them; they have to be flattened, which produces 16 MB / 8 bytes = 2M rows per matrix. Second, allocating buffers larger than 16 MB will fragment memory.

We can provide an option to disable the 16 MB limit. But this is using a table saw without the guard: it will work sometimes and cause mayhem other times. It also does not help with the other issue: an xDBC client can't really consume that volume of data.

What is the use case here for a) showing such large volumes of data in a schema view, and b) retrieving that volume of data even in a data query?
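
For reference, here is the arithmetic above spelled out as a small, self-contained Java snippet. It uses only plain constants (the 32K row count and 16 MB limit are taken from the discussion, not read from Drill configuration), so it is illustrative rather than an excerpt of Drill code:

```java
// Back-of-the-envelope arithmetic behind the numbers in the comment above.
// Pure Java, no Drill dependencies; constants mirror the discussion.
public class VectorSizeMath {
  static final int MAX_ROWS_PER_BATCH = 32 * 1024;       // 32K values per batch
  static final long VECTOR_LIMIT = 16L * 1024 * 1024;    // 16 MB per value vector

  public static void main(String[] args) {
    // Scalar columns stay far below the limit:
    System.out.println("INT column per batch:    " + MAX_ROWS_PER_BATCH * 4L + " bytes");  // 128 KB
    System.out.println("BIGINT/DOUBLE per batch: " + MAX_ROWS_PER_BATCH * 8L + " bytes");  // 256 KB

    // A square DOUBLE matrix stored as one array value in one row hits the
    // limit once n * n * 8 bytes > 16 MB, i.e. n > sqrt(16 MB / 8) ~= 1448.
    long maxDoublesPerVector = VECTOR_LIMIT / 8;          // ~2M values
    System.out.println("Max doubles per vector:  " + maxDoublesPerVector);
    System.out.println("Square matrix threshold: " + (long) Math.sqrt(maxDoublesPerVector));

    // Flattening such a matrix for an xDBC client yields one row per value:
    System.out.println("Rows after FLATTEN:      " + maxDoublesPerVector);
  }
}
```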
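
And here is a minimal sketch of how a helper along the lines of the "doubleMatrixHelper" in the stack trace could end up putting an entire matrix into a single row. This is not the plugin's actual code: the column name, method name, and the exact EVF writer calls (which may vary slightly by Drill version and package location) are assumptions, and the 2D matrix is simplified into a flat repeated `DOUBLE` column for illustration:

```java
import org.apache.drill.exec.physical.resultSet.RowSetLoader;
import org.apache.drill.exec.vector.accessor.ArrayWriter;
import org.apache.drill.exec.vector.accessor.ScalarWriter;

public class DoubleMatrixSketch {

  // Hypothetical helper: writes all values of a 2D matrix as one repeated
  // DOUBLE column in a single Drill row. With a 1500 x 1500 matrix this is
  // 1500 * 1500 * 8 bytes ~= 17.2 MB in one row, which is what blows past
  // the 16 MB per-vector limit described above.
  public static void writeDoubleMatrix(RowSetLoader rowWriter, double[][] matrix) {
    ArrayWriter arrayWriter = rowWriter.array("double_matrix"); // column name assumed
    ScalarWriter values = arrayWriter.scalar();

    rowWriter.start();
    for (double[] row : matrix) {
      for (double v : row) {
        values.setDouble(v);   // every value appends to the same row's array
      }
    }
    rowWriter.save();          // the entire matrix becomes a single Drill row
  }
}
```

The alternative is to emit one Drill row per matrix row (or per value), which keeps each vector small and gives clients something they can actually consume without flattening multi-megabyte arrays.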
