paul-rogers commented on issue #1978: DRILL-7578: HDF5 Metadata Queries Fail with Large Files URL: https://github.com/apache/drill/pull/1978#issuecomment-585540050

@cgivre, thanks for the stack traces. One of the annoying aspects of Drill errors is that the first two chunks of the trace are always the same: they come from the client side.

The 16 MB error is basically saying that we are writing more than 16 MB for a single column in a single row. This is generally Not a Good Thing. I'm surprised you are able to hit that error, however. You said you are writing `INT` values. A batch can have at most 32K values, and 32K * 4 bytes = 128 KB. With `BIGINT` it is 256 KB, and the same is true if you write a `DOUBLE`.

It looks like your code may be writing an array, given the "doubleMatrixHelper" name. If so, you can reach 16 MB by writing a square matrix of doubles with more than sqrt(16 MB / 8) ~= 1400 elements per side. Are you?

Such large arrays cause several problems. First, clients can't consume them; they have to be flattened, which produces 16 MB / 8 bytes = 2M rows per matrix. Second, allocating buffers larger than 16 MB will fragment memory.

We can provide an option to disable the 16 MB limit. But this is using a table saw without the guard: it will work sometimes and cause mayhem other times. It also does not help with the other issue: an xDBC client can't really consume that volume of data.

What is the use case here for a) showing such large volumes of data in a schema view, and b) retrieving that volume of data even in a data query?
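
For reference, here is the arithmetic above spelled out as a small, self-contained Java snippet. It uses only plain constants (the 32K row count and 16 MB limit are taken from the discussion, not read from Drill configuration), so it is illustrative rather than an excerpt of Drill code:

```java
// Back-of-the-envelope arithmetic behind the numbers in the comment above.
// Pure Java, no Drill dependencies; constants mirror the discussion.
public class VectorSizeMath {
  static final int MAX_ROWS_PER_BATCH = 32 * 1024;       // 32K values per batch
  static final long VECTOR_LIMIT = 16L * 1024 * 1024;    // 16 MB per value vector

  public static void main(String[] args) {
    // Scalar columns stay far below the limit:
    System.out.println("INT column per batch:    " + MAX_ROWS_PER_BATCH * 4L + " bytes");  // 128 KB
    System.out.println("BIGINT/DOUBLE per batch: " + MAX_ROWS_PER_BATCH * 8L + " bytes");  // 256 KB

    // A square DOUBLE matrix stored as one array value in one row hits the
    // limit once n * n * 8 bytes > 16 MB, i.e. n > sqrt(16 MB / 8) ~= 1448.
    long maxDoublesPerVector = VECTOR_LIMIT / 8;          // ~2M values
    System.out.println("Max doubles per vector:  " + maxDoublesPerVector);
    System.out.println("Square matrix threshold: " + (long) Math.sqrt(maxDoublesPerVector));

    // Flattening such a matrix for an xDBC client yields one row per value:
    System.out.println("Rows after FLATTEN:      " + maxDoublesPerVector);
  }
}
```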
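
And here is a minimal sketch of how a helper along the lines of the "doubleMatrixHelper" in the stack trace could end up putting an entire matrix into a single row. This is not the plugin's actual code: the column name, method name, and the exact EVF writer calls (which may vary slightly by Drill version and package location) are assumptions, and the 2D matrix is simplified into a flat repeated `DOUBLE` column for illustration:

```java
import org.apache.drill.exec.physical.resultSet.RowSetLoader;
import org.apache.drill.exec.vector.accessor.ArrayWriter;
import org.apache.drill.exec.vector.accessor.ScalarWriter;

public class DoubleMatrixSketch {

  // Hypothetical helper: writes all values of a 2D matrix as one repeated
  // DOUBLE column in a single Drill row. With a 1500 x 1500 matrix this is
  // 1500 * 1500 * 8 bytes ~= 17.2 MB in one row, which is what blows past
  // the 16 MB per-vector limit described above.
  public static void writeDoubleMatrix(RowSetLoader rowWriter, double[][] matrix) {
    ArrayWriter arrayWriter = rowWriter.array("double_matrix"); // column name assumed
    ScalarWriter values = arrayWriter.scalar();

    rowWriter.start();
    for (double[] row : matrix) {
      for (double v : row) {
        values.setDouble(v);   // every value appends to the same row's array
      }
    }
    rowWriter.save();          // the entire matrix becomes a single Drill row
  }
}
```

The alternative is to emit one Drill row per matrix row (or per value), which keeps each vector small and gives clients something they can actually consume without flattening multi-megabyte arrays.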
