cgivre commented on issue #1978: DRILL-7578: HDF5 Metadata Queries Fail with Large Files
URL: https://github.com/apache/drill/pull/1978#issuecomment-585347032

@paul-rogers Here's what's happening:

```
apache drill> select *
2..semicolon> from dfs.test.`eFitOut.h5`;
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

A single column value is larger than the maximum allowed size of 16 MB
Fragment 0:0

[Error Id: 1d3f37ca-e7e3-48f5-9c7d-cb24bd494a67 on localhost:31010] (state=,code=0)
```

When I dug into this, I found that one of the dataset columns contains a single-cell value greater than 16 GB. This PR simply stops the reader from attempting to retrieve the dataset contents, which sidesteps the issue entirely. What also pushed me toward this approach is that even if you project only the other columns, you still get the error:

```
apache drill> select path, data_type
2..semicolon> from dfs.test.`eFitOut.h5`;
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

A single column value is larger than the maximum allowed size of 16 MB
Fragment 0:0

[Error Id: 1af14cb0-9bce-488a-9d2d-aca5736670e3 on localhost:31010] (state=,code=0)
apache drill>
```

So to conclude:
1. There may be a bug in the EVF projection with large fields. (I don't know...)
2. This PR fixes the issue for HDF5 by removing the datasets from the metadata view.

I should note that once the datasets are no longer projected in the metadata view, the queries execute without issue.
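To illustrate the idea behind the fix, here is a minimal, self-contained Java sketch (not actual Drill code; the `DatasetInfo` type, the `/results/eFit` path, and the `metadataRows` helper are all hypothetical): when building metadata rows, only the descriptive fields such as `path` and `data_type` are copied, and the raw dataset payload is never read, so a cell exceeding Drill's 16 MB single-value cap is never written into a vector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Hdf5MetadataSketch {
    // 16 MB cap on a single column value, per the error message above.
    static final int MAX_CELL_BYTES = 16 * 1024 * 1024;

    // Hypothetical model: each dataset carries metadata plus a (possibly huge) payload.
    record DatasetInfo(String path, String dataType, byte[] payload) {}

    // Build metadata rows without ever touching the payload.
    static List<Map<String, String>> metadataRows(List<DatasetInfo> datasets) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (DatasetInfo ds : datasets) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("path", ds.path());
            row.put("data_type", ds.dataType());
            // Deliberately no row.put("data", ...): the payload is skipped,
            // so its size can never trip the 16 MB limit.
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // A payload larger than the cap would previously fail the query.
        byte[] huge = new byte[MAX_CELL_BYTES + 1];
        List<Map<String, String>> rows = metadataRows(
            List.of(new DatasetInfo("/results/eFit", "FLOAT64", huge)));
        System.out.println(rows.get(0).get("path"));
        System.out.println(rows.get(0).get("data_type"));
    }
}
```

The same `select path, data_type` query from the comment then succeeds, because the only values materialized are small metadata strings.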