cgivre commented on issue #1978: DRILL-7578: HDF5 Metadata Queries Fail with 
Large Files
URL: https://github.com/apache/drill/pull/1978#issuecomment-585347032
 
 
   @paul-rogers 
   Here's what's happening:
   ```
   apache drill> select *
   2..semicolon> from dfs.test.`eFitOut.h5`;
   Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
the query.
   
   A single column value is larger than the maximum allowed size of 16 MB
   Fragment 0:0
   
   [Error Id: 1d3f37ca-e7e3-48f5-9c7d-cb24bd494a67 on localhost:31010] 
(state=,code=0)
   ```
   When I dug into this, I found that one of the dataset columns has a 
single-cell value larger than 16 MB.  This PR stops the reader from 
retrieving the datasets in the metadata view, which avoids the whole issue.  
   
   What also prompted this change is that even if you select only other 
columns, you still get the error: 
   ```
   apache drill> select path, data_type
   2..semicolon> from dfs.test.`eFitOut.h5`;
   Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
the query.
   
   A single column value is larger than the maximum allowed size of 16 MB
   Fragment 0:0
   
   [Error Id: 1af14cb0-9bce-488a-9d2d-aca5736670e3 on localhost:31010] 
(state=,code=0)
   apache drill>
   ```
   
   So to conclude:
   1.  There may be a bug in the EVF projection with large fields, since the 
error occurs even when the large column is not projected.  (I don't know...)
   2.  This PR fixes the issue for HDF5 by removing the datasets from the 
metadata view.
   
   I should note that when the datasets are not projected in the metadata view, 
the queries execute without issues.
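
To illustrate the idea behind point 2, here is a minimal sketch. It is not Drill's HDF5 reader code: the class `MetadataRowSketch`, the `includeDatasets` flag, and the `dataset_data` column name are hypothetical, and only the 16 MB limit and the `path` / `data_type` columns come from the output above. It shows a metadata row built so that the dataset payload is never attached, which means no single cell can exceed the limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are Drill APIs.
final class MetadataRowSketch {
    // Limit quoted in the error message: 16 MB per column value.
    static final long MAX_CELL_BYTES = 16L * 1024 * 1024;

    // True when a single value would exceed the per-cell limit.
    static boolean exceedsCellLimit(long valueBytes) {
        return valueBytes > MAX_CELL_BYTES;
    }

    // Build one metadata row. The dataset payload is attached only when
    // includeDatasets is true, mirroring the idea of dropping datasets
    // from the metadata view.
    static Map<String, Object> buildRow(String path, String dataType,
                                        byte[] dataset, boolean includeDatasets) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("path", path);
        row.put("data_type", dataType);
        if (includeDatasets) {
            if (exceedsCellLimit(dataset.length)) {
                throw new IllegalStateException(
                    "A single column value is larger than the maximum allowed size of 16 MB");
            }
            row.put("dataset_data", dataset); // hypothetical column name
        }
        return row;
    }
}
```

With `includeDatasets` set to `false`, an oversized dataset never reaches the row, so the size check cannot fire. This sketch does not explain why the error appears when only `path` and `data_type` are selected; that is the open question in point 1.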
   
    
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
