[
https://issues.apache.org/jira/browse/DRILL-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035574#comment-17035574
]
ASF GitHub Bot commented on DRILL-7578:
---------------------------------------
cgivre commented on issue #1978: DRILL-7578: HDF5 Metadata Queries Fail with
Large Files
URL: https://github.com/apache/drill/pull/1978#issuecomment-585347032
@paul-rogers
Here's what's happening:
```
apache drill> select *
2..semicolon> from dfs.test.`eFitOut.h5`;
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing
the query.
A single column value is larger than the maximum allowed size of 16 MB
Fragment 0:0
[Error Id: 1d3f37ca-e7e3-48f5-9c7d-cb24bd494a67 on localhost:31010]
(state=,code=0)
```
When I dug into this, I found that one of the dataset columns has a
single-cell value larger than the 16 MB limit. This PR stops the reader
from retrieving the datasets in metadata queries, which avoids the whole
issue.
What also prompted this change is that even if you select only other
columns, you still get the error:
```
apache drill> select path, data_type
2..semicolon> from dfs.test.`eFitOut.h5`;
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing
the query.
A single column value is larger than the maximum allowed size of 16 MB
Fragment 0:0
[Error Id: 1af14cb0-9bce-488a-9d2d-aca5736670e3 on localhost:31010]
(state=,code=0)
apache drill>
```
So to conclude:
1. There may be a bug in the EVF projection with large fields. (I don't
know...)
2. This PR fixes the issue for HDF5 by removing the datasets from the
metadata view.
I should note that when the datasets are not projected in the metadata view,
the queries execute without issues.
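As a rough illustration of why a single dataset can trip the limit, here is
a minimal sketch that estimates a dataset's size from its dimensions and
compares it with the 16 MB figure quoted in the error. The function names,
the example shapes, and the reading of "16 MB" as 16 * 1024 * 1024 bytes are
assumptions for illustration only; this is not code from Drill or from the
HDF5 reader.
```python
from math import prod

# Limit quoted in the error message; treating "16 MB" as 16 MiB is an assumption.
MAX_VALUE_BYTES = 16 * 1024 * 1024


def dataset_bytes(shape, element_size):
    """Estimated size in bytes of a dataset with the given dimensions."""
    return prod(shape) * element_size


def fits_in_one_value(shape, element_size, limit=MAX_VALUE_BYTES):
    """True if the whole dataset could be held as one column value."""
    return dataset_bytes(shape, element_size) <= limit


# Hypothetical shapes: 8-byte doubles in a 2048 x 2048 and a 1024 x 1024 array.
print(fits_in_one_value((2048, 2048), 8))  # 32 MiB, over the limit
print(fits_in_one_value((1024, 1024), 8))  # 8 MiB, under the limit
```
Under these assumptions, any dataset whose flattened size exceeds the limit
would fail if the metadata view tried to return it as a single cell.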
----------------------------------------------------------------
> HDF5 Metadata Queries Fail with Large Files
> -------------------------------------------
>
> Key: DRILL-7578
> URL: https://issues.apache.org/jira/browse/DRILL-7578
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.18.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.18.0
>
>
> With large files, Drill runs out of memory when attempting to project large
> datasets in the metadata.
> This PR adds a configuration option which removes the dataset projection from
> metadata queries and fixes this issue.