[https://issues.apache.org/jira/browse/DRILL-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038855#comment-17038855]

ASF GitHub Bot commented on DRILL-7578:
---------------------------------------

paul-rogers commented on issue #1978: DRILL-7578: HDF5 Metadata Queries Fail with Large Files
URL: https://github.com/apache/drill/pull/1978#issuecomment-587322871
 
 
   @cgivre, one more design-level comment about this particular file format.
You've mentioned several times that HDF5 is "a file system within a file." It
finally clicked: we need to treat this file as a directory, not a file.
This means adding a layer of schema in Calcite planning:
   
   ```
   SELECT * FROM `dfs`.`some/path/myFile.hdf5`.`dataSet1`
   ```
   
   This would let the reader load only the data in `dataSet1`, using only the
schema of that data set.
   
   (We can't use slashes here; they are the path separator for the Hadoop file system.)
   
   Fortunately, Calcite seems to allow any number of schema levels. That is why
we can have plugins, workspaces, and so on. The challenge is to provide some way
for a format plugin to influence the planner and say, "hey, if you run a query
against me, ask me to resolve all path elements below my file name."
   
   Again, this is not something for this PR, but it is something we can think
about as we try to improve our storage plugin API.
 


> HDF5 Metadata Queries Fail with Large Files
> -------------------------------------------
>
>                 Key: DRILL-7578
>                 URL: https://issues.apache.org/jira/browse/DRILL-7578
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.18.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.18.0
>
>
> With large files, Drill runs out of memory when attempting to project large
> datasets in the metadata.
> This PR adds a configuration option that removes the dataset projection from
> metadata queries, which fixes this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
