paul-rogers commented on issue #1978: DRILL-7578: HDF5 Metadata Queries Fail 
with Large Files
URL: https://github.com/apache/drill/pull/1978#issuecomment-587322871
 
 
   @cgivre, one more design-level comment about this particular file format. 
You've mentioned several times that HDF5 is "a file system within a file." It 
finally clicked: we need to treat this file as a directory, not a file. 
This means adding a layer of schema in Calcite planning:
   
   ```
   SELECT * FROM `dfs`.`some/path/myFile.hdf5`.`dataSet1`
   ```
   
   This would let the reader load only data from `dataSet1`, using only the 
schema from that data set.
   
   (We can't use slashes to separate the data set name from the file name; 
slashes are already path notation for the Hadoop file system.)
   
   Fortunately, Calcite seems to allow any number of schema levels. That is why 
we can have plugins, workspaces, etc. The challenge is to provide some way for 
a format plugin to influence the planner and say, "hey, if you do a query 
against me, ask me to resolve all path elements below my file name."
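   
   To make the idea concrete, here is a rough sketch of the name-splitting step 
only. It is not Drill or Calcite code: the class `NestedPathSketch`, its `Split` 
type, and the `split` method are hypothetical names made up for illustration, 
and it assumes the file is recognized purely by its extension.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names exist in Drill or Calcite.
public class NestedPathSketch {

  /** Result of splitting a qualified table name at the container file. */
  public static final class Split {
    /** Elements up to and including the file name (resolved as today). */
    public final List<String> outer;
    /** Elements below the file name (what the format plugin would resolve). */
    public final List<String> inner;

    Split(List<String> outer, List<String> inner) {
      this.outer = outer;
      this.inner = inner;
    }
  }

  /**
   * Splits the name elements at the first element that ends with the given
   * extension. Assumption: matching on the extension is enough to spot the file.
   */
  public static Split split(List<String> elements, String extension) {
    String suffix = "." + extension.toLowerCase();
    for (int i = 0; i < elements.size(); i++) {
      if (elements.get(i).toLowerCase().endsWith(suffix)) {
        return new Split(
            new ArrayList<>(elements.subList(0, i + 1)),
            new ArrayList<>(elements.subList(i + 1, elements.size())));
      }
    }
    // No container file found: nothing is left for the format plugin.
    return new Split(new ArrayList<>(elements), new ArrayList<String>());
  }

  public static void main(String[] args) {
    Split s = split(
        Arrays.asList("dfs", "some/path/myFile.hdf5", "dataSet1"), "hdf5");
    // Prints: [dfs, some/path/myFile.hdf5] -> [dataSet1]
    System.out.println(s.outer + " -> " + s.inner);
  }
}
```

   In a real design the planner would presumably hand the `inner` elements to 
the format plugin rather than splitting on an extension itself; the sketch only 
shows which part of the name each side would own.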
   
   Again, not something for this PR. But, it is something we can think about as 
we try to improve our storage plugin API.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
