cgivre commented on issue #1978: DRILL-7578: HDF5 Metadata Queries Fail with 
Large Files
URL: https://github.com/apache/drill/pull/1978#issuecomment-585184636
 
 
   @vvysotskyi Let me give you some context.
   This plugin has two ways of interacting with HDF5 files: metadata queries 
and dataset queries.  HDF5 is like a filesystem within a file, so it can 
contain many datasets.  The dataset query looks at a specific dataset and 
projects the columns and rows as you would expect.
   
   Metadata queries are intended to explore the HDF5 file itself rather than an 
individual dataset.  As currently implemented, a metadata query returns the 
filename, paths, and dataset types from the HDF5 file.  Here's where the 
problem arose: the metadata query also maps each dataset to a cell in each 
row.  This is useful because the user gets a preview of the data that is 
actually in each dataset.  However, if a dataset is larger than 16MB, Drill 
crashes.  When I originally implemented this (before EVF), this wasn't an 
issue because the plugin itself handled projection pushdown, so all the user 
had to do was exclude the dataset from the query.  With EVF, however, it 
doesn't work that way. 
   
   Therefore, the options are:
   1.  Remove this preview functionality entirely. 
   2.  Select some small amount from each dataset and project that in a 
metadata query. 
   3.  Add a config option to not generate the preview columns in metadata 
queries. 
   4.  Convert the preview to a string and truncate it at the size limit. 
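
   To make options 2 and 4 concrete, here is a minimal sketch in plain Java. 
It is not the plugin's code: the class name, method names, and the byte budget 
parameter are all hypothetical, and it assumes a one-dimensional dataset of 
doubles that has already been read into memory.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the Drill HDF5 plugin.
class PreviewSketch {

  // Option 2: keep only the first maxElements values of a dataset.
  static double[] sample(double[] data, int maxElements) {
    return Arrays.copyOf(data, Math.min(data.length, maxElements));
  }

  // Option 4: render the dataset as a string and stop before the budget
  // is exceeded. The output is ASCII, so characters equal bytes.
  // Assumes maxBytes >= 5, the length of the shortest result "[...]".
  static String truncatedPreview(double[] data, int maxBytes) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < data.length; i++) {
      String next = (i == 0 ? "" : ", ") + data[i];
      // Always reserve 4 characters for a possible trailing "...]".
      if (sb.length() + next.length() + 4 > maxBytes) {
        sb.append("...");
        break;
      }
      sb.append(next);
    }
    return sb.append("]").toString();
  }
}
```

   Either approach bounds the size of the preview cell, at the cost of showing 
only part of a large dataset.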
   
   Of these options, option 3 felt the easiest and most useful to me as it 
preserved the functionality and gave the users a way to make it work. 
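
   As a rough illustration of option 3, the sketch below gates the preview 
column on a boolean setting. Everything here is hypothetical: the 
`showPreview` flag, the column names, and the map-based row stand in for the 
real format config and row writer, which are not shown in this comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not the plugin's actual reader code.
class MetadataRowSketch {

  // Builds one metadata row. The preview column is only populated when
  // the (assumed) showPreview setting is enabled, so a large dataset is
  // never copied into the row when the user turns previews off.
  static Map<String, Object> buildRow(String path, String dataType,
                                      double[] data, boolean showPreview) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("path", path);
    row.put("data_type", dataType);
    if (showPreview) {
      row.put("preview", data);
    }
    return row;
  }
}
```

   With the flag off, the metadata query still returns the path and type of 
every dataset, which is the part that does not depend on dataset size.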
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
