paul-rogers commented on issue #1978: DRILL-7578: HDF5 Metadata Queries Fail 
with Large Files
URL: https://github.com/apache/drill/pull/1978#issuecomment-585893378
 
 
   @cgivre, sounds like a good plan.
   
   By the way, it occurred to me that the original "preview" idea may have 
another issue. Drill is SQL-based; clients work best when a column has a 
specific type. On the other hand, if HDF5 is a file system, and we want to 
preview files, each file may have a different kind of data: records in one, 
strings in another, a matrix in a third. If we try to write each of these into 
a "preview" column, not only do we have a size issue, we also have a type 
issue: all of these examples are different types.
   
   On your laptop, the OS gives you a preview of each file. The common 
denominator is the graphic tile, which might be a tiny version of an image or a 
video frame, an app icon, or whatever. Point is, the OS converts all 
the many file types into a common format: the preview tile.
   
   If the "metadata" view scans all files, the preview can be huge (perhaps the 
entire HDF5 content). A preview should be small. Rendering a tile may not be 
super helpful, but a brief text representation might be. For your bit column, 
maybe: "[[123456.78, 98745.43, ...".
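
   A rough sketch of what I mean, in plain Java. This is not Drill or HDF5 
plugin code: the class name, the `preview` helper, and the `maxChars` limit 
are all made up for illustration. The idea is only that any value (string, 
array, nested matrix) collapses to one bounded VARCHAR, so the column has a 
single type and a small size.

```java
import java.lang.reflect.Array;

// Hypothetical sketch: not part of Drill or of the HDF5 format plugin.
class PreviewSketch {

    // Render any value as text, keeping at most maxChars characters
    // and appending "..." when the text had to be cut.
    public static String preview(Object value, int maxChars) {
        StringBuilder sb = new StringBuilder();
        append(value, sb, maxChars);
        if (sb.length() > maxChars) {
            sb.setLength(maxChars);
            sb.append("...");
        }
        return sb.toString();
    }

    // Walk arrays (including primitive and nested ones) recursively,
    // and stop descending once the buffer is already over the limit.
    private static void append(Object value, StringBuilder sb, int maxChars) {
        if (sb.length() > maxChars) {
            return;
        }
        if (value != null && value.getClass().isArray()) {
            sb.append('[');
            int n = Array.getLength(value);
            for (int i = 0; i < n && sb.length() <= maxChars; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                append(Array.get(value, i), sb, maxChars);
            }
            sb.append(']');
        } else {
            sb.append(String.valueOf(value));
        }
    }

    public static void main(String[] args) {
        double[][] matrix = {{123456.78, 98745.43, 1.0}, {2.0, 3.0, 4.0}};
        System.out.println(preview(matrix, 23));
        System.out.println(preview("a short string", 23));
    }
}
```

   Because the walk stops once the limit is passed, the cost of building a 
preview depends on the limit rather than on the size of the dataset, assuming 
the reader can hand over the first few values without loading everything.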
   
   Your proposed solution of using HDF5-provided metadata is also good: it 
makes your metadata query more like an "ls -l" equivalent. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
