Hello Paul, Yes, Calcite's Schema interface directly says about nesting in its Javadoc. Particularly in Drill, everything starts from org.apache.drill.exec.store.AbstractSchema, then each storage plugin extends the class to manage sub-schema and table registration. Since hdf5 implemented as format plugin I'm not sure whether the mechanism can be easily applied to it.
Thanks, Igor On Tue, Feb 18, 2020 at 9:48 AM Paul Rogers <par0...@yahoo.com.invalid> wrote: > Hi Igor, > > Thanks! I should have remembered that bit of SQL. > > Yes, if we can generalize `DESCRIBE`, we could create another path of some > kind through the plugins that say, "return schema, not data." > > Then, for the HDF5 use case we could have: > > DESCRIBE TABLE `dfs`.`myFile.hdf5` -- returns schema > > > And > > SELECT * FROM `dfs`.`myFile.hdf5` -- returns data > > > Nice solution! I'll file a feature request. > > > The next interesting bit about HDF5 is that it is a file system, it > contains multiple data sets. Would be great to be able to express that in > the FROM clause: > > SELECT * FROM `dfs`.`myFile.hdf5`.`dataSet1` > > From my random walks though Calcite, it appears that we can have any level > of schema/table path. True? We'd need some way to resolve a name part to a > file, then ask the format plugin for that file if it supports additional > parts. This seems pretty obscure. Have we done anything like that before? > Maybe in storage (rather than format) plugin? > > Thanks, > - Paul > > > > On Monday, February 17, 2020, 11:34:48 PM PST, Igor Guzenko < > ihor.huzenko....@gmail.com> wrote: > > Hello Paul, > > Seems like we simply need to improve our DESCRIBE [1] table functionality. > > [1] https://drill.apache.org/docs/describe/ > > Thanks, > Igor > > On Tue, Feb 18, 2020 at 9:23 AM Paul Rogers <par0...@yahoo.com.invalid> > wrote: > > > Hi All, > > > > Charles has a little PR, #1978, that adds a convenient feature to his > > HDF5 format reader: the ability to query the schema of the file. (It > seems > > that HDF5 is a bit like a zip file: it contains a set of files. Unlike > zip, > > each file is a data set with a schema.) Charles added a clever way to > tell > > the reader that the user wants a schema rather than data. > > > > If we think a bit, we realize that a schema query would be handy for any > > data source. Maybe I want to know the fields in a JSON or Parquet file > > without getting the data for those fields (and, for example, inferring > type > > and nullability from data.) > > > > In a relational DB, we'd get the schema by querying system tables. We'd > do > > the same thing in Hive because Hive requires an up-front schema. But, > Drill > > is unique in that it can infer schema at run time; no previous schema > > required. So, we have no system tables to answer schema questions. > Instead, > > we want to get the schema directly from the data source itself by doing a > > query. > > > > (This feature would be in addition to the case when the Metastore does > > hold a schema.) > > > > > > How might we accomplish the same result? Can we create some kind of > > "virtual" system table that tells us to rewrite the query to get schema? > > Something like: > > > > SELECT * FROM sys.columns WHERE tableName = `dfs`.`my/path/someFile.json` > > > > Or, maybe some implied columns in the table schema? > > > > > > SELECT schema.* FROM `dfs`.`my/path/someFile.json` > > > > > > Or, maybe some special schema name space? > > > > SELECT schema.* FROM schema.`dfs`.`my/path/someFile.json` > > > > > > Anyone know of any system that has an elegant solution we could mimic? > > Other suggestions? > > > > > > Thanks, > > - Paul > > > > >