Hello Paul, Seems like we simply need to improve our DESCRIBE [1] table functionality.
[1] https://drill.apache.org/docs/describe/ Thanks, Igor On Tue, Feb 18, 2020 at 9:23 AM Paul Rogers <par0...@yahoo.com.invalid> wrote: > Hi All, > > Charles has a little PR, #1978, that adds a convenient feature to his > HDF5 format reader: the ability to query the schema of the file. (It seems > that HDF5 is a bit like a zip file: it contains a set of files. Unlike zip, > each file is a data set with a schema.) Charles added a clever way to tell > the reader that the user wants a schema rather than data. > > If we think a bit, we realize that a schema query would be handy for any > data source. Maybe I want to know the fields in a JSON or Parquet file > without getting the data for those fields (and, for example, inferring type > and nullability from data.) > > In a relational DB, we'd get the schema by querying system tables. We'd do > the same thing in Hive because Hive requires an up-front schema. But, Drill > is unique in that it can infer schema at run time; no previous schema > required. So, we have no system tables to answer schema questions. Instead, > we want to get the schema directly from the data source itself by doing a > query. > > (This feature would be in addition to the case when the Metastore does > hold a schema.) > > > How might we accomplish the same result? Can we create some kind of > "virtual" system table that tells us to rewrite the query to get schema? > Something like: > > SELECT * FROM sys.columns WHERE tableName = `dfs`.`my/path/someFile.json` > > Or, maybe some implied columns in the table schema? > > > SELECT schema.* FROM `dfs`.`my/path/someFile.json` > > > Or, maybe some special schema name space? > > SELECT schema.* FROM schema.`dfs`.`my/path/someFile.json` > > > Anyone know of any system that has an elegant solution we could mimic? > Other suggestions? > > > Thanks, > - Paul > >