Hello Paul,

Yes, Calcite's Schema interface explicitly mentions nesting in its Javadoc.
In Drill specifically, everything starts
from org.apache.drill.exec.store.AbstractSchema; each storage plugin then
extends that class to manage sub-schema and table registration.
Since HDF5 is implemented as a format plugin, I'm not sure whether the
mechanism can be easily applied to it.
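
To make the Calcite side concrete, here is a rough sketch against Calcite's
org.apache.calcite.schema.impl.AbstractSchema helper. The Hdf5FileSchema /
Hdf5GroupSchema names and the group/data-set layout are made up for
illustration, not taken from the HDF5 plugin; Drill's own AbstractSchema
plays the analogous role for storage plugins, with extra registration
wiring on top:

import java.util.Map;

import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.Table;
import org.apache.calcite.schema.impl.AbstractSchema;

// Hypothetical parent schema for one HDF5 file: each group inside the file
// is exposed as a sub-schema, so Calcite can resolve a multi-part name such
// as `dfs`.`myFile.hdf5`.`someGroup`.`dataSet1` one part at a time.
public class Hdf5FileSchema extends AbstractSchema {

  private final Map<String, Schema> groups;

  public Hdf5FileSchema(Map<String, Schema> groups) {
    this.groups = groups;
  }

  @Override
  protected Map<String, Schema> getSubSchemaMap() {
    return groups;
  }
}

// Hypothetical leaf schema: the data sets inside one group become tables.
class Hdf5GroupSchema extends AbstractSchema {

  private final Map<String, Table> dataSets;

  Hdf5GroupSchema(Map<String, Table> dataSets) {
    this.dataSets = dataSets;
  }

  @Override
  protected Map<String, Table> getTableMap() {
    return dataSets;
  }
}

That said, a storage plugin owns such a schema tree, while a format plugin
only sees a single file underneath the file-system schema, which is why I'm
unsure it can contribute the extra level.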

Thanks,
Igor

On Tue, Feb 18, 2020 at 9:48 AM Paul Rogers <par0...@yahoo.com.invalid>
wrote:

> Hi Igor,
>
> Thanks! I should have remembered that bit of SQL.
>
> Yes, if we can generalize `DESCRIBE`, we could create another path of some
> kind through the plugins that say, "return schema, not data."
>
> Then, for the HDF5 use case we could have:
>
> DESCRIBE TABLE `dfs`.`myFile.hdf5` -- returns schema
>
>
> And
>
> SELECT * FROM `dfs`.`myFile.hdf5` -- returns data
>
>
> Nice solution! I'll file a feature request.
>
>
> The next interesting bit about HDF5 is that it is a file system: it
> contains multiple data sets. It would be great to be able to express that
> in the FROM clause:
>
> SELECT * FROM `dfs`.`myFile.hdf5`.`dataSet1`
>
> From my random walks through Calcite, it appears that we can have any
> level of schema/table path. True? We'd need some way to resolve a name
> part to a file, then ask the format plugin for that file whether it
> supports additional parts. This seems pretty obscure. Have we done
> anything like that before? Maybe in a storage (rather than format) plugin?
>
> Thanks,
> - Paul
>
>
>
>     On Monday, February 17, 2020, 11:34:48 PM PST, Igor Guzenko <
> ihor.huzenko....@gmail.com> wrote:
>
>  Hello Paul,
>
> Seems like we simply need to improve our DESCRIBE [1] table functionality.
>
> [1] https://drill.apache.org/docs/describe/
>
> Thanks,
> Igor
>
> On Tue, Feb 18, 2020 at 9:23 AM Paul Rogers <par0...@yahoo.com.invalid>
> wrote:
>
> > Hi All,
> >
> > Charles has a little PR, #1978, that adds a convenient feature to his
> > HDF5 format reader: the ability to query the schema of the file. (It
> > seems that HDF5 is a bit like a zip file: it contains a set of files.
> > Unlike zip, each file is a data set with a schema.) Charles added a
> > clever way to tell the reader that the user wants a schema rather than
> > data.
> >
> > If we think a bit, we realize that a schema query would be handy for
> > any data source. Maybe I want to know the fields in a JSON or Parquet
> > file without getting the data for those fields (and, for example,
> > inferring type and nullability from data.)
> >
> > In a relational DB, we'd get the schema by querying system tables. We'd
> > do the same thing in Hive because Hive requires an up-front schema.
> > But, Drill is unique in that it can infer schema at run time; no
> > previous schema required. So, we have no system tables to answer schema
> > questions. Instead, we want to get the schema directly from the data
> > source itself by doing a query.
> >
> > (This feature would be in addition to the case when the Metastore does
> > hold a schema.)
> >
> >
> > How might we accomplish the same result? Can we create some kind of
> > "virtual" system table that tells us to rewrite the query to get schema?
> > Something like:
> >
> > SELECT * FROM sys.columns WHERE tableName = `dfs`.`my/path/someFile.json`
> >
> > Or, maybe some implied columns in the table schema?
> >
> >
> > SELECT schema.* FROM `dfs`.`my/path/someFile.json`
> >
> >
> > Or, maybe some special schema name space?
> >
> > SELECT schema.* FROM schema.`dfs`.`my/path/someFile.json`
> >
> >
> > Anyone know of any system that has an elegant solution we could mimic?
> > Other suggestions?
> >
> >
> > Thanks,
> > - Paul
> >
> >
>
