Currently Calcite supports the following syntax, apparently used in Phoenix:

*select empno + x from EMP_MODIFIABLEVIEW extend (x int not null)*
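To make the idea concrete, here is a rough sketch of how that EXTEND form might look when reading a self-describing file; the dfs.`clicks.json` path and the column names are invented for illustration and are not taken from the thread:

  select user_id, click_ts
  from dfs.`clicks.json` extend (user_id varchar, click_ts timestamp not null)

The appeal is that the type declarations travel with the query text itself, so no separate schema definition is needed.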
Another option to consider is hint syntax (many databases use it): essentially a
multiline comment followed by a plus, *select /*+ ... */ col_name from t*.
This would allow us to pass not only a schema but also join / index hints etc.
Example: *select /*+ SCHEMA(a int not null, b int) */ a from t*
One downside is that we would need to implement this in Calcite first, assuming
the Calcite community is in favor of such a change.

Kind regards,
Arina

On Mon, Sep 10, 2018 at 7:42 AM Paul Rogers <[email protected]> wrote:

> Hi Weijie,
>
> Thanks for the paper pointer. F1 uses the same syntax as Scope (the system
> cited in my earlier note): data type after the name.
>
> Another description is [1]. Neither paper describes how F1 handles arrays.
> However, this second paper points out that Protobuf is F1's native format,
> and so F1 has support for nested types. Drill does also, but in Drill, a
> reference to "customer.phone.cell" causes the nested "cell" column to be
> projected as a top-level column. And neither paper says whether F1 is used
> with O/JDBC, and if so, how they handle the mapping from nested types to
> the flat tuple structure required by xDBC.
>
> Have you come across these details?
>
> Thanks,
> - Paul
>
>
> On Thursday, September 6, 2018, 8:43:57 PM PDT, weijie tong <
> [email protected]> wrote:
>
> Google's latest paper about F1 [1] claims to support arbitrary data sources
> through an extension API called TVF (see section 6.3). Column data types
> also need to be declared before the query.
>
> [1] http://www.vldb.org/pvldb/vol11/p1835-samwel.pdf
>
> On Fri, Sep 7, 2018 at 9:47 AM Paul Rogers <[email protected]>
> wrote:
>
> > Hi All,
> >
> > We've discussed quite a few times whether Drill should or should not
> > support or require schemas, and if so, how the user might express the
> > schema.
> >
> > I came across a paper [1] that suggests a simple, elegant SQL extension:
> >
> > EXTRACT <column>[:<type>] {,<column>[:<type>]}
> > FROM <stream_name>
> >
> > Paraphrasing into Drill's SQL:
> >
> > SELECT <column>[:<type>] [AS <alias>] {,<column>[:<type>] [AS <alias>]}
> > FROM <table_name>
> >
> > Have a collection of JSON files in which string column `foo` appears in
> > only half the files? Don't want schema conflicts between VARCHAR and
> > nullable INT? Just do:
> >
> > SELECT name:VARCHAR, age:INT, foo:VARCHAR
> > FROM `my-dir` ...
> >
> > Not only can the syntax be used to specify the "natural" type for a
> > column, it might also specify a preferred type. For example, "age:INT"
> > says that "age" is an INT, even though JSON would normally parse it as a
> > BIGINT. Similarly, using this syntax is an easy way to tell Drill how to
> > convert CSV columns from strings to DATE, INT, FLOAT, etc. without the
> > need for CAST functions. (CAST functions read the data in one format,
> > then convert it to another in a Project operator. Using a column type
> > might let the reader do the conversion -- something that is easy to
> > implement if using the "result set loader" mechanism.)
> >
> > Plus, the syntax fits nicely into the existing view file structure. If
> > the types appear in views, then client tools can continue to use
> > standard SQL without the type information.
> >
> > When this idea came up in the past, someone mentioned the issue of
> > nullable vs. non-nullable. (Let's also include arrays, since Drill
> > supports that.) Maybe add a suffix to the name:
> >
> > SELECT req:VARCHAR NOT NULL, opt:INT NULL, arr:FLOAT[] FROM ...
> >
> > Not pretty, but it works with the existing SQL syntax rules.
> >
> > Obviously, Drill has much on its plate, so I'm not suggesting that Drill
> > should do this soon. Just passing it along as yet another option to
> > consider.
> >
> > Thanks,
> > - Paul
> >
> > [1] http://www.cs.columbia.edu/~jrzhou/pub/Scope-VLDBJ.pdf
>
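The quoted message contrasts the proposed column:TYPE syntax with today's CAST
approach, where conversion happens in a Project operator rather than in the
reader. As a rough sketch of the difference, using the same column and
directory names as the example above (the proposed form is not yet valid
syntax anywhere; the CAST form is ordinary SQL):

  -- Proposed: the reader itself produces the requested types
  SELECT name:VARCHAR, age:INT FROM `my-dir`

  -- Today: the reader picks its own types, then a Project operator
  -- applies the CASTs to convert them
  SELECT CAST(name AS VARCHAR) AS name, CAST(age AS INT) AS age FROM `my-dir`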
