[jira] [Updated] (CALCITE-4864) Supports Polymorphic Table function

Jing Zhang (Jira) Tue, 28 Jun 2022 23:42:05 -0700

Title: Message Title

Jing Zhang updated an issue

Change By:	Jing Zhang

h1. Background
h2. What is Polymorphic Table Function makes table function to be polymorphic

A Polymorphic Table Function is a introduced in SQL 2017 standard. It requires table function that returns could take and return tables with row type not declared at design time.

In fact, table functions in Calcite already supports this function more or less. For example [Tumble/Hop/Session Window table function |https://calcite.apache.org/docs/reference.html#table-functions] could take a table whose parameter which row type is not declared when the function at design time, and generate a return table which row type is created not declared, either . It is introduced However, we hadn't thought about how to support this feature systematically in SQL 2017 standard the CALCITE .
h2. Why we need Polymorphic Table Function

Polymorphic Table Function is Therefore, the evolution of current implementation misses some points and conflicts with the SQL standard, for example, for [Session Window table function |https://calcite . It has the following features compare with normal apache.org/docs/reference.html# table function -functions] :
# may have generic table parameters {code:java}
SELECT * FROM TABLE ( i.e.
  SESSION(
    TABLE orders , no row type of table parameters declared when the PTF is created
    DESCRIPTOR(rowtime ) ,
# returns     DESCRIPTOR(product),
    INTERVAL '20' MINUTE));
{code}

But, a desired way from SQL standard 2017 Polymorphic table whose row type functions is not declared when the function is created to use `PARTITION BY` clause to replace KEY DESCRIPTOR.
{code:java}
SELECT * FROM TABLE(
  SESSION(
    TABLE orders PARTITION BY product , it may depend on the
    DESCRIPTOR(rowtime),
    INTERVAL '20' MINUTE));
{code}
I propose to support polymorphic table function arguments in a better way .
h2. Row semantics and set semantics
It
Table function could have generic table parameters. And input tables table parameters are classified by three characteristics:
# Input tables have either row semantics or set semantics)
## Row semantics means that the the result of the function is determined only by looking at the current row
## Set semantics means that the result of the function can be determined by looking at the current row and some state “summarized” from previously processed rows. The table argument with set semantics can optionally be extended with either a {{PARTITION BY}} clause or an {{ORDER BY}} clause or both.
# The second characteristic, which applies only to input tables with set semantics, is whether the function can generate a result row even if the input table is empty.
# The third characteristic is whether the input table supports pass-through columns or not. Pass-through columns is a mechanism enabling the function to copy every column of an input row into columns of an output row.

h2. Examples of Polymorphic Table Function

There are many interesting examples of polymorphic table function in SQL standard.

Here I would list two cases, one is a table function with row semantics input table, the other one is a table function with set semantics input table.
h3. Table function with contains a Row Semantics input table as parameter with row semantics

We often need to read a CSV file, generally, the first line of the file contains a list of column names, and subsequent lines of the file contain data. The data in general can be treated as a large VARCHAR. However, some of the fields may be numeric or datetime.
A PTF named CSVreader is designed to read a file of comma-separated values and interpret this file as a table.

CSVreader function has three parameters:
# The first parameter, File, is the name of a file on the query author's system. This file must contain the comma-separated values that are to be converted to a table. The first line of the file contains the names of the resulting columns. Succeeding lines contain the data. Each line after the first will result in one row of output, with column names as determined by the first line of the input.
# Floats is a PTF descriptor area, which should provide a list of the column names that are to be interpreted numerically. These columns will be output with the data type FLOAT.
# Dates is a PTF descriptor area, which provides a list of the column names that are to be interpreted as datetimes. These columns will be output with the data type DATE.

How to use this table function in query?

For example, there is a csv file named abc.csv with the following contents:
{code:java}
docno,name,due_date,principle,interest
123,Mary,01/01/2014,234.56,345.67
234,Edgar,01/01/2014,654.32,543.21 {code}

the query author may write a query such as the following:
{code:java}
SELECT *
FROM TABLE (
  CSVreader (
    File => 'abc.csv',
    Floats => DESCRIPTOR ("principle", "interest")
    Dates => DESCRIPTOR ("due_date")
  )
) AS S {code}

The result will be
||docno||name||due_date||principle||interest||
|123|Mary|01/01/2014|234.56|345.67|
|234|Edgar|01/01/2014|654.32|543.21|

h3. Table function with contains a Row Semantics input table as parameter with set semantics

TopN takes an input table that has been sorted on a numeric column. It copies the first n rows through to the output table. Any additional rows are summarized in a single output row in which the sort column has been summed and all other columns are null.
TopN function has two parameters:
# The first parameter, Input, is the input table. This table has set semantics, meaning that the result depends on the set of data (since the last row is a summary row). In addition, the table is marked as PRUNE WHEN EMPTY, meaning that the result is necessarily empty if the input is empty. The query author must order this input table on a single numeric column (syntax below).

# The second parameter, Howmany, specifies how many input rows that the user wants to be copied into the output table; all rows after this will contribute to the final summary row in the output.

How to use this function in query?

For example, if the contents of source table orders are:
||region||product||sales||
|East|A|1234.56|
|East|B|987.65|
|East|C|876.54|
|East|D|765.43|
|East|E|654.32|
|West|E|2345.67|
|West|D|2001.33|
|West|C|1357.99|
|West|B|975.35|
|West|A|864,22 |

the query author may write a query such as the following:
{code:java}
SELECT *
FROM TABLE(
  topn(
    Input => TABLE orders PARTITION BY region ORDER BY sales desc,
    Howmany => 3
  )
){code}

The result will be
||region||product||sales||
|East|A|1234.56|
|East|B|987.65|
|East|C|876.54|
|West|E|2345.67|
|West|D|2001.33|
|West|C|1357.99|
h3.

Add Comment

This message was sent by Atlassian Jira

[jira] [Updated] (CALCITE-4864) Supports Polymorphic Table function

Reply via email to