[jira] [Updated] (CALCITE-4864) Supports Polymorphic Table function

Jing Zhang (Jira) Sun, 26 Jun 2022 08:20:05 -0700


     [ 
https://issues.apache.org/jira/browse/CALCITE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jing Zhang updated CALCITE-4864:
--------------------------------
    Description: 
h1. Background
h2. What is Polymorphic Table Function

A Polymorphic Table Function (abbreviated PTF) is a function that returns a 
table whose row type is not declared when the function is created. It is 
introduced in SQL 2017 standard.
h2. Why we need PTF

PTF is the evolution of the table function. It has the following features 
compare with normal table function:
 # may have generic table parameters (i.e., no row type of table parameters 
declared when the PTF is created)
 # returns a table whose row type is not declared when the function is created, 
it may depend on the function arguments.

PTF could have generic table parameters. And  input tables are classified by 
three characteristics:
 # Input tables have either row semantics or set semantics)
 ## Row semantics means that the the result of the PTF is determined only by 
looking at the current row
 ## Set semantics means that the result of the PTF can be determined by looking 
at the current row and some state “summarized” from previously processed rows.
 # The second characteristic, which applies only to input tables with set 
semantics, is whether the PTF can generate a result row even if the input table 
is empty.
 # The third characteristic is whether the input table supports pass-through 
columns or not. Pass-through columns is a mechanism enabling the PTF to copy 
every column of an input row into columns of an output row.

h2. Examples of PTF

There are many interesting examples of PTF in SQL standard.

Here I would list two cases, one is a PTF with row semantics input table, the 
other one is a PTF with set semantics input table.
h3. PTF with a Row Semantics input table as parameter


We often need to read a CSV file, generally, the first line of the file 
contains a list of column names, and subsequent lines of the file contain data. 
The data in general can be treated as a large VARCHAR. However, some of the 
fields may be numeric or datetime.
A PTF named CSVreader is designed to read a file of comma-separated values and 
interpret this file as a table.

CSVreader function has three parameters:
 # The first parameter, File, is the name of a file on the query author's 
system. This file must contain the comma-separated values that are to be 
converted to a table. The first line of the file contains the names of the 
resulting columns. Succeeding lines contain the data. Each line after the first 
will result in one row of output, with column names as determined by the first 
line of the input.
 # Floats is a PTF descriptor area, which should provide a list of the column 
names that are to be interpreted numerically. These columns will be output with 
the data type FLOAT.
 # Dates is a PTF descriptor area, which provides a list of the column names 
that are to be interpreted as datetimes. These columns will be output with the 
data type DATE.

How to use this PTF in query?

For example, suppose that the contents of abc.csv are
{code:java}
docno,name,due_date,principle,interest
123,Mary,01/01/2014,234.56,345.67
234,Edgar,01/01/2014,654.32,543.21 {code}
 
the query author may write a query such as the following:
{code:java}
SELECT *
FROM TABLE ( 
  CSVreader ( 
    File => 'abc.csv',
    Floats => DESCRIPTOR ("principle", "interest")
    Dates => DESCRIPTOR ("due_date")
  ) 
) AS S {code}
 

The result will be
||docno||name||due_date||principle||interest||
|123|Mary|01/01/2014|234.56|345.67|
|234|Edgar|01/01/2014|654.32|543.21|

 
h3. PTF with a Row Semantics input table as parameter
TopN takes an input table that has been sorted on a numeric column. It copies 
the first n rows through to the output table. Any additional rows are 
summarized in a single output row in which the sort column has been summed and 
all other columns are null.
The PTF has two parameters:
 # The first parameter, Input, is the input table. This table has set 
semantics, meaning that the result depends on the set of data (since the last 
row is a summary row). In addition, the table is marked as PRUNE WHEN EMPTY, 
meaning that the result is necessarily empty if the input is empty. The query 
author must order this input table on a single numeric column (syntax below).

 # The second parameter, Howmany, specifies how many input rows that the user 
wants to be copied into the output table; all rows after this will contribute 
to the final summary row in the output.

How to use this PTF in query?

For example, suppose that the contents of abc.csv are
{code:java}
region,product,sales
East,A,1234.56
East,B,987.65
East,C,876.54
East,D,765.43
East,E,654.32
West,E,2345.67
West,D,2001.33
West,C,1357.99
West,B,975.35
West,A,864,22 {code}
 
the query author may write a query such as the following:
{code:java}
SELECT *
FROM TABLE(
  topn(
    Input => TABLE orders PARTITION BY region ORDER BY sales desc,
    Howmany => 3
  )
){code}
 

The result will be
||region||product||sales||
|East|A|1234.56|
|East|B|987.65|
|East|C|876.54|
|West|E|2345.67|
|West|D|2001.33|
|West|C|1357.99|
h3.  

  was:
h1. Background
h2. What is Polymorphic Table Function

A Polymorphic Table Function (abbreviated PTF) is a function that returns a 
table whose row type is not declared when the function is created. It is 
introduced in SQL 2017 standard.

 
h2. When y

PTF is the evolution of the table function. It has the following features 
compare with normal table function:
 # may have generic table parameters (i.e., no row type of table parameters 
declared when the PTF is created)
 # returns a table whose row type is not declared when the function is created, 
it may depend on the function arguments.

PTF could have generic table parameters. And  input tables are classified by 
three characteristics:
 # Input tables have either row semantics or set semantics)
 ## Row semantics means that the the result of the PTF is determined only by 
looking at the current row
 ## Set semantics means that the result of the PTF can be determined by looking 
at the current row and some state “summarized” from previously processed rows.
 # The second characteristic, which applies only to input tables with set 
semantics, is whether the PTF can generate a result row even if the input table 
is empty.
 # The third characteristic is whether the input table supports pass-through 
columns or not. Pass-through columns is a mechanism enabling the PTF to copy 
every column of an input row into columns of an output row.

h2. Examples

There are many interesting examples of PTF in SQL standard. Here I only give 
one case.
We often need to read a CSV file, generally, the first line of the file 
contains a list of column names, and subsequent lines of the file contain data. 
The data in general can be treated as a large VARCHAR. However, some of the 
fields may be numeric or datetime.
A PTF named CSVreader is designed to read a file of comma-separated values and 
interpret this file as a table.

CSVreader function has three parameters:
 # The first parameter, File, is the name of a file on the query author's 
system. This file must contain the comma-separated values that are to be 
converted to a table. The first line of the file contains the names of the 
resulting columns. Succeeding lines contain the data. Each line after the first 
will result in one row of output, with column names as determined by the first 
line of the input.
 # Floats is a PTF descriptor area, which should provide a list of the column 
names that are to be interpreted numerically. These columns will be output with 
the data type FLOAT.
 # Dates is a PTF descriptor area, which provides a list of the column names 
that are to be interpreted as datetimes. These columns will be output with the 
data type DATE.

How to use this PTF in query?

For example, suppose that the contents of abc.csv are
{code:java}
docno,name,due_date,principle,interest
123,Mary,01/01/2014,234.56,345.67
234,Edgar,01/01/2014,654.32,543.21 {code}
 
the query author may write a query such as the following:
{code:java}
SELECT *
FROM TABLE ( 
  CSVreader ( File => 'abc.csv',
  Floats => DESCRIPTOR ("principle", "interest")
  Dates => DESCRIPTOR ("due_date") ) ) AS S {code}
 

The result will be
||docno||name||due_date||principle||interest||
|123|Mary|01/01/2014|234.56|345.67|
|234|Edgar|01/01/2014|654.32|543.21|

 


> Supports Polymorphic Table function
> -----------------------------------
>
>                 Key: CALCITE-4864
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4864
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: Jing Zhang
>            Priority: Major
>
> h1. Background
> h2. What is Polymorphic Table Function
> A Polymorphic Table Function (abbreviated PTF) is a function that returns a 
> table whose row type is not declared when the function is created. It is 
> introduced in SQL 2017 standard.
> h2. Why we need PTF
> PTF is the evolution of the table function. It has the following features 
> compare with normal table function:
>  # may have generic table parameters (i.e., no row type of table parameters 
> declared when the PTF is created)
>  # returns a table whose row type is not declared when the function is 
> created, it may depend on the function arguments.
> PTF could have generic table parameters. And  input tables are classified by 
> three characteristics:
>  # Input tables have either row semantics or set semantics)
>  ## Row semantics means that the the result of the PTF is determined only by 
> looking at the current row
>  ## Set semantics means that the result of the PTF can be determined by 
> looking at the current row and some state “summarized” from previously 
> processed rows.
>  # The second characteristic, which applies only to input tables with set 
> semantics, is whether the PTF can generate a result row even if the input 
> table is empty.
>  # The third characteristic is whether the input table supports pass-through 
> columns or not. Pass-through columns is a mechanism enabling the PTF to copy 
> every column of an input row into columns of an output row.
> h2. Examples of PTF
> There are many interesting examples of PTF in SQL standard.
> Here I would list two cases, one is a PTF with row semantics input table, the 
> other one is a PTF with set semantics input table.
> h3. PTF with a Row Semantics input table as parameter
> We often need to read a CSV file, generally, the first line of the file 
> contains a list of column names, and subsequent lines of the file contain 
> data. The data in general can be treated as a large VARCHAR. However, some of 
> the fields may be numeric or datetime.
> A PTF named CSVreader is designed to read a file of comma-separated values 
> and interpret this file as a table.
> CSVreader function has three parameters:
>  # The first parameter, File, is the name of a file on the query author's 
> system. This file must contain the comma-separated values that are to be 
> converted to a table. The first line of the file contains the names of the 
> resulting columns. Succeeding lines contain the data. Each line after the 
> first will result in one row of output, with column names as determined by 
> the first line of the input.
>  # Floats is a PTF descriptor area, which should provide a list of the column 
> names that are to be interpreted numerically. These columns will be output 
> with the data type FLOAT.
>  # Dates is a PTF descriptor area, which provides a list of the column names 
> that are to be interpreted as datetimes. These columns will be output with 
> the data type DATE.
> How to use this PTF in query?
> For example, suppose that the contents of abc.csv are
> {code:java}
> docno,name,due_date,principle,interest
> 123,Mary,01/01/2014,234.56,345.67
> 234,Edgar,01/01/2014,654.32,543.21 {code}
>  
> the query author may write a query such as the following:
> {code:java}
> SELECT *
> FROM TABLE ( 
>   CSVreader ( 
>     File => 'abc.csv',
>     Floats => DESCRIPTOR ("principle", "interest")
>     Dates => DESCRIPTOR ("due_date")
>   ) 
> ) AS S {code}
>  
> The result will be
> ||docno||name||due_date||principle||interest||
> |123|Mary|01/01/2014|234.56|345.67|
> |234|Edgar|01/01/2014|654.32|543.21|
>  
> h3. PTF with a Row Semantics input table as parameter
> TopN takes an input table that has been sorted on a numeric column. It copies 
> the first n rows through to the output table. Any additional rows are 
> summarized in a single output row in which the sort column has been summed 
> and all other columns are null.
> The PTF has two parameters:
>  # The first parameter, Input, is the input table. This table has set 
> semantics, meaning that the result depends on the set of data (since the last 
> row is a summary row). In addition, the table is marked as PRUNE WHEN EMPTY, 
> meaning that the result is necessarily empty if the input is empty. The query 
> author must order this input table on a single numeric column (syntax below).
>  # The second parameter, Howmany, specifies how many input rows that the user 
> wants to be copied into the output table; all rows after this will contribute 
> to the final summary row in the output.
> How to use this PTF in query?
> For example, suppose that the contents of abc.csv are
> {code:java}
> region,product,sales
> East,A,1234.56
> East,B,987.65
> East,C,876.54
> East,D,765.43
> East,E,654.32
> West,E,2345.67
> West,D,2001.33
> West,C,1357.99
> West,B,975.35
> West,A,864,22 {code}
>  
> the query author may write a query such as the following:
> {code:java}
> SELECT *
> FROM TABLE(
>   topn(
>     Input => TABLE orders PARTITION BY region ORDER BY sales desc,
>     Howmany => 3
>   )
> ){code}
>  
> The result will be
> ||region||product||sales||
> |East|A|1234.56|
> |East|B|987.65|
> |East|C|876.54|
> |West|E|2345.67|
> |West|D|2001.33|
> |West|C|1357.99|
> h3.  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Updated] (CALCITE-4864) Supports Polymorphic Table function

Reply via email to