[ https://issues.apache.org/jira/browse/FLINK-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149979#comment-16149979 ]
ASF GitHub Bot commented on FLINK-6442:
---------------------------------------
Github user lincoln-lil commented on a diff in the pull request:
https://github.com/apache/flink/pull/3829#discussion_r136491885
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/TableEnvironment.scala ---
@@ -502,26 +513,140 @@ abstract class TableEnvironment(val config: TableConfig) {
* tEnv.sql(s"SELECT * FROM $table")
* }}}
*
- * @param query The SQL query to evaluate.
+ * @param sql The SQL string to evaluate.
* @return The result of the query as Table.
*/
- def sql(query: String): Table = {
+ @deprecated
+ def sql(sql: String): Table = {
val planner = new FlinkPlannerImpl(getFrameworkConfig, getPlanner, getTypeFactory)
// parse the sql query
- val parsed = planner.parse(query)
+ val parsed = planner.parse(sql)
// validate the sql query
val validated = planner.validate(parsed)
// transform to a relational tree
val relational = planner.rel(validated)
-
new Table(this, LogicalRelNode(relational.rel))
}
/**
+ * Evaluates a SQL SELECT query on registered tables and retrieves the result as a
+ * [[Table]].
+ *
+ * All tables referenced by the query must be registered in the TableEnvironment.
+ * However, [[Table.toString]] automatically registers the table under a unique name
+ * and returns that name, so SQL can be called directly on tables like this:
+ *
+ * {{{
+ * val table: Table = ...
+ * // the table is not registered to the table environment
+ * tEnv.sqlSelect(s"SELECT * FROM $table")
+ * }}}
+ *
+ * @param sql The SQL string to evaluate.
+ * @return The result of the query as a [[Table]].
+ */
+ def sqlQuery(sql: String): Table = {
+ val planner = new FlinkPlannerImpl(getFrameworkConfig, getPlanner, getTypeFactory)
+ // parse the sql query
+ val parsed = planner.parse(sql)
+ if (null != parsed && parsed.getKind.belongsTo(SqlKind.QUERY)) {
+ // validate the sql query
+ val validated = planner.validate(parsed)
+ // transform to a relational tree
+ val relational = planner.rel(validated)
+ new Table(this, LogicalRelNode(relational.rel))
+ } else {
+ throw new TableException(
+ "Unsupported SQL query! sqlQuery() only accepts SELECT, UNION, INTERSECT, " +
+ "EXCEPT, VALUES, WITH, ORDER_BY, EXPLICIT_TABLE.")
+ }
+
+ /**
+ * Evaluates a SQL statement, which must be an SQL Data Manipulation Language (DML)
+ * statement such as INSERT, UPDATE or DELETE, or an SQL statement that returns
+ * nothing, such as a DDL statement.
+ * Currently, only SQL INSERT statements on registered tables are supported, and
+ * the call has no return value.
+ *
+ * All tables referenced by the statement must be registered in the TableEnvironment.
+ * However, [[Table.toString]] automatically registers the table under a unique name
+ * and returns that name, so SQL can be called directly on tables like this:
+ *
+ * {{{
+ * // register a table sink for the insertion
+ * tEnv.registerTableSink("target_table", ...)
+ * val sourceTable: Table = ...
+ * // sourceTable is not registered to the table environment
+ * tEnv.sqlUpdate(s"INSERT INTO target_table SELECT * FROM $sourceTable")
+ * }}}
+ *
+ * @param sql The SQL string to evaluate.
+ */
+ def sqlUpdate(sql: String): Unit = {
+ sqlUpdate(sql, QueryConfig.getQueryConfigFromTableEnv(this))
+ }
+
+ /**
+ * Evaluates a SQL statement, which must be an SQL Data Manipulation Language (DML)
+ * statement such as INSERT, UPDATE or DELETE, or an SQL statement that returns
+ * nothing, such as a DDL statement.
+ * Currently, only SQL INSERT statements on registered tables are supported, and
+ * the call has no return value.
+ *
+ * All tables referenced by the statement must be registered in the TableEnvironment.
+ * However, [[Table.toString]] automatically registers the table under a unique name
+ * and returns that name, so SQL can be called directly on tables like this:
+ *
+ * {{{
+ * // register a table sink for the insertion
+ * tEnv.registerTableSink("target_table", ...)
+ * val sourceTable: Table = ...
+ * // sourceTable is not registered to the table environment
+ * tEnv.sqlUpdate(s"INSERT INTO target_table SELECT * FROM $sourceTable")
+ * }}}
+ *
+ * @param sql The SQL string to evaluate.
+ * @param config The [[QueryConfig]] to use.
+ */
+ def sqlUpdate(sql: String, config: QueryConfig): Unit = {
+ val planner = new FlinkPlannerImpl(getFrameworkConfig, getPlanner, getTypeFactory)
+ // parse the sql query
+ val parsed = planner.parse(sql)
+ parsed match {
+ case insert: SqlInsert => {
+ // validate the sql query
+ planner.validate(parsed)
+
+ // validate sink table
+ val targetName = insert.getTargetTable.asInstanceOf[SqlIdentifier].names.get(0)
+ val targetTable = getTable(targetName)
+ if (null == targetTable || !targetTable.isInstanceOf[TableSinkTable[_]]) {
+ throw new TableException("SQL INSERT operation needs a registered TableSink table!")
+ }
+ // reject unsupported partial insertion into the sink table
+ val sinkTable = targetTable.asInstanceOf[TableSinkTable[_]]
+ if (null != insert.getTargetColumnList && insert.getTargetColumnList.size() !=
--- End diff --
The fields must be in the same order, and Calcite will not reorder fields based on their names.
The current `insert into` implementation is equivalent to the following SQL syntax:
```
INSERT INTO table2
SELECT * FROM table1 ... -- here * represents all columns declared by table2
```
The other form of `insert into` is a partial insert:
```
INSERT INTO table2 (column1, column2, column3, ...)
SELECT column1, column2, column3, ...
FROM table1 ...
```
Column names in the SELECT clause are not used by an INSERT statement to match up columns.
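To make that rule concrete, here is a minimal, self-contained sketch of such a position-based check (the object, method, and exception names are illustrative, not this PR's actual code):
```scala
// A sketch of the position-based column-list check described above; Calcite
// matches INSERT columns by position, so a list that differs from the sink
// schema in size or order cannot be mapped and must be rejected.
object PartialInsertCheck {

  def validateTargetColumns(
      sinkColumns: Seq[String],           // columns declared by the sink table
      targetColumns: Option[Seq[String]]  // explicit column list, if any
  ): Unit = targetColumns.foreach { cols =>
    // equality on Seq covers both size and declared order
    if (cols != sinkColumns) {
      throw new IllegalArgumentException(
        s"Partial insertion is not supported: the sink declares " +
          s"[${sinkColumns.mkString(", ")}] but the statement lists " +
          s"[${cols.mkString(", ")}].")
    }
  }

  def main(args: Array[String]): Unit = {
    // INSERT INTO table2 SELECT * FROM table1   -- no column list: accepted
    validateTargetColumns(Seq("a", "b", "c"), None)
    // INSERT INTO table2 (a, b) SELECT ...      -- partial list: rejected
    try validateTargetColumns(Seq("a", "b", "c"), Some(Seq("a", "b")))
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```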
> Extend TableAPI Support Sink Table Registration and ‘insert into’ Clause in SQL
> -------------------------------------------------------------------------------
>
> Key: FLINK-6442
> URL: https://issues.apache.org/jira/browse/FLINK-6442
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
> Reporter: lincoln.lee
> Assignee: lincoln.lee
> Priority: Minor
>
> Currently the Table API only provides registration methods for source tables.
> When we write a streaming job with SQL, we have to add an additional part for the
> sink via the Table API, like this:
> {code}
> val sqlQuery = "SELECT * FROM MyTable WHERE _1 = 3"
> val t = StreamTestData.getSmall3TupleDataStream(env)
> tEnv.registerDataStream("MyTable", t)
> // one way: invoke the Table API's writeToSink method directly
> val result = tEnv.sql(sqlQuery)
> result.writeToSink(new YourStreamSink)
> // another way: convert to datastream first and then invoke addSink
> val result = tEnv.sql(sqlQuery).toDataStream[Row]
> result.addSink(new StreamITCase.StringSink)
> {code}
> From the API we can see that the sink table is always a derived table, because its
> schema is inferred from the result type of the upstream query.
> Compared to a traditional RDBMS, which supports DML syntax, a query with a target
> output could be written like this:
> {code}
> insert into table target_table_name
> [(column_name [ ,...n ])]
> query
> {code}
> The equivalent form of the example above is as follows:
> {code}
> tEnv.registerTableSink("targetTable", new YourSink)
> val sql = "INSERT INTO targetTable SELECT a, b, c FROM sourceTable"
> val result = tEnv.sql(sql)
> {code}
> It is supported by Calcite’s grammar:
> {code}
> insert: ( INSERT | UPSERT ) INTO tablePrimary
> [ '(' column [, column ]* ')' ]
> query
> {code}
> I'd like to extend the Flink Table API to support this feature. See the design doc:
> https://goo.gl/n3phK5
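For readers following along, a short sketch of the end-to-end flow this issue proposes, reusing the placeholders from the example above (`YourSink` stands in for a user-defined TableSink, and `sqlUpdate` is the method name introduced in the PR under review):
{code}
// sketch only: combines the issue's example with the API from the PR under review
val t = StreamTestData.getSmall3TupleDataStream(env)
tEnv.registerDataStream("sourceTable", t)
// YourSink is a placeholder for a user-defined TableSink implementation
tEnv.registerTableSink("targetTable", new YourSink)
// the INSERT writes the query result into the registered sink; no Table is returned
tEnv.sqlUpdate("INSERT INTO targetTable SELECT a, b, c FROM sourceTable")
{code}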