[ https://issues.apache.org/jira/browse/FLINK-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258199#comment-15258199 ]

ASF GitHub Bot commented on FLINK-2828:
---------------------------------------

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1939#discussion_r61098440
  
    --- Diff: docs/apis/batch/libs/table.md ---
    @@ -67,6 +67,165 @@ The central concept of the Table API is a `Table` which represents a table with
     
     The following sections show by example how to use the Table API embedded in the Scala and Java DataSet APIs.
     
    +### Registering Tables to and Accessing Tables from TableEnvironments
    +
    +`TableEnvironment`s have an internal table catalog to which tables can be registered with a unique name. After registration, a table can be accessed from the `TableEnvironment` by its name. Tables can be registered in different ways.
    +
    +#### Register a DataSet
    +
    +A `DataSet` is registered as a `Table` in a `BatchTableEnvironment` as follows:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
    +
    +// register the DataSet cust as table "Customers" with fields derived from the dataset
    +tableEnv.registerDataSet("Customers", cust);
    +
    +// register the DataSet ord as table "Orders" with fields user, product, and amount
    +tableEnv.registerDataSet("Orders", ord, "user, product, amount");
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +val env = ExecutionEnvironment.getExecutionEnvironment
    +val tableEnv = TableEnvironment.getTableEnvironment(env)
    +
    +// register the DataSet cust as table "Customers" with fields derived from the dataset
    +tableEnv.registerDataSet("Customers", cust)
    +
    +// register the DataSet ord as table "Orders" with fields user, product, and amount
    +tableEnv.registerDataSet("Orders", ord, 'user, 'product, 'amount)
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +*Note: DataSet table names are not allowed to follow the `^_DataSetTable_[0-9]+` pattern, as these are reserved for internal use only.*
    +
    +#### Register a DataStream
    +
    +A `DataStream` is registered as a `Table` in a `StreamTableEnvironment` as follows:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    +StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
    +
    +// register the DataStream cust as table "Customers" with fields derived from the datastream
    +tableEnv.registerDataStream("Customers", cust);
    +
    +// register the DataStream ord as table "Orders" with fields user, product, and amount
    +tableEnv.registerDataStream("Orders", ord, "user, product, amount");
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +val env = StreamExecutionEnvironment.getExecutionEnvironment
    +val tableEnv = TableEnvironment.getTableEnvironment(env)
    +
    +// register the DataStream cust as table "Customers" with fields derived from the datastream
    +tableEnv.registerDataStream("Customers", cust)
    +
    +// register the DataStream ord as table "Orders" with fields user, product, and amount
    +tableEnv.registerDataStream("Orders", ord, 'user, 'product, 'amount)
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +*Note: DataStream table names are not allowed to follow the `^_DataStreamTable_[0-9]+` pattern, as these are reserved for internal use only.*
    +
    +#### Register a Table
    +
    +A `Table` that originates from a Table API operation or a SQL query is registered in a `TableEnvironment` as follows:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +// works for StreamExecutionEnvironment identically
    +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
    +
    +// convert a DataSet into a Table
    +Table custT = tableEnv
    +  .toTable(custDs, "name, zipcode")
    +  .where("zipcode = '12345'")
    +  .select("name");
    +
    +// register the Table custT as table "custNames"
    +tableEnv.registerTable("custNames", custT);
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +// works for StreamExecutionEnvironment identically
    +val env = ExecutionEnvironment.getExecutionEnvironment
    +val tableEnv = TableEnvironment.getTableEnvironment(env)
    +
    +// convert a DataSet into a Table
    +val custT = custDs
    +  .toTable(tableEnv, 'name, 'zipcode)
    +  .where('zipcode === "12345")
    +  .select('name)
    +
    +// register the Table custT as table "custNames"
    +tableEnv.registerTable("custNames", custT)
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +A registered `Table` that originates from a Table API operation or SQL query is treated similarly to a view in a relational DBMS, i.e., it can be inlined when the query is optimized.
    +
    +#### Register an external table using a TableSource
    --- End diff --
    
    Capitalize Table to be consistent with previous titles?


> Add interfaces for Table API input formats
> ------------------------------------------
>
>                 Key: FLINK-2828
>                 URL: https://issues.apache.org/jira/browse/FLINK-2828
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API
>            Reporter: Timo Walther
>            Assignee: Fabian Hueske
>
> In order to support input formats for the Table API, interfaces are 
> necessary. I propose two types of TableSources:
> - AdaptiveTableSources can adapt their output to the requirements of the 
> plan. Although the output schema stays the same, the TableSource can react to 
> field resolution and/or predicates internally and can return adapted 
> DataSet/DataStream versions in the "translate" step.
> - StaticTableSources are an easy way to provide the Table API with additional 
> input formats without much implementation effort (e.g. for fromCsvFile()).
> TableSources need to be deeply integrated into the Table API.
> The TableEnvironment requires a newly introduced AbstractExecutionEnvironment 
> (common super class of all ExecutionEnvironments for DataSets and 
> DataStreams).
> Here's what a TableSource can see from more complicated queries:
> {code}
> getTableJava(tableSource1)
>   .filter("a===5 || a===6")
>   .select("a as a4, b as b4, c as c4")
>   .filter("b4===7")
>   .join(getTableJava(tableSource2))
>   .where("a===a4 && c==='Test' && c4==='Test2'")
> // Result predicates for tableSource1:
> //  List("a===5 || a===6", "b===7", "c==='Test2'")
> // Result predicates for tableSource2:
> //  List("c==='Test'")
> // Result resolved fields for tableSource1 (true = filtering, 
> false=selection):
> //  Set(("a", true), ("a", false), ("b", true), ("b", false), ("c", false), 
> ("c", true))
> // Result resolved fields for tableSource2 (true = filtering, 
> false=selection):
> //  Set(("a", true), ("c", true))
> {code}
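
For illustration, the two proposed source types could be sketched as plain Java interfaces. All names and method signatures below are hypothetical, chosen to mirror the description above; they are not the actual Flink API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two proposed TableSource variants; names
// and signatures are illustrative only, not the actual Flink API.
interface TableSource {
    String[] getFieldNames(); // the output schema, which stays fixed
}

// Simple variant: wraps an input format, receives no feedback from the plan.
interface StaticTableSource extends TableSource { }

// Adaptive variant: is notified of resolved fields and predicates during
// optimization and may return an adapted DataSet/DataStream in the
// "translate" step.
interface AdaptiveTableSource extends TableSource {
    void notifyResolvedField(String field, boolean usedForFiltering);
    void notifyPredicates(List<String> predicates);
}

// Minimal adaptive source that records what the optimizer pushes down.
class RecordingSource implements AdaptiveTableSource {
    final List<String> pushedPredicates = new ArrayList<>();

    @Override
    public String[] getFieldNames() { return new String[] {"a", "b", "c"}; }

    @Override
    public void notifyResolvedField(String field, boolean usedForFiltering) {
        // a real source could use this to prune fields it never needs to read
    }

    @Override
    public void notifyPredicates(List<String> predicates) {
        pushedPredicates.addAll(predicates);
    }
}
```

In the query shown above, the optimizer would call notifyPredicates on the source for tableSource1 with the three result predicates, and the source could then filter while reading instead of materializing the full input.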



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
