[jira] [Commented] (FLINK-3738) Refactor TableEnvironment and TranslationContext

Fabian Hueske (JIRA) Wed, 13 Apr 2016 02:46:09 -0700

    [ https://issues.apache.org/jira/browse/FLINK-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238958#comment-15238958 ]


Fabian Hueske commented on FLINK-3738:
--------------------------------------

Hi [~yijieshen], that's a good observation. I would suggest opening a new JIRA 
issue for this. FLINK-3632 is somewhat related to this as well.

In general, it would be good to validate as early as possible, ideally when the 
RelNodes are constructed. This is not always possible with the current Table 
API. For instance, joins are defined by {{join()}}, and the join predicates are 
only added later with {{where()}}. At the moment, we only allow equality joins 
for performance reasons, but this can only be checked after optimization, when 
the DataSet program is constructed.
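A toy sketch of why the equality-join check cannot run when {{join()}} is called. The {{Table}} class, its {{translate()}} method, and the tuple-based predicates below are hypothetical stand-ins, not Flink's classes, and the model skips the optimizer entirely: it simply defers the check to translation time, where the complete join condition is finally known.

```python
from typing import List, Optional, Tuple

# Hypothetical predicate model: (operator, left_field, right_field),
# e.g. ("==", "cId", "id"). Not Flink's expression classes.
Predicate = Tuple[str, str, str]


class Table:
    """Toy stand-in for a Table API table; not Flink's actual class."""

    def __init__(self, name: str, is_join: bool = False,
                 predicates: Optional[List[Predicate]] = None) -> None:
        self.name = name
        self.is_join = is_join
        self.predicates = predicates or []

    def join(self, other: "Table") -> "Table":
        # join() only combines the two inputs. No join predicate exists yet,
        # so the equality-join restriction cannot be checked here.
        return Table(f"{self.name}_{other.name}", is_join=True)

    def where(self, pred: Predicate) -> "Table":
        # where() appends a conjunct; on a join result it becomes part of
        # the join condition.
        return Table(self.name, self.is_join, self.predicates + [pred])

    def translate(self) -> str:
        # Deferred validation: only now is the complete join condition known.
        if self.is_join:
            if not self.predicates:
                raise ValueError("a join requires at least one predicate")
            for op, _, _ in self.predicates:
                if op != "==":
                    raise ValueError(f"only equality joins are supported, got {op!r}")
        return f"plan({self.name})"


orders, customers = Table("orders"), Table("customers")
joined = orders.join(customers)                       # accepted, nothing to check yet
print(joined.where(("==", "cId", "id")).translate())  # plan(orders_customers)

theta = joined.where(("<", "cId", "id"))              # still accepted by the API
try:
    theta.translate()
except ValueError as err:
    print(err)  # only equality joins are supported, got '<'
```

Moving such checks to the API level would mean validating the predicate inside {{where()}} as soon as it is attached to a join, which this toy model deliberately does not do.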

However, I think it should be possible to move more checks to the API level. 
So it would be good if you could open a JIRA issue (maybe with FLINK-3632 as a 
related issue or sub-issue) to refactor the query validation.

> Refactor TableEnvironment and TranslationContext
> ------------------------------------------------
>
>                 Key: FLINK-3738
>                 URL: https://issues.apache.org/jira/browse/FLINK-3738
>             Project: Flink
>          Issue Type: Task
>          Components: Table API
>            Reporter: Fabian Hueske
>            Assignee: Fabian Hueske
>
> Currently the TableAPI uses a static object called {{TranslationContext}} 
> which holds the Calcite table catalog and a Calcite planner instance. 
> Whenever a {{DataSet}} or {{DataStream}} is converted into a {{Table}} or 
> registered as a {{Table}} on the {{TableEnvironment}}, a new entry is added 
> to the catalog. The first time a {{Table}} is added, a planner instance is 
> created. The planner is used to optimize the query (defined by one or more 
> Table API operations and/or one or more SQL queries) when a {{Table}} is 
> converted into a {{DataSet}} or {{DataStream}}. Since a planner may only be 
> used to optimize a single program, the choice of a single static object is 
> problematic.
> I propose to refactor the {{TableEnvironment}} to take over the 
> responsibility of holding the catalog and the planner instance. 
> - A {{TableEnvironment}} holds a catalog of registered tables and a single 
> planner instance.
> - A {{TableEnvironment}} will allow only a single {{Table}} (possibly 
> composed of several Table API operations and SQL queries) to be translated 
> into a {{DataSet}} or {{DataStream}}.
> - A {{TableEnvironment}} is bound to an {{ExecutionEnvironment}} or a 
> {{StreamExecutionEnvironment}}. This is necessary to create data sources or 
> source functions that read external tables or streams.
> - {{DataSet}} and {{DataStream}} need a reference to a {{TableEnvironment}} 
> to be converted into a {{Table}}. This rules out the implicit conversions 
> currently supported by the Scala DataSet API.
> - A {{Table}} needs a reference to the {{TableEnvironment}} it is bound to. 
> Only tables from the same {{TableEnvironment}} can be processed together.
> - The {{TranslationContext}} will be completely removed.
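The proposed ownership model can be sketched as a small language-neutral model. Everything below is hypothetical: {{Planner}}, {{from_data_set}}, {{register_table}}, {{scan}}, {{to_data_set}}, {{union}} and the string "plans" are illustrative stand-ins, not the Flink or Calcite API. The sketch assumes a planner that refuses to optimize a second program, mirroring the single-program restriction described above.

```python
from typing import Dict


class Planner:
    """Hypothetical single-use optimizer: it may optimize only one program."""

    def __init__(self) -> None:
        self.used = False

    def optimize(self, plan: str) -> str:
        if self.used:
            raise RuntimeError("planner has already optimized a program")
        self.used = True
        return f"optimized({plan})"


class TableEnvironment:
    """Owns the catalog and one planner instead of a static TranslationContext."""

    def __init__(self, exec_env: str) -> None:
        self.exec_env = exec_env               # bound (Stream)ExecutionEnvironment, modeled as a name
        self.catalog: Dict[str, "Table"] = {}  # tables registered in this environment only
        self.planner = Planner()               # one planner per environment

    def from_data_set(self, data_set: str) -> "Table":
        # Conversion goes through an explicit environment, not an implicit conversion.
        return Table(self, f"scan({data_set})")

    def register_table(self, name: str, table: "Table") -> None:
        require_same_env(self, table)
        self.catalog[name] = table

    def scan(self, name: str) -> "Table":
        return self.catalog[name]

    def to_data_set(self, table: "Table") -> str:
        # Translating a second Table fails because the planner is single-use.
        require_same_env(self, table)
        return self.planner.optimize(table.plan)


class Table:
    def __init__(self, env: TableEnvironment, plan: str) -> None:
        self.env = env    # every Table knows the environment it is bound to
        self.plan = plan

    def union(self, other: "Table") -> "Table":
        require_same_env(self.env, other)
        return Table(self.env, f"union({self.plan}, {other.plan})")


def require_same_env(env: TableEnvironment, table: Table) -> None:
    # Only tables from the same TableEnvironment may be processed together.
    if table.env is not env:
        raise ValueError("table belongs to a different TableEnvironment")


env = TableEnvironment("batchEnv")
orders = env.from_data_set("orders")
env.register_table("Orders", orders)
print(env.to_data_set(env.scan("Orders").union(orders)))
# optimized(union(scan(orders), scan(orders)))
```

Because the catalog and planner are instance state rather than a static object, two independent programs simply use two {{TableEnvironment}} instances, and mixing their tables fails fast at the API call instead of inside the planner.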



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
