[ https://issues.apache.org/jira/browse/FLINK-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144312#comment-15144312 ]

ASF GitHub Bot commented on FLINK-3226:
---------------------------------------

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1624#issuecomment-183251312
  
    Hi, I thought about using POJOs as native types within Table/SQL operators. 
IMO, the gains are minimal compared to the added code complexity. Given a POJO 
input, we can preserve the input type only for very few operations, such as a 
Filter. For most other operations, we need to generate a new output type 
(Tuple or Row). I am a bit skeptical about adding a lot of codeGen code with 
special cases for POJOs (such as the field index mapping) which would be used 
very seldom. Moreover, POJO field accesses (for operations and serialization) 
go through reflection and are not very efficient, so even for the few cases 
where POJOs can be used, the performance gain is not clear.
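
    To illustrate the reflection overhead: a minimal, hypothetical Java sketch 
(Person is a made-up POJO) comparing reflective field access, as used for POJO 
fields, with positional access on a flat record, as a Row provides:

        import java.lang.reflect.Field;

        public class FieldAccessSketch {

            // Hypothetical POJO with public fields.
            public static class Person {
                public String name;
                public int age;
            }

            public static void main(String[] args) throws Exception {
                Person p = new Person();
                p.name = "Alice";
                p.age = 30;

                // POJO path: obtaining the field handle and every access
                // go through java.lang.reflect.
                Field nameField = Person.class.getField("name");
                Object viaReflection = nameField.get(p);

                // Row-like path: plain positional access into a flat record.
                Object[] row = {p.name, p.age};
                Object viaIndex = row[0];

                System.out.println(viaReflection + " / " + viaIndex);
            }
        }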
    
    I do not question native type support in general. Tuples and primitives 
should definitely be supported, but I don't think we need to support POJOs 
within Table / SQL operators. Instead, I would convert POJO datasets into Row 
tables during the table scan. Most of the code in this PR can be used to 
implement a codeGen'd converter Map function, along the lines of the sketch 
below.
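
    A minimal, hand-written sketch of what the generated converter could look 
like, assuming the hypothetical Person POJO from the sketch above and Flink's 
Row type with positional setField access:

        import org.apache.flink.api.common.functions.MapFunction;
        import org.apache.flink.types.Row;

        // Converts each Person into a 2-field Row; the generated code would
        // derive the field order and arity from the POJO's TypeInformation.
        public class PersonToRow implements MapFunction<Person, Row> {
            @Override
            public Row map(Person p) {
                Row row = new Row(2);
                row.setField(0, p.name);
                row.setField(1, p.age);
                return row;
            }
        }

    The converter would be applied once during the table scan, e.g. 
personDataSet.map(new PersonToRow()), so that all downstream operators work on 
Row only.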
    
    What do you think @twalthr?



> Translate optimized logical Table API plans into physical plans representing 
> DataSet programs
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-3226
>                 URL: https://issues.apache.org/jira/browse/FLINK-3226
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API
>            Reporter: Fabian Hueske
>            Assignee: Chengxiang Li
>
> This issue is about translating an (optimized) logical Table API (see 
> FLINK-3225) query plan into a physical plan. The physical plan is a 1-to-1 
> representation of the DataSet program that will be executed. This means:
> - Each Flink RelNode refers to exactly one Flink DataSet or DataStream 
> operator.
> - All (join and grouping) keys of Flink operators are correctly specified.
> - The expressions that are to be executed in user code are identified.
> - All fields are referenced with their physical execution-time index.
> - Flink type information is available.
> - Optional: add physical execution hints for joins.
> The translation should be the final part of Calcite's optimization process.
> For this task we need to:
> - implement a set of Flink DataSet RelNodes. Each RelNode corresponds to one 
> Flink DataSet operator (Map, Reduce, Join, ...). The RelNodes must hold all 
> relevant operator information (keys, user-code expression, strategy hints, 
> parallelism).
> - implement rules to translate optimized Calcite RelNodes into Flink 
> RelNodes. We start with a straightforward mapping and later add rules that 
> merge several relational operators into a single Flink operator, e.g., merge 
> a join followed by a filter. Timo implemented some rules for the first SQL 
> implementation, which can be used as a starting point (see the sketch after 
> this list).
> - integrate the translation rules into the Calcite optimization process.
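
As an illustration of such a translation rule, here is a hedged Java sketch 
using Calcite's ConverterRule to rewrite a LogicalFilter into a physical 
RelNode; the DATASET convention, the DataSetRel interface, and the 
DataSetFilter class are made up for the example and are not the names used in 
the actual implementation:

    import java.util.List;

    import org.apache.calcite.plan.Convention;
    import org.apache.calcite.plan.RelOptCluster;
    import org.apache.calcite.plan.RelTraitSet;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.SingleRel;
    import org.apache.calcite.rel.convert.ConverterRule;
    import org.apache.calcite.rel.logical.LogicalFilter;
    import org.apache.calcite.rex.RexNode;

    // Hypothetical calling convention marking RelNodes that translate to
    // DataSet operators.
    interface DataSetRel extends RelNode {
        Convention DATASET = new Convention.Impl("DATASET", DataSetRel.class);
    }

    // Hypothetical physical RelNode: corresponds 1-to-1 to a DataSet filter
    // operator and holds the user-code expression (the filter condition).
    class DataSetFilter extends SingleRel implements DataSetRel {
        final RexNode condition;

        DataSetFilter(RelOptCluster cluster, RelTraitSet traits,
                RelNode input, RexNode condition) {
            super(cluster, traits, input);
            this.condition = condition;
        }

        @Override
        public RelNode copy(RelTraitSet traitSet, List<RelNode> inputs) {
            return new DataSetFilter(getCluster(), traitSet, inputs.get(0),
                    condition);
        }
    }

    // Translation rule: matches a logical filter without a convention and
    // converts it (and its input) to the DATASET convention.
    public class DataSetFilterRule extends ConverterRule {
        public static final DataSetFilterRule INSTANCE =
                new DataSetFilterRule();

        private DataSetFilterRule() {
            super(LogicalFilter.class, Convention.NONE, DataSetRel.DATASET,
                    "DataSetFilterRule");
        }

        @Override
        public RelNode convert(RelNode rel) {
            LogicalFilter filter = (LogicalFilter) rel;
            RelTraitSet traits =
                    filter.getTraitSet().replace(DataSetRel.DATASET);
            RelNode input = convert(filter.getInput(), DataSetRel.DATASET);
            return new DataSetFilter(filter.getCluster(), traits, input,
                    filter.getCondition());
        }
    }

Such a rule would be registered in the rule set that Calcite's planner applies 
in the final optimization phase.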



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
