[jira] [Created] (FLINK-8203) Make schema definition of DataStream/DataSet to Table conversion more flexible

Fabian Hueske (JIRA) Tue, 05 Dec 2017 02:18:26 -0800

Fabian Hueske created FLINK-8203:
------------------------------------

             Summary: Make schema definition of DataStream/DataSet to Table 
conversion more flexible
                 Key: FLINK-8203
                 URL: https://issues.apache.org/jira/browse/FLINK-8203
             Project: Flink
          Issue Type: Bug
          Components: Table API & SQL
    Affects Versions: 1.4.0, 1.5.0
            Reporter: Fabian Hueske



When converting or registering a {{DataStream}} or {{DataSet}} as a {{Table}}, 
the schema of the table can be defined (by default it is extracted from the 
{{TypeInformation}}).

The schema needs to be specified manually to select (project) fields, rename 
fields, or define time attributes. Right now, there are several limitations on 
how the fields can be defined, and they depend on the type of the 
{{DataStream}} / {{DataSet}}. Types with an explicit field order (e.g., tuples, 
case classes, Row) require a schema definition based on the position of 
fields. POJO types, which have no fixed field order, require fields to be 
referenced by name. Moreover, there are several restrictions on how time 
attributes can be defined, e.g., an event-time attribute must replace an 
existing field or be appended, and proctime attributes must be appended.

I think we can make the schema definition more flexible and provide two modes:

1. Reference input fields by name: All fields in the schema definition are 
referenced by name (and possibly renamed using an alias ({{as}})). In this 
mode, fields can be reordered and projected out. Moreover, we can define 
proctime and event-time attributes at arbitrary positions using arbitrary names 
(except names that already exist in the result schema). This mode can be used 
for any input type, including POJOs. This mode is used if all field references 
exist in the input type.

2. Reference input fields by position: Field references might not refer to 
existing fields in the input type. In this mode, fields are simply renamed. 
An event-time attribute can replace the field at its position in the input 
data (if that field is of the correct type) or be appended at the end. Proctime 
attributes must be appended at the end. This mode can only be used if the input 
type has a defined field order (tuple, case class, Row).
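
The selection rule for the two modes can be modelled in a few lines. The 
following is only an illustrative sketch in Python, not Flink code: 
{{FieldRef}} and {{resolve_schema}} are hypothetical names, the mode is chosen 
by looking at the non-time references only (an assumption), "appended" is 
taken to mean a position past the last input field (also an assumption), and 
the type check for replaced event-time fields is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FieldRef:
    name: str                        # name used in the schema definition
    alias: Optional[str] = None      # result name when renamed with `as`
    time_attr: Optional[str] = None  # "rowtime", "proctime", or None


def resolve_schema(input_fields: List[str], ordered: bool, refs: List[FieldRef]
                   ) -> Tuple[str, List[Tuple[Optional[int], str]]]:
    """Return (mode, [(input index or None, result field name), ...])."""
    # Assumption: only non-time references decide which mode applies.
    plain = [r for r in refs if r.time_attr is None]
    by_name = all(r.name in input_fields for r in plain)

    result: List[Tuple[Optional[int], str]] = []
    if by_name:
        mode = "by-name"
        for r in refs:
            if r.time_attr is None:
                # Fields may be reordered, projected out, and renamed.
                result.append((input_fields.index(r.name), r.alias or r.name))
            else:
                # Time attributes may appear anywhere under a new name.
                result.append((None, r.name))
    else:
        mode = "by-position"
        if not ordered:
            raise ValueError("positional references need a fixed field order")
        for pos, r in enumerate(refs):
            in_input = pos < len(input_fields)
            if r.time_attr == "proctime":
                if in_input:
                    raise ValueError("proctime must be appended at the end")
                result.append((None, r.name))
            elif r.time_attr == "rowtime":
                # Replaces the field at this position, or is appended.
                result.append((pos if in_input else None, r.name))
            else:
                if not in_input:
                    raise ValueError(f"no input field at position {pos}")
                # Positional mode simply renames the field.
                result.append((pos, r.name))

    names = [n for _, n in result]
    if len(names) != len(set(names)):
        raise ValueError("result field names must be unique")
    return mode, result


fields = ["id", "name", "ts"]

# All plain references exist in the input, so fields are matched by name.
print(resolve_schema(fields, False, [
    FieldRef("name"), FieldRef("id", alias="key"),
    FieldRef("pt", time_attr="proctime")]))

# "a" and "b" are unknown names, so fields are matched by position.
print(resolve_schema(fields, True, [
    FieldRef("a"), FieldRef("b"), FieldRef("rt", time_attr="rowtime")]))
```

In this model an unordered input (such as a POJO) combined with unknown field 
names is rejected, because positional matching is the only remaining 
interpretation and it needs a defined field order.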

We need to add more tests that check all combinations of input types and 
schema definition modes.
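
As a rough illustration of the combinations such tests would have to cover, 
the matrix can be enumerated as below. This is a Python sketch with 
hypothetical names ({{INPUT_TYPES}}, {{is_supported}}, {{test_matrix}}), not 
part of the Flink test suite. The expected outcome assumes that by-name works 
for every input type and by-position only for types with a defined field 
order.

```python
from itertools import product

# (input type, has a defined field order): the types named in this issue.
INPUT_TYPES = [("tuple", True), ("case class", True), ("Row", True), ("POJO", False)]
MODES = ["by-name", "by-position"]


def is_supported(ordered: bool, mode: str) -> bool:
    # By-name works for any input; by-position needs a defined field order.
    return mode == "by-name" or ordered


def test_matrix():
    """List every (input type, mode, expected to be supported) combination."""
    return [(t, m, is_supported(o, m)) for (t, o), m in product(INPUT_TYPES, MODES)]


for input_type, mode, ok in test_matrix():
    print(f"{input_type:<10} {mode:<12} {'supported' if ok else 'rejected'}")
```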




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
