[
https://issues.apache.org/jira/browse/FLINK-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312864#comment-16312864
]
ASF GitHub Bot commented on FLINK-8203:
---------------------------------------
Github user twalthr commented on a diff in the pull request:
https://github.com/apache/flink/pull/5132#discussion_r159844472
--- Diff:
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/TableEnvironment.scala
---
@@ -768,6 +768,39 @@ abstract class TableEnvironment(val config: TableConfig) {
frameworkConfig
}
+ /**
+ * Reference input fields by name:
+ * All fields in the schema definition are referenced by name
+ * (and possibly renamed using an alias (as)). In this mode, fields can be reordered and
+ * projected out. Moreover, we can define proctime and rowtime attributes at arbitrary
+ * positions using arbitrary names (except those that exist in the result schema). This mode
+ * can be used for any input type, including POJOs.
+ *
+ * Reference input fields by position:
+ * Field references must refer to existing fields in the input type (except for
+ * renaming with alias (as)). In this mode, fields are simply renamed. Event-time attributes can
+ * replace the field on their position in the input data (if it is of correct type) or be
+ * appended at the end. Proctime attributes must be appended at the end. This mode can only be
+ * used if the input type has a defined field order (tuple, case class, Row) and none of the fields
+ * references a field of the input type.
+ */
+ protected def isReferenceByPosition(t: TypeInformation[_], fields: Array[Expression]): Boolean = {
+ if (t.isInstanceOf[PojoTypeInfo[_]]) {
+ return false
+ }
+
+ val inputNames = t match {
+ case ct: CompositeType[_] => ct.getFieldNames
+ case _ => return false // atomic types are referenced by name
+ }
+
+ // use the by-position mode if none of the fields exists in the input
+ fields.forall {
+ case UnresolvedFieldReference(name) => !inputNames.contains(name)
--- End diff --
Yes, that was my main concern behind this. This makes the difference
between the by-name and by-position modes more explicit. In this case I think
it is better to fail than to have implicit behavior that a user might not have
intended. I will add a comment about this in the code.
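
The "fail instead of implicit behavior" idea can be sketched in plain Scala without any Flink dependency. This is only an illustration under assumptions: `ReferenceMode`, `ByName`, `ByPosition` and `resolveMode` are hypothetical names, not the code in the pull request, and the sketch looks at plain field references only (aliases and time attributes are ignored).

```scala
// Hypothetical stand-in for the mode decision; not Flink's actual implementation.
sealed trait ReferenceMode
case object ByName extends ReferenceMode
case object ByPosition extends ReferenceMode

// inputNames: field names of the input type.
// referenced: field names used in the schema definition.
def resolveMode(
    inputNames: Seq[String],
    referenced: Seq[String]): Either[String, ReferenceMode] = {
  val (known, unknown) = referenced.partition(inputNames.contains)
  if (unknown.isEmpty) {
    Right(ByName) // every reference exists in the input
  } else if (known.isEmpty) {
    Right(ByPosition) // no reference exists in the input: fields are renamed
  } else {
    // Mixed references: report an error rather than guess what the user meant.
    Left(s"Ambiguous schema: [${known.mkString(", ")}] exist in the input, " +
      s"[${unknown.mkString(", ")}] do not")
  }
}
```

With input fields `a, b, c`, the references `b, a` would resolve to by-name, `x, y, z` to by-position, and `a, x` would be rejected.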
> Make schema definition of DataStream/DataSet to Table conversion more flexible
> ------------------------------------------------------------------------------
>
> Key: FLINK-8203
> URL: https://issues.apache.org/jira/browse/FLINK-8203
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Affects Versions: 1.4.0, 1.5.0
> Reporter: Fabian Hueske
> Assignee: Timo Walther
>
> When converting or registering a {{DataStream}} or {{DataSet}} as {{Table}},
> the schema of the table can be defined (by default it is extracted from the
> {{TypeInformation}}).
> The schema needs to be manually specified to select (project) fields, rename
> fields, or define time attributes. Right now, there are several limitations
> on how the fields can be defined that also depend on the type of the
> {{DataStream}} / {{DataSet}}. Types with explicit field ordering (e.g.,
> tuples, case classes, Row) require schema definition based on the position of
> fields. POJO types, which have no fixed field order, require fields to be
> referenced by name. Moreover, there are several restrictions on how time
> attributes can be defined, e.g., an event-time attribute must replace an
> existing field or be appended, and proctime attributes must be appended.
> I think we can make the schema definition more flexible and provide two modes:
> 1. Reference input fields by name: All fields in the schema definition are
> referenced by name (and possibly renamed using an alias ({{as}})). In this
> mode, fields can be reordered and projected out. Moreover, we can define
> proctime and eventtime attributes at arbitrary positions using arbitrary
> names (except those that exist in the result schema). This mode can be used
> for any input type, including POJOs. This mode is used if all field
> references exist in the input type.
> 2. Reference input fields by position: Field references might not refer to
> existing fields in the input type. In this mode, fields are simply renamed.
> Event-time attributes can replace the field on their position in the input
> data (if it is of correct type) or be appended at the end. Proctime
> attributes must be appended at the end. This mode can only be used if the
> input type has a defined field order (tuple, case class, Row).
> We need to add more tests that check all combinations of input types and
> schema definition modes.
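
To illustrate the two modes from the issue description, here is a toy model in plain Scala. It does not use Flink: rows are modeled as ordered (name, value) pairs, and `byName` / `byPosition` are hypothetical helper names, not Table API methods. Time attributes and aliases are left out.

```scala
// By name: pick fields by name in the requested order.
// Fields that are not listed are projected out.
// Throws NoSuchElementException if a requested name is missing from the row.
def byName(row: Seq[(String, Any)], names: Seq[String]): Seq[(String, Any)] = {
  val lookup = row.toMap
  names.map(n => n -> lookup(n))
}

// By position: values stay in place; the i-th field gets the i-th new name.
// zip stops at the shorter sequence, so surplus trailing fields are dropped.
def byPosition(row: Seq[(String, Any)], newNames: Seq[String]): Seq[(String, Any)] =
  newNames.zip(row).map { case (n, (_, v)) => n -> v }
```

In this model, a row with fields `a, b, c` can be reordered and projected with `byName(row, Seq("c", "a"))`, while `byPosition(row, Seq("x", "y", "z"))` only renames the fields and keeps their order.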