[
https://issues.apache.org/jira/browse/FLINK-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930960#comment-15930960
]
ASF GitHub Bot commented on FLINK-6097:
---------------------------------------
Github user sunjincheng121 commented on the issue:
https://github.com/apache/flink/pull/3560
Hi @KurtYoung, thanks for your attention to this PR. Good question. Let me
explain why I noticed this method:
When we implemented the first prototype of the OVER window Table API, we did
not consider that the table fields could be out of order in the
translateToPlan method, so we set the outputRow fields from inputRow according
to the initial order of the table field indexes.
At first, when the select statement projected fewer than 5 columns, it worked
well. Unfortunately, once the number of projections reached 5 or more, we got
results in a random order. When we debugged the code, we found that the
ProjectionTranslator#identifyFieldReferences method uses a Set to temporarily
store the fields. When the Set holds fewer than 5 elements, it is backed by
the Set1, Set2, Set3, and Set4 data structures. When it holds 5 or more
elements, it is backed by HashSet.HashTrieSet, which does not preserve
insertion order. So we considered 2 approaches to solve this problem:
1. Let the ProjectionTranslator#identifyFieldReferences method guarantee that
the extracted field references are in the same order as the input.
2. Add a mapping between the input and output fields.
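
To make the ordering issue and the first approach concrete, here is a minimal
sketch. It is not the actual ProjectionTranslator code: the object and method
names are hypothetical, and plain strings stand in for field references. It
only contrasts collecting names into an immutable Set with collecting them
into an insertion-ordered mutable.LinkedHashSet.

```scala
import scala.collection.mutable

// Hypothetical stand-in for field-reference extraction; field names are
// plain strings here, which is an assumption made for illustration only.
object FieldOrderSketch {

  // Collects names into an immutable Set. Duplicates are removed, but the
  // iteration order is not guaranteed once the Set holds 5 or more elements.
  def unorderedFieldNames(names: Seq[String]): Set[String] =
    names.toSet

  // Collects names into a LinkedHashSet, which removes duplicates and
  // iterates in insertion order (first occurrence wins).
  def orderedFieldNames(names: Seq[String]): Seq[String] = {
    val seen = mutable.LinkedHashSet.empty[String]
    names.foreach(seen += _)
    seen.toSeq
  }
}
```

Calling FieldOrderSketch.orderedFieldNames(Seq("a", "b", "c", "d", "e"))
keeps the input order, whereas the order in which unorderedFieldNames iterates
over the same input depends on the Set implementation.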
In the end we used approach #2 to solve the problem, so this change is not
necessary for the problem I faced. But I feel it is better for the output of
this method to be in the same order as the input; it may be helpful for other
cases, though I am currently not aware of any. I am OK with not making this
change, but then we should add a comment to highlight that the output of the
current method may be out of order. Otherwise, some people may not pay
attention to this and assume it is in order.
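
For reference, the idea behind the input and output field mapping can be
sketched roughly as follows. This is only an illustration under assumptions,
not the code in the PR: FieldMappingSketch, fieldMapping, and project are
hypothetical names, and rows are modelled as plain Seq[Any].

```scala
// Hypothetical sketch of an explicit input-to-output field mapping.
object FieldMappingSketch {

  // For each output field, look up its index in the input row by name, so
  // the result no longer depends on the order in which fields were extracted.
  def fieldMapping(inputFields: Seq[String], outputFields: Seq[String]): Seq[Int] =
    outputFields.map { name =>
      val idx = inputFields.indexOf(name)
      require(idx >= 0, s"Unknown field: $name")
      idx
    }

  // Builds the output row by copying each mapped input position.
  def project(inputRow: Seq[Any], mapping: Seq[Int]): Seq[Any] =
    mapping.map(i => inputRow(i))
}
```

With such a mapping, the output row is filled by name rather than by the
position at which a field happened to come out of the Set.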
Thanks,
SunJincheng
> Guaranteed the order of the extracted field references
> ------------------------------------------------------
>
> Key: FLINK-6097
> URL: https://issues.apache.org/jira/browse/FLINK-6097
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Reporter: sunjincheng
> Assignee: sunjincheng
>
> When we implemented the first prototype of the `OVER window` Table API, we
> did not consider that the table fields could be out of order in the
> `translateToPlan` method, so we set the `outputRow` fields from `inputRow`
> according to the initial order of the table field indexes.
> At first, when the select statement projected fewer than 5 columns, it
> worked well. Unfortunately, once the number of projections reached 5 or
> more, we got results in a random order. When we debugged the code, we found
> that the `ProjectionTranslator#identifyFieldReferences` method uses a `Set`
> to temporarily store the fields. When the Set holds fewer than 5 elements,
> it is backed by the Set1, Set2, Set3, and Set4 data structures. When it
> holds 5 or more elements, it is backed by HashSet.HashTrieSet, which does
> not preserve insertion order.
> e.g.:
> Add the following elements in turn:
> {code}
> a, b, c, d, e
> Set(a)
> class scala.collection.immutable.Set$Set1
> Set(a, b)
> class scala.collection.immutable.Set$Set2
> Set(a, b, c)
> class scala.collection.immutable.Set$Set3
> Set(a, b, c, d)
> class scala.collection.immutable.Set$Set4
> // we want (a, b, c, d, e)
> Set(e, a, b, c, d)
> class scala.collection.immutable.HashSet$HashTrieSet
> {code}
> So we considered 2 approaches to solve this problem:
> 1. Let the `ProjectionTranslator#identifyFieldReferences` method guarantee
> that the extracted field references are in the same order as the input.
> 2. Add a mapping between the input and output fields.
> In the end we used approach #2 to solve the problem, so this change is not
> necessary for the problem I faced. But I feel it is better for the output of
> this method to be in the same order as the input; it may be helpful for
> other cases, though I am currently not aware of any. I am OK with not making
> this change, but then we should add a comment to highlight that the output
> of the current method may be out of order. Otherwise, some people may not
> pay attention to this and assume it is in order.
> Hi all, what do you think? Any feedback is welcome.