[ https://issues.apache.org/jira/browse/FLINK-18687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dian Fu updated FLINK-18687:
----------------------------
Fix Version/s: (was: 1.12.0)
> ProjectionCodeGenerator#generateProjectionExpression should remove for loop
> optimization
> ----------------------------------------------------------------------------------------
>
> Key: FLINK-18687
> URL: https://issues.apache.org/jira/browse/FLINK-18687
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime
> Affects Versions: 1.11.0
> Reporter: Caizhi Weng
> Priority: Critical
>
> If too many fields of the same type are projected,
> {{ProjectionCodeGenerator#generateProjectionExpression}} currently applies a
> "for loop optimization": instead of generating code separately for each
> field, it squashes the writes for those fields into a single for loop.
> However, if the indices of the fields with the same type are not contiguous,
> this optimization does not write fields in ascending index order. This is not
> acceptable because {{BinaryWriter}}s expect users to write fields in
> ascending index order (that is to say, we *have to* write field 0 first, then
> field 1, and so on); otherwise the variable-length areas of two binary rows
> holding the same data might differ. Although the {{getXX}} methods of
> {{BinaryRow}} still return the fields correctly, states for streaming jobs
> compare state keys by their binary bits, not by the contents of the keys. So
> we need to make sure that two binary rows have identical binary bits if they
> contain the same data.
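To make the write-order problem concrete, here is a minimal sketch. It assumes a simplified toy layout (one offset/length slot per field, followed by a variable-length area that grows in write order); `ToyRowWriter` and `WriteOrderDemo` are hypothetical names for illustration and are not Flink's actual {{BinaryRow}} format or writer API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical toy layout, NOT Flink's real BinaryRow format:
// a fixed-length part with one (offset, length) slot per field, followed by
// a variable-length area to which payloads are appended in write order.
final class ToyRowWriter {
    private final int[] offsets;
    private final int[] lengths;
    private final ByteArrayOutputStream varArea = new ByteArrayOutputStream();

    ToyRowWriter(int arity) {
        offsets = new int[arity];
        lengths = new int[arity];
    }

    // Appends the payload to the variable-length area and records where it landed.
    void writeString(int pos, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        offsets[pos] = varArea.size();
        lengths[pos] = bytes.length;
        varArea.write(bytes, 0, bytes.length);
    }

    // Serializes the fixed-length slots followed by the variable-length area.
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(offsets.length * 8 + varArea.size());
        for (int i = 0; i < offsets.length; i++) {
            buf.putInt(offsets[i]).putInt(lengths[i]);
        }
        buf.put(varArea.toByteArray());
        return buf.array();
    }

    // Reads a field back through its slot, independent of the write order.
    static String readString(byte[] row, int arity, int pos) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        int offset = buf.getInt(pos * 8);
        int length = buf.getInt(pos * 8 + 4);
        return new String(row, arity * 8 + offset, length, StandardCharsets.UTF_8);
    }
}

final class WriteOrderDemo {
    // Writes all values, visiting the field positions in the given order.
    static byte[] writeInOrder(int[] order, String[] values) {
        ToyRowWriter writer = new ToyRowWriter(values.length);
        for (int pos : order) {
            writer.writeString(pos, values[pos]);
        }
        return writer.toBytes();
    }

    public static void main(String[] args) {
        String[] values = {"a", "bb", "ccc"};
        byte[] ascending = writeInOrder(new int[] {0, 1, 2}, values);
        byte[] shuffled = writeInOrder(new int[] {0, 2, 1}, values);

        // Same logical content, different bytes: a byte-wise key comparison
        // would treat these two rows as different keys.
        System.out.println("bytes equal: " + Arrays.equals(ascending, shuffled));
        for (int i = 0; i < values.length; i++) {
            System.out.println("field " + i + " equal: "
                    + ToyRowWriter.readString(ascending, 3, i)
                            .equals(ToyRowWriter.readString(shuffled, 3, i)));
        }
    }
}
```

In this toy layout every getter returns the same value for both rows, yet the serialized bytes differ, which is the situation the description warns about for state keys.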
> What's worse, because the current implementation of
> {{ProjectionCodeGenerator#generateProjectionExpression}} uses a Scala
> {{HashMap}}, the key order of the map might differ between workers. Even if
> the projection does not meet the condition to be optimized, it will still be
> affected by this bug.
> What I suggest is to simply remove this optimization: if we still want it,
> we have to make sure that fields of the same type have contiguous indices,
> which is a very strict and rare condition.
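The contiguity condition can be illustrated with a small sketch. It assumes the optimization groups field indices by type and emits one loop per group; `GroupedWriteOrder` and its methods are hypothetical and do not reproduce the actual code generator. A `LinkedHashMap` is used only to keep the sketch deterministic; with an ordinary hash map the order of the groups themselves would additionally be unspecified.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "one loop per field type"; not Flink's generator code.
final class GroupedWriteOrder {

    // Returns the order in which field indices would be written if fields
    // are grouped by type and each group is written by a single loop.
    static List<Integer> groupedOrder(String[] fieldTypes) {
        Map<String, List<Integer>> byType = new LinkedHashMap<>();
        for (int i = 0; i < fieldTypes.length; i++) {
            byType.computeIfAbsent(fieldTypes[i], t -> new ArrayList<>()).add(i);
        }
        List<Integer> order = new ArrayList<>();
        for (List<Integer> group : byType.values()) {
            order.addAll(group);
        }
        return order;
    }

    // True if the indices are written strictly in ascending order.
    static boolean isAscending(List<Integer> order) {
        for (int i = 1; i < order.size(); i++) {
            if (order.get(i) <= order.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same-typed fields interleaved: the grouped write order is 0, 2, 1, 3.
        List<Integer> interleaved =
                groupedOrder(new String[] {"INT", "STRING", "INT", "STRING"});
        System.out.println(interleaved + " ascending: " + isAscending(interleaved));

        // Same-typed fields contiguous: the grouped write order stays 0, 1, 2, 3.
        List<Integer> contiguous =
                groupedOrder(new String[] {"INT", "INT", "STRING", "STRING"});
        System.out.println(contiguous + " ascending: " + isAscending(contiguous));
    }
}
```

Only the contiguous schema keeps the ascending write order in this model, and even then only when the groups are visited in index order, which is why the condition is so restrictive.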
--
This message was sent by Atlassian Jira
(v8.3.4#803005)