ldadima commented on PR #27508:
URL: https://github.com/apache/flink/pull/27508#issuecomment-3925471540

   > Hey @ldadima, sorry for the delay. We had to align internally on the best 
strategy going forward. In general, we should avoid manipulating logical types 
during runtime.
   > 
   > 1. Instead of always creating a new serializer, the simplest and most 
consistent approach would be to normalize all state keys to BinaryRowData using 
pre-initialized RowDataSerializer instances in MultiJoinStateViews.java. This 
is already the pattern used by joinKeysEqual() in InputSideHasUniqueKey and 
InputSideHasNoUniqueKey. Nothing hard, just doing it somewhere else.
   > 2. Another way would be to initialize the serializer you created here in 
the constructor but I think strategy 1, doing it directly where we interact 
with state, is a more adequate fix.
   > 
   > Let me know what you think!
   
   I checked 1st solution and implemented. So I found new problem in this 
strategy (also only for HeapStateBackend):  
[Here](https://github.com/apache/flink/blob/ae0c1e6a05fa0126bcc030bcb9f8969aeaa3d5c1/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/typeutils/RowDataSerializer.java#L194),
 RowData is reused, which can cause this object to be overwritten in 
HeapStateMap, so it is necessary to copy RowData. 
   Finally, we need yo create and fill GenericRowData, convert it to binary  
and copy binary row to new binary row. It's too many actions.
   So I suggest to reuse logic from RowDataSerializer#toBinaryRow() in 
AttributeBasedJoinKeyExtractor for build join keys. Thus, we will reduce the 
copying/recreating of RowData and get rid of RowDataSerializer.
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to