ldadima commented on PR #27508: URL: https://github.com/apache/flink/pull/27508#issuecomment-3925471540
> Hey @ldadima, sorry for the delay. We had to align internally on the best strategy going forward. In general, we should avoid manipulating logical types during runtime. > > 1. Instead of always creating a new serializer, the simplest and most consistent approach would be to normalize all state keys to BinaryRowData using pre-initialized RowDataSerializer instances in MultiJoinStateViews.java. This is already the pattern used by joinKeysEqual() in InputSideHasUniqueKey and InputSideHasNoUniqueKey. Nothing hard, just doing it somewhere else. > 2. Another way would be to initialize the serializer you created here in the constructor but I think strategy 1, doing it directly where we interact with state, is a more adequate fix. > > Let me know what you think! I checked 1st solution and implemented. So I found new problem in this strategy (also only for HeapStateBackend): [Here](https://github.com/apache/flink/blob/ae0c1e6a05fa0126bcc030bcb9f8969aeaa3d5c1/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/typeutils/RowDataSerializer.java#L194), RowData is reused, which can cause this object to be overwritten in HeapStateMap, so it is necessary to copy RowData. Finally, we need yo create and fill GenericRowData, convert it to binary and copy binary row to new binary row. It's too many actions. So I suggest to reuse logic from RowDataSerializer#toBinaryRow() in AttributeBasedJoinKeyExtractor for build join keys. Thus, we will reduce the copying/recreating of RowData and get rid of RowDataSerializer. What do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
