geserdugarov commented on code in PR #12796:
URL: https://github.com/apache/hudi/pull/12796#discussion_r1957589955
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/BucketAssignFunction.java:
##########
@@ -81,8 +79,12 @@ public class BucketAssignFunction<K, I, O extends
HoodieRecord<?>>
* <li>If it does, tag the record with the location</li>
* <li>If it does not, use the {@link BucketAssigner} to generate a new
bucket ID</li>
* </ul>
+ *
+ * ValueState here is structured as Tuple2(partition, fileId).
+ * We use Flink Tuple2 because this state will be serialized/deserialized.
+ * Otherwise, for any chosen data structure we should implement custom
serializer.
*/
- private ValueState<HoodieRecordGlobalLocation> indexState;
+ private ValueState<Tuple2<StringData, StringData>> indexState;
Review Comment:
My bad, this place is actually doesn't require optimization, because at this
place serde will be executed only during checkpoints, which is negligible in
comparison of serde cost of each stream record.
I've rollbacked previous implementation of `indexState`, which contains
`HoodieRecordGlobalLocation`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]