twosom commented on code in PR #33212:
URL: https://github.com/apache/beam/pull/33212#discussion_r1892239162
##########
runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkStateInternals.java:
##########
@@ -622,4 +627,82 @@ public Boolean read() {
};
}
}
+
+  private final class SparkOrderedListState<T> extends AbstractState<List<TimestampedValue<T>>>
+      implements OrderedListState<T> {
+
+    private SparkOrderedListState(StateNamespace namespace, String id, Coder<T> coder) {
+      super(namespace, id, ListCoder.of(TimestampedValue.TimestampedValueCoder.of(coder)));
+    }
+
+    private SortedMap<Instant, TimestampedValue<T>> readAsMap() {
+      final List<TimestampedValue<T>> listValues =
Review Comment:
I agree with your point. Let me share my thinking behind this implementation.
In practice, ListState/OrderedListState is mostly used in write-heavy scenarios
where elements are appended far more often than they are read. That's why I went
with ArrayList instead of SortedMap - appending to an ArrayList is amortized O(1),
while a SortedMap such as TreeMap pays O(log n) on every write just to keep the
entries ordered.
Reads, on the other hand, happen in only a couple of scenarios - during OnTimer
execution or once the list reaches a certain size. Sorting lazily at that point
costs O(n log n) per read, but because reads are comparatively rare, the overall
performance impact is small.
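For the read side, a deferred sort along these lines is what I have in mind.
Again, this is just an illustrative sketch with made-up names (ReadPathSketch,
readAsSortedMap), not the actual readAsMap in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.beam.sdk.values.TimestampedValue;
import org.joda.time.Instant;

class ReadPathSketch<T> {
  private final List<TimestampedValue<T>> values = new ArrayList<>();

  void add(TimestampedValue<T> value) {
    values.add(value); // cheap write; no ordering maintained here
  }

  // Sort lazily, once per read: O(n log n), paid only when a timer fires or
  // the list reaches the size threshold that triggers a read.
  // (For brevity this sketch keeps only one value per timestamp.)
  SortedMap<Instant, TimestampedValue<T>> readAsSortedMap() {
    SortedMap<Instant, TimestampedValue<T>> sorted = new TreeMap<>();
    for (TimestampedValue<T> value : values) {
      sorted.put(value.getTimestamp(), value);
    }
    return sorted;
  }
}
```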
It's also worth mentioning that FlinkOrderedListState takes the same approach,
which gives me additional confidence in this choice. That's why I think the
current implementation is the better fit for real-world usage patterns.