twosom commented on code in PR #33212:
URL: https://github.com/apache/beam/pull/33212#discussion_r1892239162
##########
runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkStateInternals.java:
##########
@@ -622,4 +627,82 @@ public Boolean read() {
};
}
}
+
+  private final class SparkOrderedListState<T> extends AbstractState<List<TimestampedValue<T>>>
+      implements OrderedListState<T> {
+
+    private SparkOrderedListState(StateNamespace namespace, String id, Coder<T> coder) {
+      super(namespace, id, ListCoder.of(TimestampedValue.TimestampedValueCoder.of(coder)));
+    }
+
+    private SortedMap<Instant, TimestampedValue<T>> readAsMap() {
+      final List<TimestampedValue<T>> listValues =
Review Comment:
I agree with your point. Let me share my thinking behind this implementation.
In practice, ListState/OrderedListState is mostly used in write-heavy scenarios
where elements are appended far more often than they are read. That's why I went
with ArrayList instead of SortedMap - appending to an ArrayList is amortized O(1),
while a SortedMap such as TreeMap pays O(log n) on every write just to keep the
entries ordered.
Reads, on the other hand, happen in only a couple of scenarios - during OnTimer
execution or once the list reaches a certain size. Sorting lazily at that point
costs O(n log n) per read, but because reads are comparatively rare, the overall
performance impact is small.
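For the read side, a deferred sort along these lines is what I have in mind.
Again, this is just an illustrative sketch with made-up names (ReadPathSketch,
readAsSortedMap), not the actual readAsMap in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.beam.sdk.values.TimestampedValue;
import org.joda.time.Instant;

class ReadPathSketch<T> {
  private final List<TimestampedValue<T>> values = new ArrayList<>();

  void add(TimestampedValue<T> value) {
    values.add(value); // cheap write; no ordering maintained here
  }

  // Sort lazily, once per read: O(n log n), paid only when a timer fires or
  // the list reaches the size threshold that triggers a read.
  // (For brevity this sketch keeps only one value per timestamp.)
  SortedMap<Instant, TimestampedValue<T>> readAsSortedMap() {
    SortedMap<Instant, TimestampedValue<T>> sorted = new TreeMap<>();
    for (TimestampedValue<T> value : values) {
      sorted.put(value.getTimestamp(), value);
    }
    return sorted;
  }
}
```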
It's also worth mentioning that FlinkOrderedListState takes the same approach,
which gives me additional confidence in this choice. That's why I think the
current implementation is the better fit for real-world usage patterns.