twosom commented on code in PR #33212:
URL: https://github.com/apache/beam/pull/33212#discussion_r1897383414
##########
runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkStateInternals.java:
##########
@@ -622,4 +627,82 @@ public Boolean read() {
};
}
}
+
+ private final class SparkOrderedListState<T> extends
AbstractState<List<TimestampedValue<T>>>
+ implements OrderedListState<T> {
+
+ private SparkOrderedListState(StateNamespace namespace, String id,
Coder<T> coder) {
+ super(namespace, id,
ListCoder.of(TimestampedValue.TimestampedValueCoder.of(coder)));
+ }
+
+ private SortedMap<Instant, TimestampedValue<T>> readAsMap() {
+ final List<TimestampedValue<T>> listValues =
Review Comment:
@kennknowles
Hello. I've been reviewing the Spark documentation over the weekend.
Unfortunately, as far as I can tell, Spark doesn't have the native state API
that we're looking for. Moreover, since the current SparkStateInternals is a
class implemented independently of the Spark API, it's also difficult to apply
RDD-based APIs.
The performance requirements you mentioned seem to be well satisfied by
`WindmillOrderedList`. Given that the current implementation cannot guarantee
the same performance, I completely understand if the PR doesn't get merged.
However, regardless of whether this PR gets merged or not, I will continue
researching to meet the requirements you've outlined. I plan to explore better
implementation approaches, taking inspiration from concepts like pendingAdds
and pendingDeletes as used in `WindmillOrderedList`.
Belated Merry Christmas.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]