Re: [DISCUSS] Missing TreeMapState abstraction for maintaining ordered state

Zakelly Lan Mon, 21 Apr 2025 03:26:08 -0700

Hi Charles,

Thanks for the summary. Yeah it is what has been discussed so far.


Note 1: From what I understand (I haven’t looked into Flink’s internals
> yet), the proposed encoding should be compatible with ForSt as well.
>

ForSt shares almost the same encoding as RocksDB (in serialized bytes). If
it works for RocksDB, it should work for ForSt.

 Note 2: For the order book use case, sorting doubles (IEEE 754) is fairly
> straightforward:


Well, there seems to be some bug in your `toOrderedBits`, but there should
be a good mapping from logical order to binary ordered bytes. In all, the
bytes order could serve common primitives or tuple order needs, and align
with the binary order of keys in LSM structure with low cost.


Best,
Zakelly


On Sat, Apr 19, 2025 at 5:49 AM Charles COLELLA <charles...@hotmail.fr>
wrote:

> Hi Zakelly,
>
> Thanks for sharing all these resources, I took some time to go through
> them and wanted to share a quick summary and a few thoughts.
>
> Here’s a brief recap of what was discussed on the mailing list (more
> details in [1]):
>
> They proposed introducing a BinarySortedMultiMapState<UK, UV> with two
> user-facing APIs:
> - BinarySortedMultiMapState<UK, UV> for storing multiple values per key
> (add(K key, V value) to append a value, etc.)
> - BinarySortedMapState<UK, UV> as a simpler option when there’s only one
> value per key (a special case of the first one)
>
> The readRange() method would use an eager-batch mode by default (fetching
> entries in chunks of 128), which strikes a good balance between lazy access
> (one by one) and fully eager reads.
>
> Main challenge: ensuring that the binary representation preserves the same
> ordering as the Java Comparable.
> So keys would need a custom serializer::
> interface LexicographicalTypeSerializer<T> extends TypeSerializer<T> {
>   Optional<Comparator<T>> findComparator(); // useful for HeapStateBackend
> }
> This allows users to define custom key types while preserving sort order.
>
> For the HeapStateBackend, a skip list was suggested instead of a TreeMap,
> as it offers better memory locality and fewer allocations, while still
> maintaining O(log n) performance.
>
> Note 1: From what I understand (I haven’t looked into Flink’s internals
> yet), the proposed encoding should be compatible with ForSt as well.
>
> Note 2: For the order book use case, sorting doubles (IEEE 754) is fairly
> straightforward:
>
> long toOrderedBits(double value) {
> long bits = Double.doubleToRawLongBits(value);
> return bits ^ (((bits >> 63) - 1L) | 0x8000000000000000L);
> }
>
>
> [1] https://cwiki.apache.org/confluence/x/Xo_FD
>
> Thanks gain,
>
> Charles
>
> From: Zakelly Lan <zakelly....@gmail.com>
> Date: Monday, April 14, 2025 at 5:55 AM
> To: dev@flink.apache.org <dev@flink.apache.org>
> Subject: Re: [DISCUSS] Missing TreeMapState abstraction for maintaining
> ordered state
>
> Hi Charles,
>
> There is a FLIP-220[1] and discussions[2] about introducing a sorted map
> state in binary order. But it seems like there is no further progress and
> conclusion. It would be nice if anyone could drive this.
>
> I would +1 for introducing this. As the RocksDB stores key-value in binary
> order, we should also keep that order. And the API design requires more
> discussion.
>
>
> [1] https://cwiki.apache.org/confluence/x/Xo_FD
> [2] https://lists.apache.org/thread/lbdpr1cjt9ddrotowmzzk4lv787b3t5z
>
> Best,
> Zakelly
>
> On Sat, Apr 12, 2025 at 6:35 PM Charles COLELLA <charles...@hotmail.fr>
> wrote:
>
> > Hi all,
> >
> > I'm building a Flink application that processes real-time order book data
> > (~70k events/sec), partitioned by instrument.
> >
> >
> > For each key, I need to maintain an order book where levels are kept
> > ordered by price. Since Flink's state API doesn't support ordered
> > structures like a NavigableMap or TreeMap, I currently have to:
> >
> > - Store all levels in a MapState<Double, OrderLevel> for persistence,
> >
> > - Maintain a parallel TreeMap<Double, OrderLevel> in memory to preserve
> > ordering,
> >
> > - Keep both in sync on every update,
> >
> > - And rebuild the memory structure from MapState during recovery.
> >
> >
> > This adds significant overhead:
> >
> > - Extra memory (duplicate data structures),
> >
> > - Extra logic (synchronization and consistency),
> >
> > - Extra CPU cost (sorting, filtering, top-K extraction).
> >
> > The use case (maintaining sorted state per key) is common in financial
> > applications, ranking systems, and any stream logic requiring efficient
> > ordered access.
> >
> >
> > I’ve tried to search the Jira, dev mailing list, and flips, but couldn’t
> > find a discussion on 'TreeMapState', 'SortedMapState`, or abstraction
> that
> > would support this.
> >
> >
> > Has this been discussed before, or would it make sense to open a
> proposal?
> >
> >
> > Thanks,
> >
> > Charles
> >
>

Re: [DISCUSS] Missing TreeMapState abstraction for maintaining ordered state

Reply via email to