GitHub user fhueske commented on the issue:
https://github.com/apache/flink/pull/3585
Hi @sunjincheng121, thanks for this PR.
To be honest, I don't completely understand the implementation of the
`RowsClauseBoundedOverProcessFunction`.
I thought about another design that I would like to discuss (not
considering processing time, which should be addressed in a separate
ProcessFunction, IMO):
- We have three state objects: 1) the accumulator row, 2) a
`MapState[Long, List[Row]]` for unprocessed data (`toProcess`), and 3) a
`MapState[Long, List[Row]]` for processed data that still needs to be
retracted (`toRetract`).
- `processElement()` puts the element into the `toProcess` MapState, keyed by
its original timestamp, and registers a timer for `currentWatermark() + 1`.
Hence, we only have a single timer, which fires when the next watermark is
reached.
- `onTimer()` is called for the next watermark. We get an iterator over the
`toProcess` MapState. For RocksDB, the iterator is sorted on the key. We
sort-insert the records from the iterator into a `LinkedList` (since the
iterator is sorted for RocksDB, this will be a simple append; for other state
backends it will be more expensive, but we can tolerate that, IMO). We do the
same for the `toRetract` MapState, so we have two sorted lists: one with data
to accumulate and one with data to retract. We walk over both sorted lists
and, for each step, accumulate and retract using the accumulator state. Then
we emit the new row and move the emitted row from the `toProcess` MapState to
the `toRetract` MapState.
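To make the proposal concrete, here is a minimal, self-contained Scala sketch of these three steps. It does not use Flink: plain `mutable.HashMap`s stand in for the two MapStates (so iteration is unordered, as with a heap backend rather than RocksDB), a hypothetical `Row(ts, value)` case class stands in for Flink's `Row`, the aggregate is assumed to be a SUM over `ROWS BETWEEN preceding PRECEDING AND CURRENT ROW`, and the caller plays the timer service by passing the watermark to `onTimer`. It also assumes there is no late data, emits only rows covered by the watermark (re-registering the timer for the rest), and drops retracted rows from `toRetract`; those details are my assumptions, not part of the proposal.

```scala
import java.util.{LinkedList => JLinkedList}
import scala.collection.mutable

// Hypothetical stand-in for a Flink Row: event timestamp plus the value being summed.
final case class Row(ts: Long, value: Long)

// Sketch of the proposed design for SUM over ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW.
// Unordered HashMaps stand in for the two MapStates (like a heap backend, not RocksDB).
final class BoundedRowsSumSketch(preceding: Int) {
  private var sum = 0L   // 1) accumulator: running sum ...
  private var count = 0  //    ... and number of rows currently inside the window
  private val toProcess = mutable.HashMap.empty[Long, List[Row]] // 2) buffered, not yet emitted
  private val toRetract = mutable.HashMap.empty[Long, List[Row]] // 3) emitted, retracted later
  private var timer: Option[Long] = None // the single pending event-time timer

  def pendingTimer: Option[Long] = timer

  // processElement(): buffer by original timestamp, register a timer for watermark + 1.
  def processElement(row: Row, currentWatermark: Long): Unit = {
    toProcess.update(row.ts, toProcess.getOrElse(row.ts, Nil) :+ row)
    timer = Some(currentWatermark + 1)
  }

  // Sort-insert scanning back from the tail: sorted input (RocksDB) makes every insert an append.
  private def sortInsert(entries: Iterable[(Long, List[Row])]): JLinkedList[Row] = {
    val out = new JLinkedList[Row]()
    for ((ts, rows) <- entries) {
      val it = out.listIterator(out.size())
      var placed = false
      while (!placed && it.hasPrevious) {
        if (it.previous().ts <= ts) { it.next(); placed = true }
      }
      rows.foreach(r => it.add(r)) // inserts at the cursor, keeping order
    }
    out
  }

  // Assumption: once a row has been retracted it is dropped from toRetract.
  private def removeFromRetract(row: Row): Unit = {
    val rest = toRetract.getOrElse(row.ts, Nil).diff(List(row))
    if (rest.isEmpty) toRetract.remove(row.ts) else toRetract.update(row.ts, rest)
  }

  // onTimer(): the watermark has advanced; emit every buffered row it covers, in timestamp order.
  def onTimer(watermark: Long): List[(Row, Long)] = {
    val ready = toProcess.filter { case (ts, _) => ts <= watermark }.toList
    val process = sortInsert(ready)
    val retract = sortInsert(toRetract)
    val out = mutable.ArrayBuffer.empty[(Row, Long)]
    while (!process.isEmpty) {
      val row = process.removeFirst()
      sum += row.value; count += 1              // accumulate the new row
      if (count > preceding + 1) {              // window overflow: retract the oldest row
        val oldest = retract.removeFirst()
        sum -= oldest.value; count -= 1
        removeFromRetract(oldest)
      }
      out += ((row, sum))                       // emit the row with its window aggregate
      retract.addLast(row)                      // later rows in this call may retract it
      toRetract.update(row.ts, toRetract.getOrElse(row.ts, Nil) :+ row)
    }
    ready.foreach { case (ts, _) => toProcess.remove(ts) } // these rows now live in toRetract
    timer = if (toProcess.nonEmpty) Some(watermark + 1) else None
    out.toList
  }
}
```

For example, with `preceding = 1`, buffering rows with timestamps 3, 1 and 2 and then advancing the watermark to 2 emits rows 1 and 2 with sums 10 and 30, while row 3 stays in `toProcess` until a later timer fires.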
This design has the benefit of letting RocksDB do the sorting. Moreover, we
could store in the `toRetract` state only the fields that are needed for
retraction instead of the full row.
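The projection idea could look roughly like this, assuming a SUM aggregate over a single hypothetical `amount` field: when a row is moved to `toRetract`, only the value needed for retraction is kept. `WideRow` and `ProjectedRetractState` are illustrative names, and a mutable map again stands in for the MapState.

```scala
import scala.collection.mutable

// Hypothetical input row; only `amount` feeds the (assumed) SUM aggregate.
final case class WideRow(ts: Long, user: String, comment: String, amount: Long)

// Stand-in for a toRetract MapState[Long, List[Long]] that keeps only the retractable field.
final class ProjectedRetractState {
  private val byTs = mutable.HashMap.empty[Long, List[Long]]

  // Called when a row moves from toProcess to toRetract: store `amount`, drop the other fields.
  def add(row: WideRow): Unit =
    byTs.update(row.ts, byTs.getOrElse(row.ts, Nil) :+ row.amount)

  // Retracting from a SUM needs nothing but the projected value.
  def retractFrom(sum: Long, ts: Long): Long = {
    val values = byTs.getOrElse(ts, Nil)
    require(values.nonEmpty, s"nothing to retract at $ts")
    if (values.tail.isEmpty) byTs.remove(ts) else byTs.update(ts, values.tail)
    sum - values.head
  }

  def stored(ts: Long): List[Long] = byTs.getOrElse(ts, Nil)
}
```

The saving grows with the width of the input row, since the string columns never reach the retraction state.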
What do you think about this approach?
Thanks, Fabian