>
> Hi, Jark.
> Thanks for your comprehensive and valuable feedback. It has made the FIP
> more complete. Let me respond to the comments one by one:
>
> - Comments (1) and (2) essentially target the same issue: for buckets
> without checkpoints, how do we determine which updates need to be rolled
> back? Persisting state to Fluss is an excellent solution: it allows us to
> distinguish whether an instance is starting for the first time or is
> recovering from a failover with no checkpoint to restore from. This is a
> very important suggestion, and I will add a handling plan for this corner
> case to the FIP.
> - I strongly agree with suggestions (3), (4), (5), and (6). I will
> update the FIP accordingly.
> - Regarding suggestion (7), I have some concerns. In fact, our API
> design and implementation architecture influence each other. The core issue
> is that, for the same bucket of the same table, we cannot allow the send
> queue to contain a mix of WriteBatches with different agg_modes, as this
> would lead to non-deterministic write results. Introducing agg_mode
> cleanly would require significant refactoring of both the upper-layer
> write API and the batching/aggregation sending architecture, which may be
> unnecessary complexity at the moment. Designing a “Recovery-mode”
> connection is a compromise that introduces minimal complexity while still
> providing correct semantics. Perhaps we can discuss this further.
> - Regarding suggestion (8), users may work with very wide tables.
> Requiring an explicit aggregation function for every column would force
> them to configure many columns that have no special aggregation needs,
> which would hurt the user experience and bloat the configuration. Apache
> Paimon defaults to the last_non_null_value aggregation function for
> unspecified columns, and most users are likely already accustomed to this
> behavior. It might be better for us to stay consistent with Paimon.
>
>
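To make the proposed default for unspecified columns concrete, here is a minimal sketch in Python under a simplified row model: a row is a dict and null is `None`. Only the `last_non_null_value` semantics (a null incoming value never overwrites the previous one) mirror Paimon's documented default; `merge_row`, `sum_agg`, and `configured` are hypothetical illustration names, not Fluss or Paimon APIs.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical aggregator shape: (accumulated value, incoming value) -> merged value.
Aggregator = Callable[[Any, Any], Any]


def last_non_null_value(acc: Any, new: Any) -> Any:
    # Take the incoming value unless it is null; a null never overwrites.
    return acc if new is None else new


def sum_agg(acc: Any, new: Any) -> Any:
    # Illustrative explicitly configured aggregator: nulls are ignored, values are added.
    if new is None:
        return acc
    if acc is None:
        return new
    return acc + new


def merge_row(
    acc_row: Optional[Dict[str, Any]],
    new_row: Dict[str, Any],
    configured: Dict[str, Aggregator],
    default: Aggregator = last_non_null_value,
) -> Dict[str, Any]:
    """Merge an incoming row into the accumulated row for the same primary key.

    Columns without an explicitly configured aggregator fall back to `default`,
    so a wide table only needs configuration for columns with special needs.
    """
    if acc_row is None:
        return dict(new_row)
    merged = dict(acc_row)
    for col, new_val in new_row.items():
        agg = configured.get(col, default)
        merged[col] = agg(acc_row.get(col), new_val)
    return merged


# Only "clicks" is configured; "id", "name", and "city" use the default.
configured = {"clicks": sum_agg}
row = None
for update in [
    {"id": 1, "clicks": 3, "name": "alice", "city": None},
    {"id": 1, "clicks": 2, "name": None, "city": "Paris"},
]:
    row = merge_row(row, update, configured)

print(row)  # {'id': 1, 'clicks': 5, 'name': 'alice', 'city': 'Paris'}
```

The second update leaves `name` untouched because its value is null, while `city` picks up the first non-null value. Only the one column that needs real aggregation had to be configured.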