Re: [PR] [KV] Implement Snapshot Query on KV Table [fluss]

via GitHub Thu, 18 Sep 2025 07:17:32 -0700


platinumhamburg commented on PR #1702:
URL: https://github.com/apache/fluss/pull/1702#issuecomment-3307716367


   > @platinumhamburg Correct. I want to check the best way to approach the 
initial design. Based on the conversation here, I will craft an initial design that:
   > 
   > 1. extends the current scanner API (instead of having these on the 
Lookuper)
   > 2. takes buckets into account in the API
   > 3. keeps things simple by using the batch RPC (at least in the initial 
proposal version)
   > 
   > However, since the goal was for small tables:
   > 
   > 1. Do you think we should also consider multiple fetches? (This 
complicates things a bit, and I'm wondering whether it's really necessary.)
   > 2. If yes, I guess a streaming RPC might make more sense in this 
scenario as well.
   > 
   > Do you think it makes sense to differentiate between these two use cases 
and start with the first approach, then work on the second (larger tables and 
streaming) if there is demand and a use case? Or do you think that will 
complicate the design/implementation, so it's better to consider both scenarios 
now?
   
   Based on my current thinking, I believe we might be better off not 
considering streaming semantics at the RPC layer, mainly for two reasons:
   
   1. Our existing batch RPC framework is custom-built around batch semantics 
and has been heavily optimized for zero-copy operations. Building an 
additional streaming RPC framework at this point would not deliver benefits 
that justify its cost.
   2. Streaming RPC mainly offers convenience of use, but it comes with 
considerable performance limitations. For example, multiple Scanners cannot 
share a connection (compare the connection reuse mechanism currently 
implemented for FetchLog, where a single FetchLog request can serve log 
fetching for multiple tables at once).
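
   To illustrate the connection-sharing point, here is a minimal sketch of how 
pending scans from several Scanners could be folded into one batch request per 
server, similar in spirit to a FetchLog request covering multiple tables. 
Every name below (`BucketScan`, `BatchScanRequest`, `build_batch_requests`, 
the `leader_of` map) is hypothetical and is not part of the Fluss API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BucketScan:
    """One scanner's pending read of one bucket (hypothetical shape)."""
    scanner_id: int
    table_id: int
    bucket_id: int


@dataclass
class BatchScanRequest:
    """A single request carrying scans for many tables and buckets."""
    server: str
    scans: list = field(default_factory=list)


def build_batch_requests(pending, leader_of):
    """Group pending scans by the server that leads each bucket.

    `leader_of` maps (table_id, bucket_id) to a server address. One request
    is produced per server, so all scanners share that server's connection.
    """
    by_server = defaultdict(list)
    for scan in pending:
        server = leader_of[(scan.table_id, scan.bucket_id)]
        by_server[server].append(scan)
    # Sort by server address so the output order is deterministic.
    return [BatchScanRequest(s, scans) for s, scans in sorted(by_server.items())]
```

   With two Scanners reading three buckets led by two servers, this yields two 
requests rather than three, and the request to the shared server carries scans 
for both tables. A per-Scanner stream would instead pin one stream per Scanner.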
   
   Regarding whether to support both small-table and large-table scanning in 
a single PR: I think we can reach the goal in multiple steps, since 
implementation work is always flexible to plan. What is truly challenging is 
the design of the public RPC interfaces. Public APIs need to be stable and 
able to evolve. We need to avoid scenarios like:
   
   - We hastily expose an API that hasn't been thoroughly thought through, only 
to remove it several versions later.
   - We expose an API that lacks evolvability, forcing subsequent feature 
development to create semantically fragmented API families.
   
   Therefore, even if we plan to achieve our goals step by step at the API 
level, we should still have a comprehensive design and plan for its future 
evolution from the very beginning.
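
   As one possible illustration of an API shape that leaves room to evolve, the 
sketch below uses a single batch-style request with an optional continuation 
token. A small table completes in one response, while a large table needs 
several fetches through the same request type, so no second API family is 
required. This is only an assumption about what such a design could look like: 
`ScanRequest`, `ScanResponse`, `serve_scan`, and `scan_all` are hypothetical 
names, and the in-memory list stands in for a KV snapshot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanRequest:
    table_id: int
    bucket_id: int
    max_rows: int
    # None on the first call; later calls echo the token from the last response.
    continuation: Optional[int] = None


@dataclass
class ScanResponse:
    rows: list
    # None means the bucket is exhausted and no further fetch is needed.
    next_continuation: Optional[int]


def serve_scan(snapshot_rows, request):
    """Server side: return at most max_rows rows starting at the token."""
    if request.max_rows <= 0:
        raise ValueError("max_rows must be positive")
    start = request.continuation or 0
    end = min(start + request.max_rows, len(snapshot_rows))
    done = end >= len(snapshot_rows)
    return ScanResponse(snapshot_rows[start:end], None if done else end)


def scan_all(snapshot_rows, table_id, bucket_id, max_rows):
    """Client side: repeat the same request type until the token is None."""
    rows, token = [], None
    while True:
        request = ScanRequest(table_id, bucket_id, max_rows, token)
        response = serve_scan(snapshot_rows, request)
        rows.extend(response.rows)
        token = response.next_continuation
        if token is None:
            return rows
```

   The first proposal could ship with clients that issue a single fetch only, 
and multi-fetch support could be added later without changing the request or 
response shape.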
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
