platinumhamburg commented on PR #1702: URL: https://github.com/apache/fluss/pull/1702#issuecomment-3307716367
> @platinumhamburg Correct. I want to check the best way to approach the initial design. Based on the convo here, I will craft an initial design that:
>
> 1. extends the current scanner API (instead of having these on the Lookuper)
> 2. takes buckets into account for the API
> 3. keeps things simple by using the batch RPC (at least in the initial proposal version)
>
> However, since the goal was for small tables:
>
> 1. Do you think we should also consider multiple fetches? (This complicates things a bit and I'm wondering if it's indeed necessary.)
> 2. Based on the above, if yes - I guess streaming RPC might make more sense in this scenario as well.
>
> Do you think it makes sense to differentiate between these two use cases and start with the first approach, and, if there is demand and a use case, also work on the second for larger tables and streaming? Or do you think it will complicate the design/implementation, so it's better to consider both scenarios now?

Based on my current thinking, I believe we are better off not introducing streaming semantics at the RPC layer, mainly for two reasons:

1. Our existing batch RPC framework is a custom framework built on batch semantics and heavily optimized for zero-copy operations. Developing an additional streaming RPC framework at this point would not deliver benefits that justify its cost.
2. Streaming RPC only offers some convenience in usage, and it comes with considerable performance limitations. For example, multiple Scanners cannot share connections (you can refer to the connection reuse mechanism currently implemented in the FetchLog system, where a single FetchLog request can simultaneously serve log fetching for multiple tables).

Regarding whether to support both small-table and large-table scanning scenarios in a single PR, I think it's feasible to reach the goal in multiple steps - "implementation" is always flexible in task planning.
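
To make the second point concrete, here is a minimal sketch of how multiple fetches could be expressed with plain batch requests instead of a stream. It is not Fluss code: `ScanRequest`, `ScanResponse`, `FakeServer`, and `scan_all` are hypothetical names, the server is an in-memory stand-in, and Python is used only for brevity. Each request carries one cursor per bucket, so a single round trip can serve several scanners, and a bucket that has more data simply returns the cursor to resume from.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical bucket identifier: (table_id, bucket_id). Not a Fluss type.
BucketKey = Tuple[int, int]


@dataclass
class ScanRequest:
    # Offset to resume from for each bucket; one request can cover many buckets.
    cursors: Dict[BucketKey, int]
    max_rows_per_bucket: int = 2


@dataclass
class ScanResponse:
    rows: Dict[BucketKey, List[str]] = field(default_factory=dict)
    # Buckets with remaining data map to their next offset; finished ones are absent.
    next_cursors: Dict[BucketKey, int] = field(default_factory=dict)


class FakeServer:
    """In-memory stand-in for a server; counts the batch requests it serves."""

    def __init__(self, data: Dict[BucketKey, List[str]]):
        self.data = data
        self.requests_served = 0

    def scan(self, request: ScanRequest) -> ScanResponse:
        self.requests_served += 1
        response = ScanResponse()
        for key, offset in request.cursors.items():
            rows = self.data.get(key, [])
            end = offset + request.max_rows_per_bucket
            response.rows[key] = rows[offset:end]
            if end < len(rows):
                response.next_cursors[key] = end
        return response


def scan_all(server: FakeServer, buckets: List[BucketKey],
             max_rows_per_bucket: int = 2) -> Dict[BucketKey, List[str]]:
    """Drain every bucket by repeating batch requests until no cursor remains."""
    cursors = {key: 0 for key in buckets}
    result: Dict[BucketKey, List[str]] = {key: [] for key in buckets}
    while cursors:
        response = server.scan(ScanRequest(cursors, max_rows_per_bucket))
        for key, rows in response.rows.items():
            result[key].extend(rows)
        cursors = response.next_cursors
    return result
```

In this sketch a small table finishes in one request, while a larger one just needs more rounds of the same request type, so both cases use one request shape and buckets from different tables can ride on the same round trip.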
However, what's truly challenging is the design of public RPC interfaces. Public APIs need to have certain stability and evolvability. We need to avoid scenarios like:

- We hastily expose an API that hasn't been thoroughly thought through, only to remove it several versions later.
- We expose an API that lacks evolvability, forcing subsequent feature development to create semantically fragmented API families.

Therefore, even if we plan to achieve our goals step by step at the API level, we should still have comprehensive design and planning for its future evolution and development from the very beginning.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
