polyzos commented on PR #1702: URL: https://github.com/apache/fluss/pull/1702#issuecomment-3305439761
@platinumhamburg Thank you so much for the detailed review and all the great feedback you have provided.

Some context: the goal was to use this feature for real-time dashboard use cases (real-time user tracking, fleet management, etc.), where the key space is typically small due to visualization needs, i.e. from a few rows up to a few thousand, with something like ~1-5k as a threshold. At first I was thinking of making the threshold configurable and throwing an exception when it is exceeded, so that users cannot use this feature on large tables (to avoid OOM). Eventually I thought of just adding a note in the docs, but it seems it's better to add this check.

Similarly for the API design, I was really unsure about the naming and where the API should live. I considered the scanner at first, but in my mind it is more about unbounded / continuous data. The lookuper is about lookups on the KV table, and I thought of treating this as a "whole database" lookup. Moreover, because the intention was to return a full snapshot of the table as is (i.e. it's an aggregated table with just the required columns), column pruning and predicate pushdown were not really taken into account. The same goes for partition pruning, because only the latest partition should have the latest desired values.

Seems like I made quite a few assumptions in terms of correct usage!

Let me know your thoughts on the above, and I will close this PR and proceed with creating a design proposal first. 🙏

Again, thank you so much for all this great feedback 🙇‍♂️

cc @wuchong
