Hi Mehul Batra,

First of all, thank you very much for the detailed review and valuable suggestions. I really appreciate your insights.

*1. Per-Table System Table vs Global System Table*

I think the main use case for the global view is easy integration with monitoring tools such as Grafana. Without a SQL interface, users would have to build a custom exporter on top of the Admin API to monitor the tiering status of all tables.

I do share your concern about the performance impact of querying thousands of tables. While I acknowledge the potential risk in very large clusters, I believe it is better to provide full visibility first. We can collect real-world performance data and, if necessary, add safeguards such as implicit limits or mandatory LIMIT clauses as a follow-up optimization.

*2. Error Message Truncation Strategy*

This is a great point. Simply keeping the head of the error message and cutting the rest could indeed drop important information. I agree with your "smart extraction" suggestion, which prioritizes the text near markers such as "*Caused by*". To keep the initial FIP-30 scope focused, I plan to implement basic truncation first. However, I would be very grateful if you could help with smart extraction in a follow-up PR, if you have the capacity.

*3. Consolidating State Maps in LakeTableTieringManager*

I also fully agree with consolidating the maps in LakeTableTieringManager. Looking at the code again, managing 7 separate maps (and soon 9) for each table is getting complicated. It is easy to miss one map when registering or removing a table, which could lead to bugs or small memory leaks over time. Grouping everything into a single TableTieringInfo object will make the logic much easier to follow and help keep all the metadata consistent. It should also be somewhat more memory-efficient, since it reduces the number of internal map nodes. I'll definitely include this refactoring as part of the FIP-30 implementation.

Thanks again for helping refine the design!
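
For point 3, here is a rough sketch of the consolidated shape I have in mind. It is only an illustration under a few assumptions, not the final implementation: the field names come from your TableTieringInfo example, `TieringRegistrySketch` is a hypothetical, simplified stand-in for LakeTableTieringManager, the `TieringState` values are placeholders, `tablePath` is a plain String instead of Fluss's TablePath, and locking is left out entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Placeholder for the real TieringState; these values are illustrative only.
enum TieringState { SCHEDULED, TIERING, TIERED, FAILED }

// All per-table tiering state in one object (field names from the review example).
final class TableTieringInfo {
    final String tablePath;   // stand-in for Fluss's TablePath
    final long lakeFreshness;
    TieringState state = TieringState.SCHEDULED;
    long tieringEpoch;
    long lastTieredTime;
    String lastError;         // null until a failure is recorded
    Long lastErrorTime;       // null until a failure is recorded

    TableTieringInfo(String tablePath, long lakeFreshness) {
        this.tablePath = tablePath;
        this.lakeFreshness = lakeFreshness;
    }
}

// Hypothetical stand-in for LakeTableTieringManager; not thread-safe.
final class TieringRegistrySketch {
    private final Map<Long, TableTieringInfo> tableInfos = new HashMap<>();

    void addLakeTable(long tableId, String tablePath, long lakeFreshness) {
        tableInfos.put(tableId, new TableTieringInfo(tablePath, lakeFreshness));
    }

    // State, message and timestamp are updated together on the same object.
    void reportTieringFail(long tableId, String message, long nowMs) {
        TableTieringInfo info = tableInfos.get(tableId);
        if (info == null) {
            return; // table was already removed
        }
        info.state = TieringState.FAILED;
        info.lastError = message;
        info.lastErrorTime = nowMs;
    }

    // A single removal drops every piece of per-table state.
    void removeLakeTable(long tableId) {
        tableInfos.remove(tableId);
    }

    TableTieringInfo get(long tableId) {
        return tableInfos.get(tableId);
    }

    public static void main(String[] args) {
        TieringRegistrySketch registry = new TieringRegistrySketch();
        registry.addLakeTable(1L, "my_db.my_table", 30_000L);
        registry.reportTieringFail(1L, "lake storage unreachable", System.currentTimeMillis());
        TableTieringInfo info = registry.get(1L);
        System.out.println(info.state + " / " + info.lastError);
        registry.removeLakeTable(1L);
    }
}
```

The point of the sketch is that registering, failing, and removing a table each touch exactly one map entry, so there is no second or third map that can be forgotten.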

Best Regards,
SeungMin Lee

On Sat, Feb 14, 2026 at 2:22 AM, Mehul Batra <[email protected]> wrote:

> Hi SeungMinLee,
>
> First of all, thank you for putting together FIP-30. The ability to track
> tiering status is a much-needed feature, and I appreciate the thorough
> design work that went into this proposal.
>
> After reviewing the FIP, I have a few thoughts and questions I'd like to
> raise for discussion. These are suggestions based on my understanding - I
> may be missing context, so please feel free to correct me if any of these
> points have already been considered.
>
> 1. Per-Table System Table vs Global System Table
>
> The proposal introduces both:
> - Global view: `fluss_catalog.sys.lake_tiering_status`
> - Per-table view: `my_db.my_table$tiering_status`
>
> I was wondering if we could simplify the initial implementation by
> focusing on the per-table `$tiering_status` virtual table for SQL access,
> while relying on the `listTieringStatuses()` Admin API for
> bulk/system-wide queries.
>
> My reasoning:
> - Consistency: The per-table pattern (`$tiering_status`) aligns with
>   Fluss's existing virtual table conventions and is similar to the
>   virtual table approach with `$changelog`, `$binlog`, etc.
> - Scalability: A global SQL table querying thousands of tables could have
>   performance implications. The Admin API seems better suited for bulk
>   operations with potential pagination support.
>
> A phased approach (Phase 1: per-table SQL, Phase 2: Admin API) could ship
> value to users faster with reduced initial scope.
>
> That said, I may be underestimating the need for the global SQL table. Are
> there specific use cases that would be difficult to serve with just the
> Admin API?
>
> 2. Error Message Truncation Strategy
>
> The proposal mentions truncating error messages to 2-4KB before sending to
> the Coordinator.
> I have a concern about simple head truncation potentially removing the
> most useful diagnostic information.
>
> Are we considering an extraction strategy to deal with this? I have
> something like the following in mind:
>
> - Smart extraction: Parse and extract all "Caused by:" lines, which
>   typically contain the most actionable information
>
> I understand this adds complexity, so it's a trade-off. Curious to hear
> others' thoughts on whether this is worth addressing.
>
> 3. Consolidating State Maps in LakeTableTieringManager
>
> The proposal adds `tieringFailMessages` and `tieringFailTimes` maps to
> `LakeTableTieringManager`. Looking at the current implementation, the
> manager already maintains 6+ separate maps keyed by `tableId`:
>
> ```java
> Map<Long, TieringState> tieringStates;
> Map<Long, TablePath> tablePaths;
> Map<Long, Long> tableLakeFreshness;
> Map<Long, Long> tableTierEpoch;
> Map<Long, Long> tableLastTieredTime;
> Map<Long, Long> liveTieringTableIds;
> // Proposed additions:
> Map<Long, String> tieringFailMessages;
> Map<Long, Long> tieringFailTimes;
> ```
>
> One thought: would it be cleaner to consolidate these into a single
> TableTieringInfo object?
>
> ```java
> Map<Long, TableTieringInfo> tableInfos;
>
> class TableTieringInfo {
>     TablePath tablePath;
>     long lakeFreshness;
>     TieringState state;
>     long tieringEpoch;
>     long lastTieredTime;
>     @Nullable String lastError;
>     @Nullable Long lastErrorTime;
> }
> ```
>
> Potential benefits:
> - Single map lookup instead of multiple
> - Related state updated together naturally
> - Cleaner cleanup in removeLakeTable() (one removal vs. 8)
>
> This could be a separate preparatory refactoring PR or part of FIP-30.
> However, I understand this might be out of scope for this FIP, and I don't
> want to expand the scope unnecessarily. Just raising it as a thought for
> the authors to consider.
>
> These are just suggestions based on my reading of the proposal. I'm happy
> to be corrected if I've misunderstood anything. Also happy to help with
> implementation or further discussion if useful.
>
> Thanks again for driving this important feature!
>
> Best regards,
> Mehul Batra
>
> On Thu, Feb 12, 2026 at 5:53 PM SeungMin Lee <[email protected]> wrote:
>
> > Hi dev,
> >
> > Just a quick update.
> >
> > I have migrated the design Google Doc to the cwiki and registered it as
> > *FIP-30*. Please refer to the link below for the formal proposal:
> >
> > https://cwiki.apache.org/confluence/display/FLUSS/FIP-30%3A+Support+tracking+the+tiering+status+of+a+tiering+table
> >
> > The content remains consistent with the previous Google Doc.
> >
> > Best regards,
> > SeungMin Lee
> >
> > On Thu, Feb 12, 2026 at 5:37 PM, SeungMin Lee <[email protected]> wrote:
> >
> > > Hi, dev
> > >
> > > Currently, there is no way for users to check the status of lake
> > > tiering. Users have no way of knowing when tiering fails, and they
> > > have to manually parse the Tiering Service logs to identify the cause.
> > >
> > > So, I'd like to propose Issue-2362 [1]: Allow users to track the
> > > tiering status of a tiering table, to address this visibility issue.
> > >
> > > I have drafted a design doc [2]. Please feel free to review it and
> > > share your feedback.
> > >
> > > Considering the upcoming holidays in some regions, I'll wait for
> > > feedback and give a ping on this thread around Feb 23rd.
> > >
> > > Looking forward to your thoughts.
> > >
> > > Best regards,
> > > SeungMin Lee
> > >
> > > [1] https://github.com/apache/fluss/issues/2362
> > > [2] https://docs.google.com/document/d/1eJbRCwzAbeJLA97zQQ0I3JM1jerBXXhq69Dn8r4xWV0/edit?usp=sharing
