Re: [DISCUSS] Allow user to track the tiering status of a tiering table

SeungMin Lee Sat, 14 Feb 2026 07:44:19 -0800

Hi Mehul Batra,

First of all, thank you very much for the detailed review and valuable
suggestions. I really appreciate your insights.


*1. Per-Table System Table vs Global System Table*
I think the main use case for the global view is easy integration with
monitoring tools like Grafana. Without a SQL interface, users would have to
build a custom exporter on top of the Admin API to monitor the tiering status
of all tables. I do share your concern about the performance impact of
querying thousands of tables in massive clusters. Still, I believe it’s
better to provide full visibility first. We can then monitor real-world
performance data and, if necessary, add safeguards such as implicit limits
or forced LIMIT clauses as a follow-up optimization.
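To make the safeguard idea a bit more concrete, here is a minimal sketch of how an implicit limit (or a forced LIMIT) could be applied when scanning the global view. The class name, how the query's LIMIT is passed in, and pulling rows from an iterator are all assumptions for illustration, not an existing Fluss API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical guard for scans of the global lake_tiering_status view.
// Names and parameters are illustrative; this is not an existing Fluss API.
final class TieringStatusScanGuard {
    private final long implicitLimit;   // applied when the query has no LIMIT
    private final boolean requireLimit; // "forced LIMIT" mode: reject unbounded scans

    TieringStatusScanGuard(long implicitLimit, boolean requireLimit) {
        if (implicitLimit <= 0) {
            throw new IllegalArgumentException("implicitLimit must be positive");
        }
        this.implicitLimit = implicitLimit;
        this.requireLimit = requireLimit;
    }

    /** Returns how many rows the scan may emit for the given query LIMIT. */
    long effectiveLimit(OptionalLong queryLimit) {
        if (queryLimit.isPresent()) {
            return queryLimit.getAsLong();
        }
        if (requireLimit) {
            throw new IllegalArgumentException(
                    "Queries on the global tiering status view must specify a LIMIT");
        }
        return implicitLimit;
    }

    /** Pulls rows lazily and stops as soon as the effective limit is reached. */
    <T> List<T> collect(Iterator<T> rows, OptionalLong queryLimit) {
        long limit = effectiveLimit(queryLimit);
        List<T> out = new ArrayList<>();
        while (out.size() < limit && rows.hasNext()) {
            out.add(rows.next());
        }
        return out;
    }
}
```

Stopping the iteration early means the scan would not need to build a status row for every table when only the first N rows are requested.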


*2. Error Message Truncation Strategy*
That's a great point. Simply keeping only the head of the error message might
indeed cut off important information. I agree with your "smart extraction"
suggestion, which prioritizes the "*Caused by:*" lines. To keep the initial
FIP-30 scope focused, I plan to implement basic truncation first. However, I
would be very grateful if you could help with the smart extraction as a
follow-up PR, if you have the capacity.
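For reference, here is a rough sketch of both strategies. It assumes the limit is measured in characters rather than bytes for simplicity, and the class and method names are hypothetical. The smart variant keeps the first line of the trace plus every "Caused by:" line before applying the same cap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names and the character-based limit are illustrative only.
final class TieringErrorMessages {
    private TieringErrorMessages() {}

    /** Basic strategy: keep only the first maxChars characters. */
    static String truncateHead(String message, int maxChars) {
        if (message.length() <= maxChars) {
            return message;
        }
        return message.substring(0, maxChars);
    }

    /**
     * Smart extraction: keep the first line (the top-level exception) plus every
     * "Caused by:" line, then apply the same length cap.
     */
    static String extractCauses(String stackTrace, int maxChars) {
        String[] lines = stackTrace.split("\\R"); // \R matches any line break
        List<String> kept = new ArrayList<>();
        if (lines.length > 0) {
            kept.add(lines[0]);
        }
        for (int i = 1; i < lines.length; i++) {
            String trimmed = lines[i].trim();
            if (trimmed.startsWith("Caused by:")) {
                kept.add(trimmed);
            }
        }
        return truncateHead(String.join("\n", kept), maxChars);
    }
}
```

Frame lines such as "at ..." and "... 3 more" are dropped, so the cause chain usually fits well within the 2-4KB budget.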


*3. Consolidating State Maps in LakeTableTieringManager*
I also fully agree with consolidating the maps in LakeTableTieringManager.
Looking at the code again, managing 7 separate maps (and soon 9) for each
table is getting a bit complicated. It’s quite easy to miss one map when
registering or removing tables, which could lead to bugs or small memory
leaks over time. Grouping everything into a single TableTieringInfo object
will make the logic much easier to follow and help keep all the metadata
consistent. Plus, it should be a bit more memory-efficient by reducing the
number of internal map nodes. I’ll definitely include this refactoring as
part of the FIP-30 implementation.
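As a starting point, a minimal sketch of the consolidated structure could look like the following. `TieringState` and the `String` table path are simplified stand-ins for the real Fluss types, and the method names are hypothetical rather than the actual `LakeTableTieringManager` API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of consolidated per-table tiering state. TieringState and the String
// table path are simplified stand-ins for the real Fluss types.
final class TieringStateSketch {
    enum TieringState { PENDING, TIERING, TIERED, FAILED } // hypothetical values

    static final class TableTieringInfo {
        final String tablePath;
        final long lakeFreshness;
        TieringState state = TieringState.PENDING;
        long tieringEpoch;
        long lastTieredTime = -1L; // -1 means "never tiered"
        String lastError;          // null when the last round succeeded
        Long lastErrorTime;        // null when the last round succeeded

        TableTieringInfo(String tablePath, long lakeFreshness) {
            this.tablePath = tablePath;
            this.lakeFreshness = lakeFreshness;
        }
    }

    private final Map<Long, TableTieringInfo> tableInfos = new HashMap<>();

    void addTable(long tableId, String tablePath, long lakeFreshness) {
        tableInfos.put(tableId, new TableTieringInfo(tablePath, lakeFreshness));
    }

    void reportFailure(long tableId, String error, long now) {
        TableTieringInfo info = tableInfos.get(tableId);
        if (info == null) {
            return; // table was already removed
        }
        info.state = TieringState.FAILED;
        info.lastError = error;
        info.lastErrorTime = now;
    }

    void reportSuccess(long tableId, long now) {
        TableTieringInfo info = tableInfos.get(tableId);
        if (info == null) {
            return;
        }
        info.state = TieringState.TIERED;
        info.lastTieredTime = now;
        info.lastError = null;
        info.lastErrorTime = null;
    }

    // A single removal drops every piece of per-table state at once.
    void removeTable(long tableId) {
        tableInfos.remove(tableId);
    }

    TableTieringInfo get(long tableId) {
        return tableInfos.get(tableId);
    }

    int size() {
        return tableInfos.size();
    }
}
```

In this sketch a successful tiering round clears the last error; keeping it until the next failure would be an equally valid choice.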


Thanks again for helping refine the design!

Best Regards,
SeungMin Lee


On Sat, Feb 14, 2026 at 2:22 AM, Mehul Batra <[email protected]> wrote:

> Hi SeungMin Lee,
>
> First of all, thank you for putting together FIP-30. The ability to track
> tiering status is a much-needed feature, and I appreciate the thorough
> design work that went into this proposal.
>
> After reviewing the FIP, I have a few thoughts and questions I'd like to
> raise for discussion. These are suggestions based on my understanding; I
> may be missing context, so please feel free to correct me if any of these
> points have already been considered.
>
> 1. Per-Table System Table vs Global System Table
>
> The proposal introduces both:
>
> - Global view: `fluss_catalog.sys.lake_tiering_status`
> - Per-table view: `my_db.my_table$tiering_status`
>
> I was wondering if we could simplify the initial implementation by focusing
> on the per-table `$tiering_status` virtual table for SQL access, while
> relying on the `listTieringStatuses()` Admin API for bulk/system-wide
> queries.
>
> My reasoning:
>
> - Consistency: The per-table pattern (`$tiering_status`) aligns with Fluss's
>   existing virtual table conventions, such as `$changelog` and `$binlog`.
> - Scalability: A global SQL table querying thousands of tables could have
>   performance implications. The Admin API seems better suited for bulk
>   operations, with potential pagination support.
>
> A phased approach (Phase 1: per-table SQL, Phase 2: Admin API) could ship
> value to users faster with a reduced initial scope.
>
> That said, I may be underestimating the need for the global SQL table. Are
> there specific use cases that would be difficult to serve with just the
> Admin API?
>
> 2. Error Message Truncation Strategy
>
> The proposal mentions truncating error messages to 2-4KB before sending
> them to the Coordinator. I'm concerned that simple head truncation could
> remove the most useful diagnostic information.
>
> Are we considering an extraction strategy to deal with this? In my mind,
> something like:
>
> - Smart extraction: Parse and extract all "Caused by:" lines, which
>   typically contain the most actionable information.
>
> I understand this adds complexity, so it's a trade-off. Curious to hear
> others' thoughts on whether this is worth addressing.
>
> 3. Consolidating State Maps in LakeTableTieringManager
>
> The proposal adds `tieringFailMessages` and `tieringFailTimes` maps to
> `LakeTableTieringManager`. Looking at the current implementation, the
> manager already maintains 6+ separate maps keyed by `tableId`:
>
> ```java
> // Existing per-table state, keyed by tableId:
> Map<Long, TieringState> tieringStates;
> Map<Long, TablePath> tablePaths;
> Map<Long, Long> tableLakeFreshness;
> Map<Long, Long> tableTierEpoch;
> Map<Long, Long> tableLastTieredTime;
> Map<Long, Long> liveTieringTableIds;
> // Proposed additions:
> Map<Long, String> tieringFailMessages;
> Map<Long, Long> tieringFailTimes;
> ```
>
> One thought: would it be cleaner to consolidate these into a single
> `TableTieringInfo` object?
>
> ```java
> // One map holding all per-table state, keyed by tableId:
> Map<Long, TableTieringInfo> tableInfos;
>
> class TableTieringInfo {
>     TablePath tablePath;
>     long lakeFreshness;
>     TieringState state;
>     long tieringEpoch;
>     long lastTieredTime;
>     @Nullable String lastError;
>     @Nullable Long lastErrorTime;
> }
> ```
>
> Potential benefits:
>
> - Single map lookup instead of multiple
> - Related state updated together naturally
> - Cleaner cleanup in `removeLakeTable()` (one removal vs. 8)
>
> This could be a separate preparatory refactoring PR or part of FIP-30.
> However, I understand this might be out of scope for this FIP, and I don't
> want to expand the scope unnecessarily. Just raising it as a thought for
> the authors to consider.
>
> These are just suggestions based on my reading of the proposal. I'm happy
> to be corrected if I've misunderstood anything. Also happy to help with
> implementation or further discussion if useful.
>
> Thanks again for driving this important feature!
>
> Best regards,
> Mehul Batra
>
> On Thu, Feb 12, 2026 at 5:53 PM SeungMin Lee <[email protected]> wrote:
>
> > Hi dev,
> >
> > Just a quick update.
> >
> > I have migrated the design Google Doc to the cwiki and registered it as
> > *FIP-30*. Please refer to the link below for the formal proposal:
> >
> > https://cwiki.apache.org/confluence/display/FLUSS/FIP-30%3A+Support+tracking+the+tiering+status+of+a+tiering+table
> >
> > The content remains consistent with the previous Google Doc.
> >
> > Best regards,
> > SeungMin Lee
> >
> > On Thu, Feb 12, 2026 at 5:37 PM, SeungMin Lee <[email protected]> wrote:
> > >
> > > Hi dev,
> > >
> > > Currently, there is no way for users to check the status of lake tiering.
> > > Users cannot tell when tiering fails, and they have to manually parse the
> > > Tiering Service logs to identify the cause.
> > >
> > > So, I'd like to propose Issue-2362 [1]: Allow users to track the tiering
> > > status of a tiering table, to address this visibility issue.
> > >
> > > I have drafted a design doc [2]. Please feel free to review it and share
> > > your feedback.
> > >
> > > Considering the upcoming holidays in some regions, I'll wait for feedback
> > > and give a ping on this thread around Feb 23rd.
> > >
> > > Looking forward to your thoughts.
> > >
> > > Best regards,
> > > SeungMin Lee
> > >
> > > [1] https://github.com/apache/fluss/issues/2362
> > > [2] https://docs.google.com/document/d/1eJbRCwzAbeJLA97zQQ0I3JM1jerBXXhq69Dn8r4xWV0/edit?usp=sharing
> >
>
