Re: [DISCUSS] Allow user to track the tiering status of a tiering table

SeungMin Lee Mon, 23 Feb 2026 07:30:04 -0800

Hi dev,

Hope you had a refreshing break.


Touching base on FIP-30. I'm aiming to wrap up the feedback process by the
week in which the 0.9 release vote
<https://lists.apache.org/thread/3c8w6ofrssjxrpvz85pkm2n2kx1gyzxd> ends, so
that we stay aligned with the project timeline. I also hope the release vote
itself gets plenty of interest.

Looking forward to your thoughts.

Best regards,
SeungMin Lee

On Sun, Feb 15, 2026 at 12:43 AM SeungMin Lee <[email protected]> wrote:

> Hi Mehul Batra,
>
> First of all, thank you very much for the detailed review and valuable
> suggestions. I really appreciate your insights.
>
> *1. Per-Table System Table vs Global System Table*
> I think the main use case for the global view is easy integration with
> monitoring tools such as Grafana. Without a SQL interface, users have to
> build a custom exporter on top of the Admin API to monitor the tiering
> status of all tables. I do share your concern about the performance impact
> of querying thousands of tables. While I acknowledge the risk in very large
> clusters, I believe it’s better to provide full visibility first. We can
> then monitor real-world performance data and, if necessary, add safeguards
> such as implicit limits or forced LIMIT clauses as a follow-up optimization.
>
>
> *2. Error Message Truncation Strategy*
> That is a great point. Simple head truncation might indeed cut off
> important information. I agree with your "smart extraction" suggestion,
> which prioritizes the text near markers such as "*Caused by*". To keep the
> initial FIP-30 scope focused, I plan to implement basic truncation first.
> However, I would be very grateful if you could help with the smart
> extraction as a follow-up PR, if you have the capacity.
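
A minimal sketch of the two strategies side by side, for comparison only. `TieringErrorSummarizer` and its method names are hypothetical and not part of Fluss or FIP-30, and the size budget is counted in characters rather than bytes to keep the example short:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Fluss class): condenses a stack trace to a size budget. */
final class TieringErrorSummarizer {

    private TieringErrorSummarizer() {}

    /** Basic strategy: keep only the first maxChars characters. */
    static String truncateHead(String stackTrace, int maxChars) {
        return stackTrace.length() <= maxChars
                ? stackTrace
                : stackTrace.substring(0, maxChars);
    }

    /**
     * "Smart" strategy: keep the first line (the top-level exception) plus every
     * "Caused by:" line, then apply head truncation as a final safety net.
     */
    static String extractCauses(String stackTrace, int maxChars) {
        List<String> kept = new ArrayList<>();
        String[] lines = stackTrace.split("\\R"); // \R matches any line break
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            if (i == 0 || line.startsWith("Caused by:")) {
                kept.add(line);
            }
        }
        return truncateHead(String.join("\n", kept), maxChars);
    }
}
```

With this shape, the basic version could ship first as `truncateHead`, and a follow-up could switch the call site to `extractCauses` without changing the size limit sent to the Coordinator.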
>
>
> *3. Consolidating State Maps in LakeTableTieringManager*
> I also fully agree with consolidating the maps in LakeTableTieringManager.
> Looking at the code again, managing 7 separate maps (and soon 9) for each
> table is getting a bit complicated. It’s quite easy to miss one map when
> registering or removing tables, which could lead to bugs or small memory
> leaks over time. Grouping everything into a single TableTieringInfo object
> will make the logic much easier to follow and help keep all the metadata
> consistent. Plus, it should be a bit more memory-efficient by reducing the
> number of internal map nodes. I’ll definitely include this refactoring as
> part of the FIP-30 implementation.
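
A rough sketch of what the consolidated bookkeeping could look like. `TieringRegistry` is a hypothetical, simplified stand-in for LakeTableTieringManager: it uses a plain String for the table path, keeps only a few of the fields, and ignores locking:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for the consolidated manager state. */
final class TieringRegistry {

    /** Per-table state; field names follow the TableTieringInfo idea from this thread. */
    static final class TableTieringInfo {
        final String tablePath;
        long lastTieredTime;
        String lastError;   // null until a failure is reported
        Long lastErrorTime; // null until a failure is reported

        TableTieringInfo(String tablePath) {
            this.tablePath = tablePath;
        }
    }

    // A single map replaces the separate per-field maps keyed by tableId.
    private final Map<Long, TableTieringInfo> tableInfos = new HashMap<>();

    void addLakeTable(long tableId, String tablePath) {
        tableInfos.put(tableId, new TableTieringInfo(tablePath));
    }

    /** Records a failure; message and timestamp are updated together. */
    void reportFailure(long tableId, String error, long nowMillis) {
        TableTieringInfo info = tableInfos.get(tableId);
        if (info != null) {
            info.lastError = error;
            info.lastErrorTime = nowMillis;
        }
    }

    /** One removal drops every piece of per-table state, so nothing can be missed. */
    void removeLakeTable(long tableId) {
        tableInfos.remove(tableId);
    }

    TableTieringInfo get(long tableId) {
        return tableInfos.get(tableId);
    }

    int size() {
        return tableInfos.size();
    }
}
```

The point of the sketch is the removal path: with one map there is a single entry to delete, so a forgotten map can no longer leave stale state behind.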
>
>
> Thanks again for helping refine the design!
>
> Best Regards,
> SeungMin Lee
>
>
> On Sat, Feb 14, 2026 at 2:22 AM Mehul Batra <[email protected]> wrote:
>
>> Hi SeungMin Lee,
>>
>> First of all, thank you for putting together FIP-30. The ability to track
>> tiering status is a much-needed feature, and I appreciate the thorough
>> design work that went into this proposal.
>>
>> After reviewing the FIP, I have a few thoughts and questions I'd like to
>> raise for discussion. These are suggestions based on my understanding. I may
>> be missing context, so please feel free to correct me if any of these points
>> have already been considered.
>>
>> 1. Per-Table System Table vs Global System Table
>>
>> The proposal introduces both:
>>
>> - Global view: `fluss_catalog.sys.lake_tiering_status`
>> - Per-table view: `my_db.my_table$tiering_status`
>>
>> I was wondering if we could simplify the initial implementation by focusing
>> on the per-table `$tiering_status` virtual table for SQL access, while
>> relying on the `listTieringStatuses()` Admin API for bulk/system-wide
>> queries.
>>
>> My reasoning:
>>
>> - Consistency: The per-table pattern (`$tiering_status`) aligns with
>>   Fluss's existing virtual table conventions, similar to `$changelog`,
>>   `$binlog`, etc.
>> - Scalability: A global SQL table querying thousands of tables could have
>>   performance implications. The Admin API seems better suited for bulk
>>   operations with potential pagination support.
>>
>> A phased approach (Phase 1: per-table SQL, Phase 2: Admin API) could ship
>> value to users faster with reduced initial scope.
>>
>> That said, I may be underestimating the need for the global SQL table. Are
>> there specific use cases that would be difficult to serve with just the
>> Admin API?
>>
>> 2. Error Message Truncation Strategy
>>
>> The proposal mentions truncating error messages to 2-4KB before sending
>> them to the Coordinator. I have a concern that simple head truncation could
>> remove the most useful diagnostic information.
>>
>> Are we considering an extraction strategy to deal with this? I have
>> something like the following in mind:
>>
>> - Smart extraction: Parse and extract all "Caused by:" lines, which
>>   typically contain the most actionable information
>>
>> I understand this adds complexity, so it's a trade-off. Curious to hear
>> others' thoughts on whether this is worth addressing.
>>
>> 3. Consolidating State Maps in LakeTableTieringManager
>>
>> The proposal adds `tieringFailMessages` and `tieringFailTimes` maps to
>> `LakeTableTieringManager`. Looking at the current implementation, the
>> manager already maintains 6+ separate maps keyed by `tableId`:
>>
>> ```java
>> Map<Long, TieringState> tieringStates;
>> Map<Long, TablePath> tablePaths;
>> Map<Long, Long> tableLakeFreshness;
>> Map<Long, Long> tableTierEpoch;
>> Map<Long, Long> tableLastTieredTime;
>> Map<Long, Long> liveTieringTableIds;
>>
>> // Proposed additions:
>> Map<Long, String> tieringFailMessages;
>> Map<Long, Long> tieringFailTimes;
>> ```
>>
>> One thought: would it be cleaner to consolidate these into a single
>> `TableTieringInfo` object?
>>
>>   Map<Long, TableTieringInfo> tableInfos;
>>
>>   class TableTieringInfo {
>>       TablePath tablePath;
>>       long lakeFreshness;
>>       TieringState state;
>>       long tieringEpoch;
>>       long lastTieredTime;
>>       @Nullable String lastError;
>>       @Nullable Long lastErrorTime;
>>   }
>>
>> Potential benefits:
>>
>> - Single map lookup instead of multiple
>> - Related state updated together naturally
>> - Cleaner cleanup in removeLakeTable() (one removal vs. 8)
>>
>> This could be a separate preparatory refactoring PR or part of FIP-30.
>> However, I understand this might be out of scope for this FIP, and I don't
>> want to expand the scope unnecessarily. Just raising it as a thought for the
>> authors to consider.
>>
>> These are just suggestions based on my reading of the proposal. I'm happy to
>> be corrected if I've misunderstood anything. Also happy to help with
>> implementation or further discussion if useful.
>>
>> Thanks again for driving this important feature!
>>
>> Best regards,
>> Mehul Batra
>>
>> On Thu, Feb 12, 2026 at 5:53 PM SeungMin Lee <[email protected]> wrote:
>>
>> > Hi dev,
>> >
>> > Just a quick update.
>> >
>> > I have migrated the design Google Doc to the cwiki and registered it as
>> > *FIP-30*. Please refer to the link below for the formal proposal:
>> >
>> > https://cwiki.apache.org/confluence/display/FLUSS/FIP-30%3A+Support+tracking+the+tiering+status+of+a+tiering+table
>> >
>> > The content remains consistent with the previous Google Doc.
>> >
>> > Best regards,
>> > SeungMin Lee
>> >
>> > On Thu, Feb 12, 2026 at 5:37 PM SeungMin Lee <[email protected]> wrote:
>> > >
>> > > Hi dev,
>> > >
>> > > Currently, there is no way for users to check the status of lake
>> > > tiering. Users have no way of knowing when tiering fails, and they have
>> > > to manually parse the Tiering Service logs to identify the cause.
>> > >
>> > > So, I'd like to propose Issue-2362 [1]: Allow users to track the tiering
>> > > status of a tiering table, to address this visibility issue.
>> > >
>> > > I have drafted a design doc [2]. Please feel free to review it and share
>> > > your feedback.
>> > >
>> > > Considering the upcoming holidays in some regions, I'll wait for
>> > > feedback and ping this thread around Feb 23rd.
>> > >
>> > > Looking forward to your thoughts.
>> > >
>> > > Best regards,
>> > > SeungMin Lee
>> > >
>> > > [1] https://github.com/apache/fluss/issues/2362
>> > > [2] https://docs.google.com/document/d/1eJbRCwzAbeJLA97zQQ0I3JM1jerBXXhq69Dn8r4xWV0/edit?usp=sharing
>> >
>>
>
