zhoujinsong commented on PR #4238: URL: https://github.com/apache/amoro/pull/4238#issuecomment-4608833115
**Overall: Good design direction, but suggest refining the category taxonomy before merging.**

The idea of separating table processes into different tabs is great. However, I think `MAINTENANCE` is too broad a category and we should consider a more precise taxonomy that better aligns with the nature of these operations.

### Suggested Classification

Instead of a binary `OPTIMIZING` / `MAINTENANCE` split, I'd suggest a three-way classification:

| Category | Purpose | Current Operations | Future Extensions |
|----------|---------|-------------------|-------------------|
| **OPTIMIZING** | Performance optimization (data reorganization) | Minor / Major / Full Compaction | Clustering, Sort Rewrite |
| **CLEANUP** | Space reclamation & lifecycle management | Expire Snapshots, Expire Data, Clean Orphan Files, Clean Dangling Delete Files | VACUUM, Remove Old Metadata |
| **PROFILING** | Information enrichment & metadata augmentation | Auto Create Tags | Collect Statistics, Build Index |

Additionally, `Sync Hive Tables` is more of an internal implementation detail and probably should **not** be exposed to users in any tab.

### Rationale

- "Maintenance" is too vague: compaction could also be considered "maintenance" in a broad sense.
- The operations currently under `MAINTENANCE` actually serve two distinct purposes: space reclamation (Expire/Clean) vs. metadata enrichment (Auto Create Tags). As we add more operations (e.g., statistics collection), this distinction will become more important.
- This three-way split aligns with industry conventions: Delta Lake has `OPTIMIZE` / `VACUUM`, Iceberg docs separate "rewrite" from "expire/remove".

### Suggested Approach

For the **backend API**, I'd recommend defining three `processCategory` values upfront: `OPTIMIZING`, `CLEANUP`, `PROFILING`. This makes the API future-proof.

For the **frontend**, there are two pragmatic options:

1. **Three tabs** (`Optimizing` / `Cleanup` / `Profiling`): cleanest separation
2. **Two tabs for now** (`Optimizing` / `Cleanup`): merge Profiling into Cleanup since there's currently only one profiling operation (Auto Create Tags), and split it out later when more profiling operations are added

Either way, the endpoint could be generalized from `/maintenance-types` to something like `/process-types?category=CLEANUP` for better extensibility.

What do you think?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
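
To make the suggested classification in the comment above concrete, here is a minimal, language-neutral sketch in Python. It is not Amoro code: `ProcessCategory`, `PROCESS_TYPES`, `process_types`, and `TWO_TAB_LAYOUT` are hypothetical names chosen for illustration, and the operation names are copied from the table in the comment. It only shows how a category filter such as `/process-types?category=CLEANUP` might resolve, and how the two-tab option could fold Profiling into Cleanup on the frontend without changing the backend values.

```python
from enum import Enum
from typing import List, Optional


class ProcessCategory(Enum):
    """Hypothetical backend values for the proposed `processCategory` field."""
    OPTIMIZING = "OPTIMIZING"
    CLEANUP = "CLEANUP"
    PROFILING = "PROFILING"


# Current operations from the table in the comment, keyed by display name.
# `Sync Hive Tables` is deliberately absent: the comment suggests it should
# not be exposed to users in any tab.
PROCESS_TYPES = {
    "Minor Compaction": ProcessCategory.OPTIMIZING,
    "Major Compaction": ProcessCategory.OPTIMIZING,
    "Full Compaction": ProcessCategory.OPTIMIZING,
    "Expire Snapshots": ProcessCategory.CLEANUP,
    "Expire Data": ProcessCategory.CLEANUP,
    "Clean Orphan Files": ProcessCategory.CLEANUP,
    "Clean Dangling Delete Files": ProcessCategory.CLEANUP,
    "Auto Create Tags": ProcessCategory.PROFILING,
}

# Frontend option 2 (two tabs for now): Profiling is shown under Cleanup,
# while the backend still reports three distinct categories.
TWO_TAB_LAYOUT = {
    ProcessCategory.OPTIMIZING: "Optimizing",
    ProcessCategory.CLEANUP: "Cleanup",
    ProcessCategory.PROFILING: "Cleanup",
}


def process_types(category: Optional[str] = None) -> List[str]:
    """Return operation names, optionally filtered by a category string.

    Mirrors what a handler for a hypothetical `/process-types?category=...`
    endpoint might do. An unknown category raises ValueError.
    """
    if category is None:
        return list(PROCESS_TYPES)
    wanted = ProcessCategory(category)
    return [name for name, cat in PROCESS_TYPES.items() if cat is wanted]
```

With this shape, moving from two tabs to three later would only touch the frontend layout mapping; the category values returned by the backend stay the same.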
