zhoujinsong commented on PR #4238:
URL: https://github.com/apache/amoro/pull/4238#issuecomment-4608833115

   
   **Overall: Good design direction, but suggest refining the category taxonomy 
before merging.**
   
   The idea of separating table processes into different tabs is great. 
However, I think `MAINTENANCE` is too broad a category and we should consider a 
more precise taxonomy that better aligns with the nature of these operations.
   
   ### Suggested Classification
   
   Instead of a binary `OPTIMIZING` / `MAINTENANCE` split, I'd suggest a 
three-way classification:
   
   | Category | Purpose | Current Operations | Future Extensions |
   |----------|---------|--------------------|-------------------|
   | **OPTIMIZING** | Performance optimization (data reorganization) | Minor / Major / Full Compaction | Clustering, Sort Rewrite |
   | **CLEANUP** | Space reclamation & lifecycle management | Expire Snapshots, Expire Data, Clean Orphan Files, Clean Dangling Delete Files | VACUUM, Remove Old Metadata |
   | **PROFILING** | Information enrichment & metadata augmentation | Auto Create Tags | Collect Statistics, Build Index |
   
   Additionally, `Sync Hive Tables` is more of an internal implementation 
detail and probably should **not** be exposed to users in any tab.
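   
   To make the classification concrete, here is a rough Java sketch of how it 
could be encoded. All names (`ProcessCategory`, `ProcessTaxonomy`, 
`categoryOf`) are hypothetical and not taken from the Amoro codebase; the 
operation labels are simply the ones from the table above. `Sync Hive Tables` 
is intentionally left unmapped, so it resolves to an empty result.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical names for illustration only; not existing Amoro classes.
enum ProcessCategory { OPTIMIZING, CLEANUP, PROFILING }

final class ProcessTaxonomy {
  // Current operations from the table above, grouped by suggested category.
  static final Map<ProcessCategory, List<String>> OPERATIONS =
      new EnumMap<>(ProcessCategory.class);

  static {
    OPERATIONS.put(ProcessCategory.OPTIMIZING,
        List.of("Minor Compaction", "Major Compaction", "Full Compaction"));
    OPERATIONS.put(ProcessCategory.CLEANUP,
        List.of("Expire Snapshots", "Expire Data",
            "Clean Orphan Files", "Clean Dangling Delete Files"));
    OPERATIONS.put(ProcessCategory.PROFILING, List.of("Auto Create Tags"));
  }

  // Returns the category of an operation, or empty when the operation is
  // not mapped to any user-facing category (such as "Sync Hive Tables").
  static Optional<ProcessCategory> categoryOf(String operation) {
    return OPERATIONS.entrySet().stream()
        .filter(entry -> entry.getValue().contains(operation))
        .map(Map.Entry::getKey)
        .findFirst();
  }
}
```

   With a mapping like this, adding a future operation such as statistics 
collection would mean adding one list entry rather than a new category.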
   
   ### Rationale
   
   - "Maintenance" is too vague — compaction could also be considered 
"maintenance" in a broad sense.
   - The operations currently under `MAINTENANCE` actually serve two distinct 
purposes: space reclamation (Expire/Clean) vs. metadata enrichment (Auto Create 
Tags). As we add more operations (e.g., statistics collection), this 
distinction will become more important.
   - The `OPTIMIZING` / `CLEANUP` boundary in this split aligns with industry 
conventions: Delta Lake has separate `OPTIMIZE` and `VACUUM` commands, and the 
Iceberg docs separate "rewrite" from "expire/remove".
   
   ### Suggested Approach
   
   For the **backend API**, I'd recommend defining three `processCategory` 
values upfront: `OPTIMIZING`, `CLEANUP`, `PROFILING`. That way, operations 
added later can fit into an existing category without changing the API.
   
   For the **frontend**, there are two pragmatic options:
   1. **Three tabs** (`Optimizing` / `Cleanup` / `Profiling`) — cleanest 
separation
   2. **Two tabs for now** (`Optimizing` / `Cleanup`) — merge Profiling into 
Cleanup since there's currently only one profiling operation (Auto Create 
Tags), and split it out later when more profiling operations are added
   
   Either way, the endpoint could be generalized from `/maintenance-types` to 
something like `/process-types?category=CLEANUP` for better extensibility.
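   
   As a rough illustration of the generalized endpoint, the sketch below 
shows how a `category` query parameter might be resolved. It is 
framework-free and purely hypothetical: `Category`, `ProcessTypeQuery`, and 
`list` are made-up names, and the registry holds only a few sample operations 
from the table. A missing parameter returns every type, and an unknown value 
throws `IllegalArgumentException`, which a real handler could translate into 
an HTTP 400.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical names for illustration only; not existing Amoro classes.
enum Category { OPTIMIZING, CLEANUP, PROFILING }

final class ProcessTypeQuery {
  // Stand-in registry (process type -> category); insertion order is kept.
  static final Map<String, Category> TYPES = new LinkedHashMap<>();

  static {
    TYPES.put("Minor Compaction", Category.OPTIMIZING);
    TYPES.put("Expire Snapshots", Category.CLEANUP);
    TYPES.put("Clean Orphan Files", Category.CLEANUP);
    TYPES.put("Auto Create Tags", Category.PROFILING);
  }

  // Resolves the `category` parameter of a hypothetical GET /process-types.
  // Null or blank means "no filter"; matching is case-insensitive.
  static List<String> list(String categoryParam) {
    Category wanted = (categoryParam == null || categoryParam.trim().isEmpty())
        ? null
        : Category.valueOf(categoryParam.trim().toUpperCase(Locale.ROOT));
    return TYPES.entrySet().stream()
        .filter(entry -> wanted == null || entry.getValue() == wanted)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
```

   Under this shape, the two-tab frontend option could simply issue one 
request per tab and merge the `PROFILING` result into the Cleanup tab on the 
client side until it is split out.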
   
   What do you think?


