zhangwl9 commented on PR #4238: URL: https://github.com/apache/amoro/pull/4238#issuecomment-4627673199
> **Overall: Good design direction, but suggest refining the category taxonomy before merging.**
>
> The idea of separating table processes into different tabs is great. However, I think `MAINTENANCE` is too broad a category, and we should consider a more precise taxonomy that better aligns with the nature of these operations.
>
> ### Suggested Classification
>
> Instead of a binary `OPTIMIZING` / `MAINTENANCE` split, I'd suggest a three-way classification:
>
> | Category | Purpose | Current Operations | Future Extensions |
> | --- | --- | --- | --- |
> | **OPTIMIZING** | Performance optimization (data reorganization) | Minor / Major / Full Compaction | Clustering, Sort Rewrite |
> | **CLEANUP** | Space reclamation & lifecycle management | Expire Snapshots, Expire Data, Clean Orphan Files, Clean Dangling Delete Files | VACUUM, Remove Old Metadata |
> | **PROFILING** | Information enrichment & metadata augmentation | Auto Create Tags | Collect Statistics, Build Index |
>
> Additionally, `Sync Hive Tables` is more of an internal implementation detail and probably should **not** be exposed to users in any tab.
>
> ### Rationale
>
> * "Maintenance" is too vague: compaction could also be considered "maintenance" in a broad sense.
> * The operations currently under `MAINTENANCE` actually serve two distinct purposes: space reclamation (Expire/Clean) vs. metadata enrichment (Auto Create Tags). As we add more operations (e.g., statistics collection), this distinction will become more important.
> * This three-way split aligns with industry conventions: Delta Lake has `OPTIMIZE` / `VACUUM`, and the Iceberg docs separate "rewrite" from "expire/remove".
>
> ### Suggested Approach
>
> For the **backend API**, I'd recommend defining three `processCategory` values upfront: `OPTIMIZING`, `CLEANUP`, `PROFILING`. This makes the API future-proof.
>
> For the **frontend**, there are two pragmatic options:
>
> 1. **Three tabs** (`Optimizing` / `Cleanup` / `Profiling`): the cleanest separation.
> 2. **Two tabs for now** (`Optimizing` / `Cleanup`): merge Profiling into Cleanup, since there's currently only one profiling operation (Auto Create Tags), and split it out later when more profiling operations are added.
>
> Either way, the endpoint could be generalized from `/maintenance-types` to something like `/process-types?category=CLEANUP` for better extensibility.
>
> What do you think?

The new PR revision addresses this.

Changes:
- Add ProcessCategory enum with OPTIMIZING, CLEANUP, PROFILING
- Replace getTableMaintenanceTypes() with getTableProcessTypes(category)
- Remove excludeTypes parameter from TableProcessMapper.listProcessMeta()
- Merge /optimizing-types and /maintenance-types into /process-types endpoint
- Split Maintenance.vue into Cleanup.vue and Profiling.vue
- Update HudiTableDescriptor and PaimonTableDescriptor
- Fix TestIcebergServerTableDescriptor to pass processCategory parameter

BREAKING CHANGE: Removed the /optimizing-types and /maintenance-types REST endpoints. Use /process-types?processCategory=OPTIMIZING|CLEANUP|PROFILING instead. The SPI method getTableMaintenanceTypes() is replaced by getTableProcessTypes().
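For anyone following the thread, the category-based lookup described in the change list could look roughly like the sketch below. This is only an illustration under assumptions, not the code in the PR: the class name `ProcessTypeCatalog`, the helper `parseCategory`, the static method signature, and the type identifiers (`MINOR`, `EXPIRE_SNAPSHOTS`, and so on, derived from the operation names in the review table) are all hypothetical. Only the three `ProcessCategory` values and the method name `getTableProcessTypes` come from the comment above.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; these are NOT the actual Amoro classes.
public class ProcessTypeCatalog {

  /** The three categories named in the change list. */
  public enum ProcessCategory { OPTIMIZING, CLEANUP, PROFILING }

  // Assumed identifiers, derived from the operation names in the review table.
  private static final Map<ProcessCategory, List<String>> TYPES =
      new EnumMap<>(ProcessCategory.class);

  static {
    TYPES.put(ProcessCategory.OPTIMIZING, List.of("MINOR", "MAJOR", "FULL"));
    TYPES.put(
        ProcessCategory.CLEANUP,
        List.of(
            "EXPIRE_SNAPSHOTS",
            "EXPIRE_DATA",
            "CLEAN_ORPHAN_FILES",
            "CLEAN_DANGLING_DELETE_FILES"));
    TYPES.put(ProcessCategory.PROFILING, List.of("AUTO_CREATE_TAGS"));
  }

  /**
   * Parses a processCategory query-parameter value, ignoring case.
   * Throws IllegalArgumentException for an unknown category.
   */
  public static ProcessCategory parseCategory(String raw) {
    return ProcessCategory.valueOf(raw.trim().toUpperCase(Locale.ROOT));
  }

  /** Returns the process types registered for one category (assumed signature). */
  public static List<String> getTableProcessTypes(ProcessCategory category) {
    return TYPES.get(category);
  }

  public static void main(String[] args) {
    // Mirrors a request such as /process-types?processCategory=CLEANUP
    System.out.println(getTableProcessTypes(parseCategory("cleanup")));
  }
}
```

Keying the lookup on a single enum means a future category would only need a new enum constant and map entry rather than another endpoint, which is the extensibility argument made in the review.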
