snleee commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1041195396
##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
/**
* A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
*
- * TODO: Add the support for realtime table
+ * Assumptions:
+ * - When the MergeRollupTask starts the first time, records older than the min(now ms, max end time ms of all ready to
+ * process segments) - bufferTimeMs have already been ingested. If not, newly ingested records older than that time
+ * may not be properly merged (Due to the latest watermarks advanced too far before records are ingested).
+ * - If it is needed, there are backfill protocols to ingest and replace records older than the latest watermarks.
+ * Those protocols can handle time alignment (according to merge levels configurations) correctly.
+ * - If it is needed, there are reconcile protocols to merge & rollup newly ingested segments that are (1) older than
+ * the latest watermarks, and (2) not time aligned according to merge levels configurations
+ * - For realtime tables, those protocols are needed if streaming records arrive late (older than the latest
+ * watermarks)
+ * - For offline tables, those protocols are needed if there are non-time-aligned segments ingested accidentally.
*
- * Steps:
*
+ * Steps:
* - Pre-select segments:
* - Fetch all segments, select segments based on segment lineage (removing segmentsFrom for COMPLETED lineage
* entry and segmentsTo for IN_PROGRESS lineage entry)
+ * - For realtime tables, remove
+ * - in-progress segments, and
Review Comment:
I saw that you already did! thank you!
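
For readers following the thread, the first assumption in the Javadoc above can be read as a simple cutoff computation. The sketch below is only an illustration of that sentence, not code from the PR: the class name `IngestionCutoffSketch` and the method `ingestionCutoffMs` are hypothetical, and the real `MergeRollupTaskGenerator` may compute its watermarks differently.

```java
import java.util.List;

// Hypothetical illustration of the Javadoc assumption; not part of Apache Pinot.
public class IngestionCutoffSketch {

  /**
   * Returns min(nowMs, max end time of the given segments) - bufferTimeMs.
   * Per the Javadoc, records older than this value are assumed to be already
   * ingested when the task first runs.
   */
  public static long ingestionCutoffMs(long nowMs, List<Long> segmentEndTimesMs, long bufferTimeMs) {
    if (segmentEndTimesMs.isEmpty()) {
      // Assumption of this sketch: with no ready-to-process segments there is no cutoff.
      throw new IllegalArgumentException("no segments to process");
    }
    long maxEndTimeMs = Long.MIN_VALUE;
    for (long endTimeMs : segmentEndTimesMs) {
      maxEndTimeMs = Math.max(maxEndTimeMs, endTimeMs);
    }
    return Math.min(nowMs, maxEndTimeMs) - bufferTimeMs;
  }
}
```

Under this reading, a record that arrives later but carries a timestamp below the cutoff is the case the backfill and reconcile protocols in the Javadoc are meant to cover.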