maytasm commented on a change in pull request #10371:
URL: https://github.com/apache/druid/pull/10371#discussion_r489245349
##########
File path:
server/src/main/java/org/apache/druid/server/coordinator/duty/CompactionSegmentIterator.java
##########
@@ -19,22 +19,45 @@
package org.apache.druid.server.coordinator.duty;
-import it.unimi.dsi.fastutil.objects.Object2LongOpenHashMap;
+import org.apache.druid.server.coordinator.CompactionStatistics;
import org.apache.druid.timeline.DataSegment;
import java.util.Iterator;
import java.util.List;
+import java.util.Map;
/**
* Segments in the lists which are the elements of this iterator are sorted according to the natural segment order
* (see {@link DataSegment#compareTo}).
*/
public interface CompactionSegmentIterator extends Iterator<List<DataSegment>>
Review comment:
> > > * Why does the response include only one task ID? What will happen
> > >   if auto compaction issues multiple compaction tasks?
> >
> >
> > Hmm... I am thinking of having the UI component show the status (fail,
> > success) of the latest task, which could indicate whether user action is
> > required.
> > Another idea is to return a list of all tasks issued during this run
> > (and an empty list if there is no available slot). The UI can then select
> > which task it wants to show: it can pick the last task ID, or it can
> > calculate the success vs. fail rate (%) of all tasks in the last run.
>
> Hmm, I'm not sure what the use case would be for this information about the
> tasks issued in each run. Would it be more useful to know how many compaction
> task failures there have been recently (over a couple of runs)?
I just realized that we already have a different task name prefix for tasks
issued manually vs. tasks issued automatically by the coordinator (auto
compaction). In that case, returning the last task scheduled by auto compaction
is not necessary, since we can use the existing task API to get the latest task
whose name has the `coordinator-issue` prefix. In fact, we can sort by time to
get a history of all the tasks issued by auto compaction. Thus, the existing
task API can already answer questions like how many compaction task failures
there have been recently (over some time period such as hours or days).
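A minimal sketch of that filter-and-sort idea, assuming a hypothetical `TaskSummary` view of each task (ID, creation time, status string) rather than Druid's actual task API response classes. The `"FAILED"` status string is also an assumption; only the `coordinator-issue` prefix comes from this discussion.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified view of one task as returned by a task-listing API.
// Field names and status strings are illustrative, not Druid's actual schema.
record TaskSummary(String id, Instant createdTime, String status) {}

final class AutoCompactionHistory {
  // Task name prefix for coordinator-issued (auto compaction) tasks, as quoted above.
  static final String AUTO_COMPACTION_PREFIX = "coordinator-issue";

  private AutoCompactionHistory() {}

  /** Returns only auto compaction tasks, newest first. */
  static List<TaskSummary> history(List<TaskSummary> allTasks) {
    return allTasks.stream()
        .filter(t -> t.id().startsWith(AUTO_COMPACTION_PREFIX))
        .sorted(Comparator.comparing(TaskSummary::createdTime).reversed())
        .collect(Collectors.toList());
  }

  /** Counts auto compaction tasks with the (assumed) status "FAILED" created within the window before now. */
  static long recentFailures(List<TaskSummary> allTasks, Instant now, Duration window) {
    Instant cutoff = now.minus(window);
    return history(allTasks).stream()
        .filter(t -> !t.createdTime().isBefore(cutoff))
        .filter(t -> "FAILED".equals(t.status()))
        .count();
  }
}
```

Because the prefix distinguishes auto compaction tasks from manually submitted ones, this history needs no new coordinator state; it only relies on data the task listing already exposes.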
The information that currently cannot be retrieved is the list of all the
tasks issued in the same (latest) coordinator run. You cannot get this today
from the ingestion tab, since there is no way to tell which tasks were issued in
the same run. With it, we could know things like the percentage of tasks that
succeeded vs. failed vs. were canceled in the last run. Moreover, we could
compare those percentages for the last run against the previous run (or any
other run). This lets users see whether changes they made between runs (to
configs, specs, etc.) fixed the issues they were seeing before.
For example:
- Run 1 has 100% failed and 0% succeeded; the user then fixes something, and
  run 2 has 100% succeeded and 0% failed. The user can see that the fix solved
  the problem.
- Run 1 has 100% failed and 0% succeeded; the user then fixes something, and
  run 2 still has 0% succeeded and 100% failed. The user can see that the fix
  did not solve the problem.
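A minimal sketch of what a per-run task list would enable, assuming a hypothetical `CompactionRun` that maps each task ID issued in one coordinator run to a hypothetical `TaskOutcome` (an empty map when no slot was available). None of these types exist in Druid; they only illustrate the percentage breakdown and the run-to-run comparison described above.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical outcome categories matching the ones named in the comment.
enum TaskOutcome { SUCCEEDED, FAILED, CANCELED }

// Hypothetical record of one coordinator run: task ID -> final outcome.
// The map is empty when the run could not issue any task (no available slot).
record CompactionRun(long runId, Map<String, TaskOutcome> taskOutcomes) {}

final class RunComparison {
  private RunComparison() {}

  /** Percentage of the run's tasks per outcome; all zeros if the run issued no tasks. */
  static Map<TaskOutcome, Double> percentages(CompactionRun run) {
    Map<TaskOutcome, Integer> counts = new EnumMap<>(TaskOutcome.class);
    for (TaskOutcome outcome : run.taskOutcomes().values()) {
      counts.merge(outcome, 1, Integer::sum);
    }
    int total = run.taskOutcomes().size();
    Map<TaskOutcome, Double> result = new EnumMap<>(TaskOutcome.class);
    for (TaskOutcome outcome : TaskOutcome.values()) {
      int count = counts.getOrDefault(outcome, 0);
      result.put(outcome, total == 0 ? 0.0 : 100.0 * count / total);
    }
    return result;
  }

  /** True if the later run has a strictly higher success percentage than the earlier run. */
  static boolean improved(CompactionRun earlier, CompactionRun later) {
    double before = percentages(earlier).get(TaskOutcome.SUCCEEDED);
    double after = percentages(later).get(TaskOutcome.SUCCEEDED);
    return after > before;
  }
}
```

Counting per outcome first and dividing once keeps the percentages exact for small runs; "improved" here simply means a strictly higher success rate, which is one possible definition among several (a UI could equally compare failure or cancellation rates).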
----------------------------------------------------------------
This is an automated message from the Apache Git Service.