jihoonson commented on a change in pull request #10676:
URL: https://github.com/apache/druid/pull/10676#discussion_r601926994
##########
File path:
indexing-service/src/test/java/org/apache/druid/indexing/common/task/CompactionTaskTest.java
##########
@@ -1447,6 +1452,7 @@ private void assertIngestionSchema(
null,
null,
null,
+ null,
null
Review comment:
Interesting point. I think there are some things we should think about
first.
- It's true that compaction currently doesn't change the underlying data
much, but it can make some changes, such as filtering out unnecessary
dimensions or adding new metrics. You can now also change the query
granularity. In the future, I can imagine transforming your data during
compaction once support for transformSpec is added.
- The compaction task is a bit special and differs from other batch tasks
in how it publishes segments. All other batch tasks can push segments in the
middle of indexing, but should publish all of those segments at the end of
indexing. The compaction task, however, can process one time chunk at a time
when there is no change in segment granularity. In this case, it can publish
segments as soon as it finishes processing each individual time chunk. It can
also go through all time chunks even when it fails to compact some of them.
The final task status will be `FAILED` when it succeeds in compacting
only some time chunks but fails for others.
- Compacting a datasource is usually not a single-shot job. Rather,
you would run multiple small compaction tasks over time, as in auto compaction.
In that case, you would want to know which time chunks are compacted and which
are not, so that you can determine what result to expect when you query
certain time chunks. For compaction that is set up manually outside Druid,
tracking individual compaction tasks could be useful for this purpose.
However, for auto compaction, it won't provide much value since compaction
tasks are submitted by the coordinator, not by users. So, we need another way,
such as adding a new coordinator API that returns such compaction status.
Given these points, for compaction we would probably want something similar
to, but different from, the one proposed here. I would suggest doing it in a
different PR.
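
To make the second point above more concrete, here is a rough sketch of the per-time-chunk behavior: each chunk is compacted and published independently, a failure in one chunk does not stop the others, and the overall status is `FAILED` if any chunk failed. This is only an illustration under simplifying assumptions; `ChunkedCompactionSketch`, `Result`, `Status`, and `run` are hypothetical names, not Druid classes, and the time chunks are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names exist in Druid.
final class ChunkedCompactionSketch {
    enum Status { SUCCESS, FAILED }

    static final class Result {
        final Status status;
        final List<String> published; // time chunks whose segments were published
        final List<String> failed;    // time chunks that could not be compacted

        Result(Status status, List<String> published, List<String> failed) {
            this.status = status;
            this.published = published;
            this.failed = failed;
        }
    }

    /**
     * Processes each time chunk independently. The predicate stands in for
     * "compact this chunk" and returns true on success.
     */
    static Result run(List<String> timeChunks, Predicate<String> compactChunk) {
        List<String> published = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        for (String chunk : timeChunks) {
            boolean ok;
            try {
                ok = compactChunk.test(chunk);
            } catch (RuntimeException e) {
                ok = false; // a failing chunk must not abort the remaining chunks
            }
            if (ok) {
                published.add(chunk); // publish right away, not at the end of the task
            } else {
                failed.add(chunk);    // remember the failure and keep going
            }
        }
        // Partial success still yields FAILED, even though some chunks were published.
        Status status = failed.isEmpty() ? Status.SUCCESS : Status.FAILED;
        return new Result(status, published, failed);
    }
}
```

With this shape, a task can end as `FAILED` while still having published compacted segments for some time chunks, which is why a single task-level status is not enough to tell which time chunks are compacted.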