jihoonson commented on issue #6989: Behavior of index_parallel with appendToExisting=false and no bucketIntervals in GranularitySpec is surprising
URL: https://github.com/apache/incubator-druid/issues/6989#issuecomment-461221023

@glasser

> 1. Where should tests of this new logic end up? And how do I actually run parts of the Druid test suite? I'm not very experienced with Maven — I know how to run `mvn install` to run all of the tests for all of Druid, but not anything more specific. (I use IntelliJ if that helps.)

I would recommend using IntelliJ or any other IDE you prefer. If you want to run tests from the terminal, you can use `mvn test -Dtest=TestClass` for unit tests and `mvn verify -P integration-tests -Dit.test=TestClass` for integration tests.

> 2. If it's OK to dynamically add locks one by one as the task runs, why do the local and hadoop indexing tasks do an initial scan to determine all the intervals at once? Do they need to do that scan for some other reason anyway?

I think it's because of https://github.com/apache/incubator-druid/pull/4550. Those classes were written before that PR, and at that time there was no concept of revoking locks. As a result, if two or more tasks acquired locks one by one dynamically, they could get stuck in the middle of ingestion; it could even cause a deadlock if they blocked each other. Now, however, higher-priority tasks can preempt lower-priority tasks.

> 3. General batch ingestion/segment replacement question: if you're using batch ingestion (of any kind: Hadoop, native, local) with a granularitySpec interval specified and appendToExisting false, to re-ingest into an interval that already contains data, but there is a time chunk of the data source's segment granularity that has no rows in your batch ingestion run, what will happen to the data in that time chunk?
> It seems to me that nothing will happen, because I haven't seen anything that creates empty segments for a time chunk with no data, and so there's no segment to overshadow the old segment. Is that expected? Is there a good way to say "replace this interval of time with data from this batch job, including dropping segments from time chunks if there's nothing there"? We're considering using batch ingestion with the ingestSegment firehose and filtering in order to retain only specific rarer kinds of data past a certain distance in the past, and it's possible to imagine that that data might be missing for an entire hour here and there.

Good question. I don't think we currently support that kind of replacement, but maybe it's worth supporting.
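For context, the scenario in question 3 corresponds to a native batch ingestion spec roughly like the sketch below. This is not from the issue itself: the datasource name and intervals are made up for illustration, and the parser and tuningConfig sections are omitted for brevity.

```json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "example_datasource",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "intervals": ["2019-01-01/2019-01-02"]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "ingestSegment",
        "dataSource": "example_datasource",
        "interval": "2019-01-01/2019-01-02"
      },
      "appendToExisting": false
    }
  }
}
```

With a spec like this, if the filtered input yields no rows for some hour inside the interval, no new segment is published for that hourly time chunk, so the old segment there is never overshadowed and stays visible.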
