[GitHub] glasser edited a comment on issue #6989: Behavior of index_parallel with appendToExisting=false and no bucketIntervals in GranularitySpec is surprising

GitBox Wed, 06 Feb 2019 12:47:00 -0800

glasser edited a comment on issue #6989: Behavior of index_parallel with 
appendToExisting=false and no bucketIntervals in GranularitySpec is surprising
URL: 
https://github.com/apache/incubator-druid/issues/6989#issuecomment-461108169
 
 
   Hi @jihoonson. A couple of random questions while I work on this:
   
   1. Where should tests of this new logic end up?  And how do I actually run 
parts of the Druid test suite? I'm not very experienced with Maven — I know how 
to run `mvn install` to run all of the tests for all of Druid, but not anything 
more specific. (I use IntelliJ if that helps.)
   
   2. If it's OK to dynamically add locks one by one as the task runs, why do 
the local and Hadoop indexing tasks do an initial scan to determine all the 
intervals at once? Do they need to do that scan for some other reason anyway?
   
   3. General batch ingestion/segment replacement question: suppose you use 
batch ingestion (of any kind: Hadoop, native, local) with a granularitySpec 
interval specified and appendToExisting set to false, to re-ingest into an 
interval that already contains data, but there is a time chunk of the data 
source's segment granularity that has no rows in your batch ingestion run. What 
will happen to the data in that time chunk? It seems to me that nothing will 
happen, because I haven't seen anything that creates empty segments for a time 
chunk with no data, and so there's no segment to overshadow the old segment. Is 
that expected? Is there a good way to say "replace this interval of time with 
data from this batch job, including dropping segments from time chunks if 
there's nothing there"? We're considering using batch ingestion with the 
ingestSegment firehose and filtering in order to retain only specific rarer 
kinds of data beyond a certain age, and it's possible to imagine that such 
data might be missing for an entire hour here and there.
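   
   To make the scenario in question 3 concrete, here is a toy Python model of 
the behavior described. It is not Druid code: the function names are 
hypothetical, hourly segment granularity is assumed, and the model simply 
encodes the guess that no empty segment is created for a time chunk with no 
rows, so any old segment in that chunk would not be overshadowed.

```python
from datetime import datetime, timedelta


def hourly_chunks(start, end):
    """Yield the start of each hourly time chunk in [start, end)."""
    t = start
    while t < end:
        yield t
        t += timedelta(hours=1)


def chunks_left_untouched(start, end, row_timestamps):
    """Return the hourly chunks in [start, end) that get no rows from the batch.

    Assumption (from question 3, not confirmed): a chunk with no rows
    produces no new segment, so existing data in it stays visible.
    """
    # Truncate each row timestamp to the start of its hour.
    covered = {
        ts.replace(minute=0, second=0, microsecond=0) for ts in row_timestamps
    }
    return [c for c in hourly_chunks(start, end) if c not in covered]


# Re-ingest a four-hour interval where the batch only has rows in hours 0 and 3.
start = datetime(2019, 2, 1, 0)
end = datetime(2019, 2, 1, 4)
rows = [
    datetime(2019, 2, 1, 0, 15),
    datetime(2019, 2, 1, 0, 50),
    datetime(2019, 2, 1, 3, 5),
]

stale = chunks_left_untouched(start, end, rows)
print([c.strftime("%H:%M") for c in stale])  # ['01:00', '02:00']
```

   In this model, hours 01:00 and 02:00 would keep whatever segments they had 
before the re-ingestion, which is the outcome the question asks about.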


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

