[GitHub] [druid] liuxiaohui1221 commented on pull request #10861: coordinator compactionTask filter locked interval before submitted

GitBox Sun, 07 Feb 2021 17:30:42 -0800


liuxiaohui1221 commented on pull request #10861:
URL: https://github.com/apache/druid/pull/10861#issuecomment-774811930



   > > Hi @liuxiaohui1221, thank you for your contribution. I'm wondering how this change can reduce the compaction task failures due to lock contention. Here is what should happen when two or more tasks try to lock overlapping intervals.
   > >
   > > * A high-priority task is submitted while a low-priority task is running. In this case, the high-priority task revokes the lock of the low-priority task. The low-priority task stops with the `FAILED` state.
   > > * A low- or equal-priority task is submitted while a high- or equal-priority task is running. In this case, the second task waits (in the `WAITING` state) until the first task releases the lock.
   > >
   > > Compaction tasks can fail in the first case above when there is lock contention. However, the new overlord API (`getNonLockIntervalSnapshots()`) can be useful only for the second case, by preventing the coordinator from submitting compaction tasks that can lead to lock contention. The `skipOffsetFromLatest` in the auto compaction config should be enough to avoid both cases unless data can arrive late frequently. Or am I missing something?
   > 
   > @jihoonson Thank you for your comments. Yes, our data frequently arrives late. According to the data we collected, the specific scenario is as follows: around `90%` of it arrives on the same day, but the other `10%` is spread over the last 3 months, so I would need to set `skipOffsetFromLatest` very large to avoid frequent compaction task failures.
   >
   > I want as many compactions of small segment files as possible to succeed. For this reason, I would also like to add an iteration strategy different from `NewestSegmentFirstIterator`: `HighScoreSegmentFirstIterator`. It considers three factors, interval, segmentsNum, and segmentsSize, to calculate a score: first, apply min-max normalization to each of these columns, and then compute `intervalScore = w1 * intervalEndTime + w2 * segmentsSize + w3 * segmentsNum`. Compaction tasks for higher-scoring intervals are submitted first.
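
The scoring idea for `HighScoreSegmentFirstIterator` described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: `HighScoreOrderingSketch`, `IntervalStats`, and `orderByScore` are hypothetical names, the per-interval statistics are assumed to be already collected, and the weights `w1`, `w2`, `w3` are arbitrary inputs. Each column is min-max normalized to [0, 1] before the weighted sum, and candidates are ordered by descending score.

```java
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class HighScoreOrderingSketch {
    // Hypothetical per-interval statistics (not a Druid class).
    public record IntervalStats(String interval, long intervalEndMillis, long segmentsSize, int segmentsNum) {}

    // Min-max normalization to [0, 1]; a column whose values are all equal maps to 0.
    static double minMax(double value, DoubleSummaryStatistics stats) {
        double range = stats.getMax() - stats.getMin();
        return range == 0 ? 0.0 : (value - stats.getMin()) / range;
    }

    static DoubleSummaryStatistics statsOf(List<IntervalStats> candidates, ToDoubleFunction<IntervalStats> column) {
        return candidates.stream().mapToDouble(column).summaryStatistics();
    }

    // Orders candidates by descending
    // intervalScore = w1 * norm(intervalEnd) + w2 * norm(segmentsSize) + w3 * norm(segmentsNum).
    public static List<IntervalStats> orderByScore(List<IntervalStats> candidates, double w1, double w2, double w3) {
        DoubleSummaryStatistics end = statsOf(candidates, s -> s.intervalEndMillis());
        DoubleSummaryStatistics size = statsOf(candidates, s -> s.segmentsSize());
        DoubleSummaryStatistics num = statsOf(candidates, s -> s.segmentsNum());
        ToDoubleFunction<IntervalStats> score = s ->
                w1 * minMax(s.intervalEndMillis(), end)
                + w2 * minMax(s.segmentsSize(), size)
                + w3 * minMax(s.segmentsNum(), num);
        return candidates.stream()
                .sorted(Comparator.comparingDouble(score).reversed())
                .toList();
    }

    public static void main(String[] args) {
        // Hypothetical example data: interval label, end time (epoch millis), total bytes, segment count.
        List<IntervalStats> candidates = List.of(
                new IntervalStats("2021-01-01/2021-01-02", 1_609_545_600_000L, 50_000_000L, 40),
                new IntervalStats("2021-02-06/2021-02-07", 1_612_656_000_000L, 5_000_000L, 3),
                new IntervalStats("2020-12-01/2020-12-02", 1_606_867_200_000L, 80_000_000L, 120));
        // Prints the interval labels in submission order (highest score first).
        orderByScore(candidates, 0.6, 0.2, 0.2).forEach(s -> System.out.println(s.interval()));
    }
}
```

With these example weights the newest interval ranks first because its normalized end time dominates; raising `w2` or `w3` shifts priority toward intervals with more or larger segments. How to choose the weights is left open here.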
   
   In fact, about 90% of the real-time data arrives on time; of the late data, the vast majority arrives within the last month, and very little arrives a month or more late.
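
The trade-off discussed in this thread (a large `skipOffsetFromLatest` versus filtering out locked intervals before submitting compaction tasks, as the PR title describes) can be illustrated with a minimal sketch. It assumes the coordinator already has a list of currently locked intervals, for example from the `getNonLockIntervalSnapshots()` API mentioned in the review, whose actual signature is not shown here. `CompactionCandidateFilterSketch`, `TimeInterval`, and `filterCandidates` are hypothetical names, and `skipOffsetFromLatest` is modeled in a simplified way as "skip candidates that end after `latest - skipOffset`".

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

public class CompactionCandidateFilterSketch {
    // Hypothetical half-open time interval [start, end); not Druid's Joda-based Interval.
    public record TimeInterval(Instant start, Instant end) {
        boolean overlaps(TimeInterval other) {
            return start.isBefore(other.end) && other.start.isBefore(end);
        }
    }

    // Keeps candidates that (1) end at or before latestDataTime - skipOffsetFromLatest and
    // (2) do not overlap any interval currently locked by another task.
    public static List<TimeInterval> filterCandidates(
            List<TimeInterval> candidates,
            List<TimeInterval> lockedIntervals,
            Instant latestDataTime,
            Duration skipOffsetFromLatest) {
        Instant cutoff = latestDataTime.minus(skipOffsetFromLatest);
        return candidates.stream()
                .filter(c -> !c.end().isAfter(cutoff))
                .filter(c -> lockedIntervals.stream().noneMatch(c::overlaps))
                .toList();
    }

    public static void main(String[] args) {
        Instant day0 = Instant.parse("2021-02-01T00:00:00Z");
        List<TimeInterval> candidates = List.of(
                new TimeInterval(day0, day0.plus(Duration.ofDays(1))),
                new TimeInterval(day0.plus(Duration.ofDays(1)), day0.plus(Duration.ofDays(2))),
                new TimeInterval(day0.plus(Duration.ofDays(5)), day0.plus(Duration.ofDays(6))));
        // Hypothetical lock held by a late-data ingestion task on the second day.
        List<TimeInterval> locked = List.of(
                new TimeInterval(day0.plus(Duration.ofDays(1)), day0.plus(Duration.ofDays(2))));
        Instant latest = day0.plus(Duration.ofDays(6));
        filterCandidates(candidates, locked, latest, Duration.ofDays(1))
                .forEach(i -> System.out.println(i.start() + "/" + i.end()));
    }
}
```

In this example only the first day survives: the second day is skipped because it is locked, and the last day falls inside the skip offset. The point of the design is that a small skip offset can be kept while intervals touched by late data are skipped only while they are actually locked, rather than for the whole late-arrival window.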


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
