lxqfy opened a new issue #7324: Automatic segment compaction segment size not 
matching targetCompactionSizeBytes
URL: https://github.com/apache/incubator-druid/issues/7324
 
 
   Automatic segment compaction segment size not matching 
targetCompactionSizeBytes
   ### Affected Version
   0.13.0-incubating
   The Druid version where the problem was encountered.
   
   ### Description
   I am trying to use the "Automatic segment compaction". My auto-compaction 
config is as follows:
   {
     "dataSource": ds,
     "inputSegmentSizeBytes": 524288000,
     "targetCompactionSizeBytes": 524288000,
     "skipOffsetFromLatest": "PT3H",
     "keepSegmentGranularity": false
   }
   
   However, after the compaction task finishes, each resulting segment (shard) 
is still only a little over 130 MB. Those segments are then picked up by the 
next round of compaction tasks, which again produce segments of the same size, 
so the process loops indefinitely.
   
   For example, a compaction task tries to compact 2 segment shards with 
targetCompactionSizeBytes set to 500 MB:
   2019-03-06T04:00:00.000Z/2019-03-08T01:00:00.000Z_1 (130mb)
   2019-03-06T04:00:00.000Z/2019-03-08T01:00:00.000Z_2 (130mb)
   After the compaction task, those 2 shards are not merged into 1 shard. They 
remain pretty much the same size, just with a new version. The coordinator then 
tries to compact them again and again without ever producing a single shard of 
about 260 MB.
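
To make the expectation above concrete, here is a minimal sketch of the arithmetic. It is an idealized estimate, not Druid's actual partitioning logic; the function name `expected_shard_count` is hypothetical, and the 130 MB sizes are the approximate figures quoted above.

```python
import math

MB = 1024 * 1024

def expected_shard_count(segment_sizes_bytes, target_compaction_size_bytes):
    """Idealized shard count if output shards were packed up to the target size."""
    total = sum(segment_sizes_bytes)
    return max(1, math.ceil(total / target_compaction_size_bytes))

# Two input shards of roughly 130 MB each, with the 524288000-byte target
# from the config in this issue.
sizes = [130 * MB, 130 * MB]
print(expected_shard_count(sizes, 524_288_000))  # prints 1
```

Under this simple model the two shards fit comfortably into one output shard, which is why the repeated two-shard result is surprising.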
   
   After some investigation, I found that:
   
   The compaction task generates an internal index task. Its 
targetPartitionSize is calculated from targetCompactionSizeBytes and 
avgRowsPerByte:
   
   Estimated targetPartitionSize[%d] = avgRowsPerByte[%f] * 
targetCompactionSizeBytes[%d]
   
   There is another index task setting, maxTotalRows, whose default value is 
20000000. If targetPartitionSize is larger than maxTotalRows, compaction does 
not work as expected.
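
The interaction described above can be sketched as follows. This is only an illustration of the formula and the default quoted in this issue, not the Druid implementation; the function names and the `avg_rows_per_byte` value of 0.125 are assumptions chosen for the example.

```python
DEFAULT_MAX_TOTAL_ROWS = 20_000_000  # default maxTotalRows quoted in this issue

def estimate_target_partition_size(avg_rows_per_byte, target_compaction_size_bytes):
    """Rows per partition, per the formula logged by the compaction task."""
    return int(avg_rows_per_byte * target_compaction_size_bytes)

def exceeds_max_total_rows(target_partition_size, max_total_rows=DEFAULT_MAX_TOTAL_ROWS):
    """True when the estimated partition size is above the maxTotalRows limit."""
    return target_partition_size > max_total_rows

# Hypothetical data density: one row per 8 bytes of segment data.
avg_rows_per_byte = 0.125
target = estimate_target_partition_size(avg_rows_per_byte, 524_288_000)
print(target)                          # 65536000
print(exceeds_max_total_rows(target))  # True
```

With these assumed numbers the estimated targetPartitionSize is more than three times the default maxTotalRows, which is the situation the issue describes.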
   
   The workaround is to set maxTotalRows to a larger value. Users are unlikely 
to know about this by default, so maxTotalRows should be overridden in this 
case.
   

