[ https://issues.apache.org/jira/browse/KAFKA-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349734#comment-16349734 ]
huxihx commented on KAFKA-6425:
-------------------------------

Hi all, any updates on this JIRA?

> Calculating cleanBytes in LogToClean might not be correct
> ---------------------------------------------------------
>
>                 Key: KAFKA-6425
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6425
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.0.0
>            Reporter: huxihx
>            Priority: Major
>
> In class `LogToClean`, `cleanBytes` is calculated as follows:
> {code:java}
> val cleanBytes = log.logSegments(-1, firstDirtyOffset).map(_.size.toLong).sum
> {code}
> Most of the time, `firstDirtyOffset` is the base offset of the active segment, which works well with log.logSegments: cleanBytes can be computed by safely summing the sizes of all log segments whose base offset is less than `firstDirtyOffset`.
> However, things changed after `firstUnstableOffset` was introduced. Users can indirectly move this offset to a non-base offset (by changing the log start offset, for instance). In that case it is not correct to count the full size of the segment containing the offset; only the bytes between that segment's base offset and `firstUnstableOffset` should be counted.
> An example: say there are three log segments:
> 0L --> log segment 1, size: 1000 bytes
> 1234L --> log segment 2, size: 1000 bytes
> 4567L --> active log segment, current size: 500 bytes
> With the current code, if `firstUnstableOffset` is deliberately set to 2000L (this is possible, since it is lower-bounded by the log start offset, which users can explicitly change), then `cleanBytes` is calculated as 2000 bytes, which is wrong. The expected value is 1000 + (the bytes between offsets 1234L and 2000L).
> [~junrao] [~ijuma] Do all of these make sense?
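For illustration, here is a minimal, self-contained sketch of the corrected arithmetic proposed above: segments lying entirely below the first dirty offset contribute their full size, while the segment containing that offset contributes only the prefix up to it. `Segment` and its `bytesUpTo` function are hypothetical stand-ins for Kafka's `LogSegment` and its offset-index lookup (not the real API), and the uniform byte density inside each segment is assumed purely to keep the example runnable.

{code:scala}
// Hypothetical stand-in for kafka.log.LogSegment: a base offset, a size in
// bytes, and a way to translate an offset into a byte position within the
// segment (the real code would consult the segment's offset index).
case class Segment(baseOffset: Long, size: Long, bytesUpTo: Long => Long)

object CleanBytesSketch extends App {
  // Proposed calculation: a segment whose base offset is at or above
  // firstDirtyOffset contributes nothing; a segment that starts below it
  // contributes only the bytes up to firstDirtyOffset, capped at its size.
  def cleanBytes(segments: Seq[Segment], firstDirtyOffset: Long): Long =
    segments.map { s =>
      if (firstDirtyOffset <= s.baseOffset) 0L
      else s.bytesUpTo(firstDirtyOffset).min(s.size)
    }.sum

  // The three segments from the example, assuming a uniform number of bytes
  // per offset inside each segment (an assumption for illustration only).
  val segments = Seq(
    Segment(0L, 1000L, o => o * 1000L / 1234L),                        // offsets [0, 1234)
    Segment(1234L, 1000L, o => (o - 1234L) * 1000L / (4567L - 1234L)), // offsets [1234, 4567)
    Segment(4567L, 500L, o => o - 4567L)                               // active segment
  )

  // With firstDirtyOffset = 2000L this prints 1229: all 1000 bytes of
  // segment 1 plus the ~229 bytes between offsets 1234L and 2000L in
  // segment 2, rather than the 2000 bytes the current code reports.
  println(cleanBytes(segments, 2000L))
}
{code}

Under these assumptions the current code over-counts segment 2 by the bytes above offset 2000L, which is exactly the discrepancy the description points out.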