huxihx created KAFKA-6425:
-----------------------------
Summary: Calculating cleanBytes in LogToClean might not be correct
Key: KAFKA-6425
URL: https://issues.apache.org/jira/browse/KAFKA-6425
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 1.0.0
Reporter: huxihx
In class `LogToClean`, the calculation of `cleanBytes` is as follows:
{code:java}
val cleanBytes = log.logSegments(-1, firstDirtyOffset).map(_.size.toLong).sum
{code}
Most of the time, `firstDirtyOffset` is the base offset of the active segment,
which works well with `log.logSegments`: we can safely calculate `cleanBytes`
by summing up the full sizes of all log segments whose base offset is less than
`firstDirtyOffset`.
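To make that concrete, here is a minimal sketch of the current behavior, using a toy segment model (the `Segment` class and per-record layout are assumptions for illustration, not Kafka's actual classes):
{code:java}
// Toy model of a log segment (an illustration only, not Kafka's class):
// a base offset plus a sequence of (offset, sizeInBytes) records.
case class Segment(baseOffset: Long, records: Seq[(Long, Int)]) {
  def size: Long = records.map(_._2.toLong).sum
}

// Sketch of the current calculation: every segment whose base offset is below
// firstDirtyOffset contributes its *full* size, mirroring
//   log.logSegments(-1, firstDirtyOffset).map(_.size.toLong).sum
def cleanBytesCurrent(segments: Seq[Segment], firstDirtyOffset: Long): Long =
  segments.filter(_.baseOffset < firstDirtyOffset).map(_.size).sum
{code}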
However, things changed after `firstUnstableOffset` was introduced. Users can
indirectly move this offset to a non-base offset (by changing the log start
offset, for instance). In that case it is not correct to sum up the total size
of the segment containing that offset; instead, we should only count the bytes
between that segment's base offset and `firstUnstableOffset`.
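Here is a sketch of what the corrected calculation could look like, reusing the toy `Segment` model above; the boundary parameter plays the role of `firstDirtyOffset`, which in the problematic case lands on the non-base `firstUnstableOffset`. The per-record walk is only a stand-in for the offset-index lookup a real segment would need:
{code:java}
// Proposed calculation (sketch): within each segment, count only the bytes of
// records whose offset is below the boundary, so the segment containing the
// boundary contributes just its clean prefix rather than its full size.
def cleanBytesProposed(segments: Seq[Segment], firstDirtyOffset: Long): Long =
  segments.map(_.records.filter(_._1 < firstDirtyOffset).map(_._2.toLong).sum).sum
{code}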
Let me show an example. Say I have three log segments, laid out as below:
0L --> log segment 1, size: 1000 bytes
1234L --> log segment 2, size: 1000 bytes
4567L --> active log segment, current size: 500 bytes
Based on the current code, if `firstUnstableOffset` is deliberately set to
2000L (which is possible, since it is lower bounded by the log start offset and
a user can explicitly change the log start offset), then `cleanBytes` is
calculated as 2000 bytes, which is wrong. The expected value should be 1000 +
(the bytes between offsets 1234L and 2000L).
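Plugging the example into the sketches above shows the discrepancy (one-byte records are an assumption made purely so the byte counts are easy to follow):
{code:java}
val segments = Seq(
  Segment(0L,    (0L    until 1000L).map(o => (o, 1))), // segment 1: 1000 bytes
  Segment(1234L, (1234L until 2234L).map(o => (o, 1))), // segment 2: 1000 bytes
  Segment(4567L, (4567L until 5067L).map(o => (o, 1)))  // active:     500 bytes
)
println(cleanBytesCurrent(segments, 2000L))  // 2000 -- wrongly counts all of segment 2
println(cleanBytesProposed(segments, 2000L)) // 1766 -- 1000 + the 766 bytes below offset 2000L
{code}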
[~junrao] [~ijuma] Does all of this make sense?