[ https://issues.apache.org/jira/browse/KAFKA-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349734#comment-16349734 ]
huxihx commented on KAFKA-6425:
-------------------------------

Hi all, any updates on this JIRA?

> Calculating cleanBytes in LogToClean might not be correct
> ---------------------------------------------------------
>
>                 Key: KAFKA-6425
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6425
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.0.0
>            Reporter: huxihx
>            Priority: Major
>
> In class `LogToClean`, `cleanBytes` is calculated as follows:
> {code:java}
> val cleanBytes = log.logSegments(-1, firstDirtyOffset).map(_.size.toLong).sum
> {code}
> Most of the time, `firstDirtyOffset` is the base offset of the active segment, which works well with log.logSegments: cleanBytes can be computed by safely summing the sizes of all log segments whose base offset is less than `firstDirtyOffset`.
> However, things changed after `firstUnstableOffset` was introduced. Users can indirectly move this offset to a non-base offset (by changing the log start offset, for instance). In that case it is not correct to count the full size of the segment containing the offset; only the bytes between that segment's base offset and `firstUnstableOffset` should be counted.
> An example: say there are three log segments:
> 0L --> log segment 1, size: 1000 bytes
> 1234L --> log segment 2, size: 1000 bytes
> 4567L --> active log segment, current size: 500 bytes
> With the current code, if `firstUnstableOffset` is deliberately set to 2000L (this is possible, since it is lower-bounded by the log start offset, which users can explicitly change), then `cleanBytes` is calculated as 2000 bytes, which is wrong. The expected value is 1000 + (the bytes between offsets 1234L and 2000L).
> [~junrao] [~ijuma] Do all of these make sense?
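For illustration, here is a minimal, self-contained sketch of the corrected arithmetic proposed above: segments lying entirely below the first dirty offset contribute their full size, while the segment containing that offset contributes only the prefix up to it. `Segment` and its `bytesUpTo` function are hypothetical stand-ins for Kafka's `LogSegment` and its offset-index lookup (not the real API), and the uniform byte density inside each segment is assumed purely to keep the example runnable.

{code:scala}
// Hypothetical stand-in for kafka.log.LogSegment: a base offset, a size in
// bytes, and a way to translate an offset into a byte position within the
// segment (the real code would consult the segment's offset index).
case class Segment(baseOffset: Long, size: Long, bytesUpTo: Long => Long)

object CleanBytesSketch extends App {
  // Proposed calculation: a segment whose base offset is at or above
  // firstDirtyOffset contributes nothing; a segment that starts below it
  // contributes only the bytes up to firstDirtyOffset, capped at its size.
  def cleanBytes(segments: Seq[Segment], firstDirtyOffset: Long): Long =
    segments.map { s =>
      if (firstDirtyOffset <= s.baseOffset) 0L
      else s.bytesUpTo(firstDirtyOffset).min(s.size)
    }.sum

  // The three segments from the example, assuming a uniform number of bytes
  // per offset inside each segment (an assumption for illustration only).
  val segments = Seq(
    Segment(0L, 1000L, o => o * 1000L / 1234L),                        // offsets [0, 1234)
    Segment(1234L, 1000L, o => (o - 1234L) * 1000L / (4567L - 1234L)), // offsets [1234, 4567)
    Segment(4567L, 500L, o => o - 4567L)                               // active segment
  )

  // With firstDirtyOffset = 2000L this prints 1229: all 1000 bytes of
  // segment 1 plus the ~229 bytes between offsets 1234L and 2000L in
  // segment 2, rather than the 2000 bytes the current code reports.
  println(cleanBytes(segments, 2000L))
}
{code}

Under these assumptions the current code over-counts segment 2 by the bytes above offset 2000L, which is exactly the discrepancy the description points out.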