[ 
https://issues.apache.org/jira/browse/KAFKA-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362105#comment-16362105
 ] 

Ben Corlett commented on KAFKA-6264:
------------------------------------

In our production cluster we are running 0.10.2.1, and the log cleaner is 
failing on 3 of the boxes with:
{code:java}
[2018-02-07 14:40:23,820] INFO Cleaner 0: Cleaning segment 0 in log __consumer_offsets-17 (largest timestamp Wed Aug 09 07:28:11 BST 2017) into 0, discarding deletes. (kafka.log.LogCleaner)
[2018-02-07 14:40:23,830] ERROR [kafka-log-cleaner-thread-0]: Error due to (kafka.log.LogCleaner)
java.lang.IllegalArgumentException: requirement failed: largest offset in message set can not be safely converted to relative offset.
 at scala.Predef$.require(Predef.scala:277)
 at kafka.log.LogSegment.append(LogSegment.scala:121)
 at kafka.log.Cleaner.cleanInto(LogCleaner.scala:551)
 at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:444)
 at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:385)
 at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:384)
 at scala.collection.immutable.List.foreach(List.scala:389)
 at kafka.log.Cleaner.doClean(LogCleaner.scala:384)
 at kafka.log.Cleaner.clean(LogCleaner.scala:361)
 at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:256)
 at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:236)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
[2018-02-07 14:40:23,833] INFO [kafka-log-cleaner-thread-0]: Stopped (kafka.log.LogCleaner)
{code}
I can see high fd counts on these servers:
{code:java}
ssh xxxxxxx 'sudo ls -latr /proc/`pgrep java`/fd | wc -l'
130149
ssh xxxxxxx 'sudo ls -latr /proc/`pgrep java`/fd | wc -l'
147455
ssh xxxxxxx 'sudo ls -latr /proc/`pgrep java`/fd | wc -l'
155521
{code}
I've tried several restarts. The log cleaner would fall over each time. I tried 
to upgrade one of the affected servers from 0.10.2.1 to 0.11.0.2. The log 
cleaner still failed.

I'm guessing I'm going to have to hack the files on the filesystem. Looking at 
the affected partition:
{code:java}
-rw-r--r-- 1 kafka root 122372 Aug 10 2017 00000000000000000000.log
-rw-r--r-- 1 kafka root 2424 Aug 14 2017 00000000004019345048.log
-rw-r--r-- 1 kafka root 20956142 Aug 15 07:28 00000000004020192019.log
-rw-r--r-- 1 kafka root 20986067 Aug 16 07:28 00000000004020403517.log
-rw-r--r-- 1 kafka root 20984625 Aug 17 07:28 00000000004020615318.log
...

-rw-r--r-- 1 kafka kafka 184 Feb 7 14:39 00000000000000000000.index
-rw-r--r-- 1 kafka root 0 Feb 7 14:36 00000000004019345048.index
-rw-r--r-- 1 kafka root 40208 Feb 7 14:36 00000000004020192019.index
-rw-r--r-- 1 kafka root 40328 Feb 7 14:36 00000000004020403517.index
-rw-r--r-- 1 kafka root 40336 Feb 7 14:36 00000000004020615318.index
...
{code}
I guess I'm looking for some advice on how to fix this.

Should I just remove the "00000000000000000000" files? Losing consumer offsets 
that have not been updated since 10 Aug 2017 shouldn't be an issue. Or should I 
try to empty these files, or try to figure out their starting offset?
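
On the starting-offset question: each segment file name encodes the segment's base offset as a 20-digit number. The following is a minimal sketch, not a Kafka tool, that uses only the file names from the listing above to flag segments whose possible offset range exceeds Int.MaxValue. The function names are hypothetical, and the check is only an upper bound: it assumes a segment can hold offsets up to one less than the next segment's base offset, and it does not read the log files themselves.

```python
import re

INT_MAX = 2**31 - 1  # same value as Scala's Int.MaxValue


def base_offset(filename):
    """Parse the base offset encoded in a segment file name (hypothetical helper)."""
    match = re.fullmatch(r"(\d{20})\.(log|index)", filename)
    if match is None:
        raise ValueError("not a segment file name: " + filename)
    return int(match.group(1))


def suspect_segments(log_files):
    """Return base offsets of segments that could span more than INT_MAX offsets.

    Assumption: the largest offset a segment can hold is one less than the
    base offset of the next segment. The last segment has no successor, so
    it cannot be judged from file names alone and is never reported.
    """
    bases = sorted(base_offset(name) for name in log_files)
    suspects = []
    for base, next_base in zip(bases, bases[1:]):
        if (next_base - 1) - base > INT_MAX:
            suspects.append(base)
    return suspects


# File names copied from the directory listing above.
files = [
    "00000000000000000000.log",
    "00000000004019345048.log",
    "00000000004020192019.log",
    "00000000004020403517.log",
    "00000000004020615318.log",
]
print(suspect_segments(files))  # prints [0]
```

Only the segment with base offset 0 is flagged: the next segment starts at 4019345048, which is larger than 2147483647, so segment 0 may contain offsets that cannot be stored as a 32-bit relative offset. That is consistent with the cleaner failing while cleaning segment 0.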

Thanks

 

> Log cleaner thread may die on legacy segment containing messages whose 
> offsets are too large
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6264
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6264
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.10.2.1, 1.0.0, 0.11.0.2
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> We encountered a problem where some of the legacy log segments contain 
> messages whose offsets are larger than {{SegmentBaseOffset + Int.MaxValue}}.
> Prior to 0.10.2.0, we did not assert the offset of the messages when appending 
> them to the log segments. Due to KAFKA-5413, the log cleaner may append 
> messages whose offset is greater than {{base_offset + Int.MaxValue}} into the 
> segment during the log compaction.
> After the brokers are upgraded, those log segments cannot be compacted 
> anymore because the compaction will fail immediately due to the offset range 
> assertion we added to the LogSegment.
> We have seen this issue in the __consumer_offsets topic so it could be a 
> general problem. There is no easy solution for the users to recover from this 
> case. 
> One solution is to split such log segments in the log cleaner once it sees a 
> message with a problematic offset and append those messages to a separate log 
> segment with a larger base_offset.
> Due to the impact of the issue, we may want to consider backporting the fix 
> to previous affected versions.
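
The offset-range condition and the proposed split can be illustrated with a short sketch. This is not Kafka's code or the eventual fix; it only mirrors the condition stated in the description (an offset must not exceed base_offset + Int.MaxValue). The names `fits` and `split_by_offset_range` are hypothetical, and using the first problematic offset as the new base offset is an assumption made for illustration.

```python
INT_MAX = 2**31 - 1  # same value as Scala's Int.MaxValue


def fits(base_offset, offset):
    """True if offset - base_offset is a non-negative value no larger than INT_MAX."""
    return 0 <= offset - base_offset <= INT_MAX


def split_by_offset_range(offsets):
    """Group ascending offsets into (base_offset, offsets) pairs.

    A new group is started whenever an offset no longer fits relative to the
    current group's base offset. Assumption: the new group's base offset is
    the first offset that did not fit.
    """
    groups = []
    for offset in offsets:
        if groups and fits(groups[-1][0], offset):
            groups[-1][1].append(offset)
        else:
            groups.append((offset, [offset]))
    return groups


# Illustrative offsets only, not read from a real log segment.
offsets = [0, 10, 4019345000, 4019345047]
for base, members in split_by_offset_range(offsets):
    print(base, members)
```

With these illustrative offsets, the first two stay in a group with base offset 0, and the last two move to a separate group with base offset 4019345000, so every relative offset fits in 32 bits.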



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
