hachikuji commented on a change in pull request #9590:
URL: https://github.com/apache/kafka/pull/9590#discussion_r621734730



##########
File path: core/src/test/scala/unit/kafka/log/LogCleanerTest.scala
##########
@@ -984,19 +1003,26 @@ class LogCleanerTest {
 
     def distinctValuesBySegment = log.logSegments.map(s => s.log.records.asScala.map(record => TestUtils.readString(record.value)).toSet.size).toSeq
 
-    val disctinctValuesBySegmentBeforeClean = distinctValuesBySegment
+    val distinctValuesBySegmentBeforeClean = distinctValuesBySegment
     assertTrue(distinctValuesBySegment.reverse.tail.forall(_ > N),
       "Test is not effective unless each segment contains duplicates. Increase segment size or decrease number of keys.")
 
+    log.updateHighWatermark(log.activeSegment.baseOffset)
     cleaner.clean(LogToClean(new TopicPartition("test", 0), log, 0, firstUncleanableOffset))
 
     val distinctValuesBySegmentAfterClean = distinctValuesBySegment
 
-    assertTrue(disctinctValuesBySegmentBeforeClean.zip(distinctValuesBySegmentAfterClean)
-      .take(numCleanableSegments).forall { case (before, after) => after < before },
+    // One segment should have been completely deleted, so there will be fewer segments.
+    assertTrue(distinctValuesBySegmentAfterClean.size < distinctValuesBySegmentBeforeClean.size)
+
+    // Drop the first segment from before cleaning since it was removed. Also subtract 1 from numCleanableSegments
+    val normalizedDistinctValuesBySegmentBeforeClean = distinctValuesBySegmentBeforeClean.drop(1)

Review comment:
       The logic in this test case has become rather obscure after the change. 
Maybe we could do something simpler than comparing segment by segment. As far 
as I can tell, all the test is doing is ensuring that the first uncleanable 
offset is respected. Maybe a simpler test would just write the same key over 
and over and then assert that all records below the uncleanable offset are 
removed and all records above that offset are retained?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

