[GitHub] [kafka] dongxiaohe commented on pull request #5611: KAFKA-7321: time based log compaction (KIP-354)

GitHub Wed, 19 Sep 2018 22:53:45 -0700

I guess this is the core logic to calculate `mostCleanOffset`. It is better 
that engineers from confluent can help review this as well.


For me, if I understand it correctly, iterator is used for performance reason 
to achieve O(n) look up, but I guess it could be simplified like:
```
    val greaterThanCleanUtilTime = dirtyNonActiveSegments.headOption.exists(_ 
=> firstDirtySegmentCreateTime >= cleanUtilTime) ||
      dirtyNonActiveSegments.tail.exists(_.largestTimestamp >= cleanUtilTime)

    (log.firstUnstableOffset.map(_.messageOffset), greaterThanCleanUtilTime) 
match {
      case (None, true) => firstDirtyOffset
      case (None, false) => log.activeSegment.baseOffset
      case (Some(offset), true) => math.min(offset, firstDirtyOffset)
      case (Some(offset), false) => math.min(offset, 
log.activeSegment.baseOffset)
    }
```
- I am not sure `dirtyNonActiveSegments.exists(_.largestTimestamp >= 
cleanUtilTime)` is ok for this, but `headOption` and `tail` should achieve what 
you are trying to do.
- `exists` unlike `find` would return true/false instead of Option[Element], 
but both will end the iteration if the `condition` is matched. 
- I guess pattern matching makes more sense here, since List[Option[T]].flatten 
is hiding the logic to filter out None value implicitly.

[ Full content available at: https://github.com/apache/kafka/pull/5611 ]
This message was relayed via gitbox.apache.org for [email protected]

[GitHub] [kafka] dongxiaohe commented on pull request #5611: KAFKA-7321: time based log compaction (KIP-354)

Reply via email to