[jira] [Commented] (KAFKA-475) Time based log segment rollout

Neha Narkhede (JIRA) Tue, 21 Aug 2012 10:57:40 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438911#comment-13438911
 ]


Neha Narkhede commented on KAFKA-475:
-------------------------------------

If you roll log segments based on the retention time, it seems you can have 
only one segment for that log at any point in time. If you want to roll 
5-minute segments, you can then keep only 5 minutes' worth of data for that 
partition. In contrast, if I choose size-based rolling and size-based 
retention, I can have multiple log segments, each of a specific size. It seems 
desirable for time-based rolling + retention to behave the same way. I would 
imagine applications wanting to roll segments every hour and retain 24 hours' 
worth of data. This is an advantage for applications that use 
getOffsetsBefore() to do a time-indexed fetch of the data, since 
getOffsetsBefore() only returns offsets at log segment granularity. It also 
gives applications a way to reason about the time window of the data retained 
for a partition. One potential downside is that you can end up creating a 
large number of log segments for your partition if you choose too small a 
value for log.file.time.ms. But this problem exists today with size-based log 
segment rolling too, so we are not introducing any regression in behavior.
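
To make the suggestion concrete, here is a rough sketch in Python (not Kafka 
code) of rolling on one interval while retaining on a separate, longer one. 
TimeRolledLog, Segment, segment_starts_before and the constants are 
hypothetical names invented for illustration; aging segments by their last 
write time is also an assumption of this sketch, not something taken from the 
patch.

```python
from dataclasses import dataclass
from typing import List

MINUTE_MS = 60 * 1000
HOUR_MS = 60 * MINUTE_MS


@dataclass
class Segment:
    """Toy log segment (hypothetical model, not Kafka's LogSegment)."""
    created_ms: int
    last_modified_ms: int
    message_count: int = 0


class TimeRolledLog:
    """Toy log with independent roll and retention intervals."""

    def __init__(self, roll_interval_ms: int, retention_ms: int) -> None:
        self.roll_interval_ms = roll_interval_ms
        self.retention_ms = retention_ms
        self.segments: List[Segment] = []

    def append(self, now_ms: int) -> None:
        # Roll when there is no active segment yet, or when the active
        # segment has been open for at least the roll interval.
        if (not self.segments
                or now_ms - self.segments[-1].created_ms >= self.roll_interval_ms):
            self.segments.append(Segment(created_ms=now_ms, last_modified_ms=now_ms))
        active = self.segments[-1]
        active.message_count += 1
        active.last_modified_ms = now_ms

    def delete_expired(self, now_ms: int) -> None:
        # Drop whole segments whose last write is older than the retention
        # window. The active (last) segment is always kept.
        old, active = self.segments[:-1], self.segments[-1:]
        kept = [s for s in old if now_ms - s.last_modified_ms < self.retention_ms]
        self.segments = kept + active

    def segment_starts_before(self, timestamp_ms: int) -> List[int]:
        # Segment-granularity lookup, newest first: only segment boundaries
        # are visible, so finer rolling gives a finer time index.
        return [s.created_ms for s in reversed(self.segments)
                if s.created_ms <= timestamp_ms]


# Roll every hour, retain 24 hours, write every 10 minutes for 30 hours.
log = TimeRolledLog(roll_interval_ms=HOUR_MS, retention_ms=24 * HOUR_MS)
for t in range(0, 30 * HOUR_MS, 10 * MINUTE_MS):
    log.append(t)
log.delete_expired(30 * HOUR_MS)
print(len(log.segments))  # 24 hourly segments remain
```

With the two intervals decoupled, the partition holds roughly 24 one-hour 
segments instead of a single segment, and a lookup such as 
segment_starts_before() can resolve a timestamp to within one hour.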

Other review comments:

1. Log
1.1 Rename currentMS to currentMs (follow the camel case convention).
1.2 How about renaming retentionMSInterval to retentionIntervalMs, to be 
consistent with the naming convention?
1.3 In maybeRoll, it looks like currentMS is used only to compute the time 
difference. How about removing currentMS?

2. LogManager
2.1 This is unrelated to your patch, but let's also rename logRetentionMSMap 
to logRetentionMsMap.


                
> Time based log segment rollout
> ------------------------------
>
>                 Key: KAFKA-475
>                 URL: https://issues.apache.org/jira/browse/KAFKA-475
>             Project: Kafka
>          Issue Type: New Feature
>    Affects Versions: 0.7.1
>            Reporter: Swapnil Ghike
>            Assignee: Swapnil Ghike
>              Labels: features
>             Fix For: 0.7.2
>
>         Attachments: kafka-475-v1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Some applications might want their data to be deleted from the Kafka servers 
> earlier than the default retention time. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
