Guys, is it right to say that the log retention property (set to X days; 7 in my case) uses the last activity on a segment file to decide when to delete it? If the file size is set to a large number and the same file keeps getting appended to every day, then we won't achieve the 7-day cleanup until either there has been no activity on the file for 7 days, or it has reached the larger size, rolled over, and then stayed untouched for 7 days.
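A toy model can help check this reasoning. It makes the same assumptions the question does, which have not been verified against the broker source here: cleanup only ever deletes whole segments, a closed segment is deleted once its last append is at least `retention_days` old, and the active segment rolls when it has reached the size limit before the next append. `Segment` and `simulate` are hypothetical names, not Kafka APIs.

```python
from dataclasses import dataclass


# Hypothetical model of segment rolling plus time-based cleanup; not Kafka's code.
@dataclass
class Segment:
    base_offset: int        # byte offset of the first message in this segment
    size: int = 0           # bytes written so far
    last_modified: int = 0  # day of the most recent append ("last activity")


def simulate(days, bytes_per_day, max_segment_bytes, retention_days):
    """Append once per day, roll full segments, and delete closed segments whose
    last append is retention_days or more in the past. Returns bytes kept on disk."""
    segments = [Segment(base_offset=0)]
    next_offset = 0
    for day in range(days):
        active = segments[-1]
        # Roll before appending if the active segment has reached the size limit.
        if active.size >= max_segment_bytes:
            active = Segment(base_offset=next_offset)
            segments.append(active)
        active.size += bytes_per_day
        active.last_modified = day
        next_offset += bytes_per_day
        # Whole-segment cleanup; the active (last) segment is never deleted here.
        kept = [s for s in segments[:-1] if day - s.last_modified < retention_days]
        segments = kept + [segments[-1]]
    return sum(s.size for s in segments)


for limit in (10_000, 300, 100):
    print(limit, simulate(days=30, bytes_per_day=100,
                          max_segment_bytes=limit, retention_days=7))
# 10000 3000  -> segment never fills, so all 30 days are kept
# 300 900     -> 3-day segments keep 9 days of data
# 100 700     -> 1-day segments keep exactly 7 days
```

Under these assumptions, a segment that never fills keeps growing past the retention window, while smaller segments bring the amount retained closer to the configured 7 days. That matches the tradeoff Jay describes later in the thread: more open files in exchange for tighter retention.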
On the other hand, a smaller file size will ensure that the log rolls over multiple times within 7 days, and the segments untouched for 7 days can be knocked off, optimizing space usage.

Are the default settings based on some experimentation, and are they recommended for production use?

- Inder

On Tue, Oct 25, 2011 at 7:53 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> Inder,
>
> >> 2. Why would you want to have multiple files within a partition? The broker has
> >> to store more info to figure out the right file within a partition.
>
> There is not much advantage apart from better accuracy with the
> getLatestOffeset API. Using that, if you want to start consuming data close
> to a certain timestamp, you get better accuracy if you have smaller log files.
>
> >> 3. Is it to achieve an mmap kinda optimization, allowing the broker to do
> >> less I/O in case a feed is really huge, or anything else?
>
> Not really. mmap is useful when you have random access on large files, or
> have multiple processes trying to access the same file. It might actually not
> work well with large files if your memory is fragmented. Since we have
> sequential IO patterns, the filesystem caching itself works very well.
>
> Thanks,
> Neha
>
> On Tuesday, October 25, 2011, Jay Kreps <jay.kr...@gmail.com> wrote:
> > It is actually just to allow data deletion; we just delete whole segments in
> > the cleanup. There is not much value to tuning the file size for most
> > situations, but the tradeoff is that with smaller files you will have more
> > open files but be closer to your desired retention.hours and retention.size
> > settings.
> >
> > -Jay
> >
> > On Tue, Oct 25, 2011 at 1:59 AM, Inder Pall <inder.p...@gmail.com> wrote:
> >
> >> I am playing around with "log.file.size" (controls the size of a segment file
> >> in a partition) and "log.retention.hours" with the following config.
> >>
> >> log.file.size=500
> >> log.retention.hours=168
> >>
> >> Observation: I see multiple files getting generated within the same
> >> partition.
> >> Example: my topic name is revenuefeed and I see the following
> >>
> >> ls -lh /tmp/kafka-logs/revenuefeed-0/*
> >> -rw-r--r-- 1 inder users 537 Oct 25 01:38 /tmp/kafka-logs/revenuefeed-0/00000000000000000000.kafka
> >> -rw-r--r-- 1 inder users 512 Oct 25 01:39 /tmp/kafka-logs/revenuefeed-0/00000000000000000537.kafka
> >>
> >> Questions
> >> --------------
> >> 1. Shouldn't these two properties go hand in hand?
> >> 2. Why would you want to have multiple files within a partition? The broker has
> >> to store more info to figure out the right file within a partition.
> >> 3. Is it to achieve an mmap kinda optimization, allowing the broker to do
> >> less I/O in case a feed is really huge, or anything else?
> >>
> >> -- Inder
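In the `ls` output quoted above, each segment file appears to be named after the byte offset of its first message, zero-padded to 20 digits: the second file starts at 537, exactly the size of the first. Assuming that convention, the extra bookkeeping asked about in question 2 amounts to a sorted list of base offsets plus a binary search. `segment_file_name` and `find_segment` are hypothetical helpers for illustration, not Kafka code.

```python
import bisect
from typing import Sequence


def segment_file_name(base_offset: int) -> str:
    """Hypothetical: name a segment after its first offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}.kafka"


def find_segment(base_offsets: Sequence[int], offset: int) -> int:
    """Hypothetical: return the base offset of the segment containing `offset`.

    `base_offsets` must be sorted ascending; the containing segment is the one
    with the largest base offset <= offset, found by binary search.
    """
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} precedes the oldest segment")
    return base_offsets[i]


segments = [0, 537]  # base offsets taken from the listing above
print([segment_file_name(b) for b in segments])
# ['00000000000000000000.kafka', '00000000000000000537.kafka']
print(find_segment(segments, 100))  # 0: falls in the first file
print(find_segment(segments, 600))  # 537: falls in the second file
```

The lookup cost grows only logarithmically with the number of segments, which is consistent with Jay's point that segment count matters mainly for open files rather than for finding data.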