Hey guys, I've come across this behavior with the hadoop-consumer, but it certainly applies to any consumer.
We've had our brokers up and running for about 9 days, with a 7-day retention policy (3 brokers with 3 partitions each). I've just deployed a new hadoop consumer and wanted to read from the beginning of time (7 days ago). Here's the behavior I'm seeing:

- I tell the consumer to start from 0
- It queries the partition and finds the minimum available offset is 2000000, so it starts there
- It starts consuming from 2000000+
- It throws an exception ("kafka.common.OffsetOutOfRangeException") because that message was deleted already

Through sheer luck, after a few task failures the job managed to beat this race condition, but it raises the question:

- How would I tell a consumer to start querying from T-4 days? That would totally solve the issue. I don't really need a full 7 days, but I have no way to resolve a time to an offset. (This would also be useful for people tailing the events, so they can tail events from 3 days ago and grep for something.)

Any ideas? Anyone else experienced this?

--
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matt...@foursquare.com | @rathboma <http://twitter.com/rathboma> | 4sq <http://foursquare.com/rathboma>
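P.S. Here's roughly what I'm hoping is possible, just as a sketch, not working code: I'm guessing at the getOffsetsBefore signature from the 0.7 javaapi SimpleConsumer, and "broker1", "events", and partition 0 are placeholder names.

    import kafka.api.OffsetRequest;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class OffsetForTimestamp {
        public static void main(String[] args) {
            // Placeholder broker/topic/partition, just for illustration.
            SimpleConsumer consumer = new SimpleConsumer("broker1", 9092, 30 * 1000, 64 * 1024);
            String topic = "events";
            int partition = 0;

            // Ask the broker for the latest offset at or before a timestamp (T-4 days).
            // My understanding is the broker would answer at log-segment granularity,
            // which would be plenty precise for this use case.
            long fourDaysAgo = System.currentTimeMillis() - 4L * 24 * 60 * 60 * 1000;
            long[] offsets = consumer.getOffsetsBefore(topic, partition, fourDaysAgo, 1);

            // If nothing matches (e.g. retention already removed those segments),
            // fall back to the earliest offset the broker still has.
            long startOffset = (offsets.length > 0)
                    ? offsets[0]
                    : consumer.getOffsetsBefore(topic, partition, OffsetRequest.EarliestTime(), 1)[0];

            System.out.println("start offset for " + topic + "-" + partition + ": " + startOffset);
            consumer.close();
        }
    }

If the consumer then hit an OffsetOutOfRangeException mid-run because retention deleted a segment underneath it, re-running the same earliest-offset query and resuming from there would also cover the race I described above.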