Hey guys,

I've come across this behavior with the hadoop-consumer, but it certainly
applies to any consumer.

We've had our brokers up and running for about 9 days with a 7-day
retention policy (3 brokers with 3 partitions each).
I've just deployed a new hadoop consumer and want to read from the
beginning of time (7 days ago).

Here's the behavior I'm seeing:
- I tell the consumer to start from 0
- It queries the partition and finds that the earliest available offset is
2000000, so it starts there
- It starts consuming from 2000000 onwards
- It throws a kafka.common.OffsetOutOfRangeException, because by the time it
fetches, that offset has already been deleted by retention (rough workaround
sketch below)
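
What I'd like the consumer to do (and may end up patching in) is to catch
that error, re-ask the broker for the new earliest offset, and retry instead
of failing the task. Roughly like the sketch below. I'm writing it against
the SimpleConsumer javaapi as I understand it, so treat the exact classes
(FetchRequestBuilder, PartitionOffsetRequestInfo, etc.) and the helper names
as my assumptions rather than what the hadoop consumer actually ships with:

    import java.util.HashMap;
    import java.util.Map;

    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.ErrorMapping;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.OffsetResponse;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.message.MessageAndOffset;

    public class EarliestOffsetRetry {

        // Ask the broker for the earliest offset currently available in this partition.
        static long earliestOffset(SimpleConsumer consumer, String topic, int partition, String clientId) {
            TopicAndPartition tp = new TopicAndPartition(topic, partition);
            Map<TopicAndPartition, PartitionOffsetRequestInfo> info =
                    new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
            info.put(tp, new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.EarliestTime(), 1));
            OffsetResponse response = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                    info, kafka.api.OffsetRequest.CurrentVersion(), clientId));
            return response.offsets(topic, partition)[0];
        }

        // Start at the reported earliest offset, and re-resolve it whenever retention
        // deletes that offset between the lookup and the fetch (the race above).
        static void consumeFromStart(SimpleConsumer consumer, String topic, int partition, String clientId) {
            long offset = earliestOffset(consumer, topic, partition, clientId);
            while (true) {
                FetchRequest request = new FetchRequestBuilder()
                        .clientId(clientId)
                        .addFetch(topic, partition, offset, 1024 * 1024)
                        .build();
                FetchResponse response = consumer.fetch(request);
                if (response.hasError()) {
                    short code = response.errorCode(topic, partition);
                    if (code == ErrorMapping.OffsetOutOfRangeCode()) {
                        // our start offset was deleted out from under us; jump forward
                        offset = earliestOffset(consumer, topic, partition, clientId);
                        continue;
                    }
                    throw new RuntimeException("fetch failed with error code " + code);
                }
                for (MessageAndOffset mo : response.messageSet(topic, partition)) {
                    // ... hand mo.message() to the job here ...
                    offset = mo.nextOffset();
                }
                // (a real consumer would sleep/back off when the fetch returns nothing)
            }
        }
    }

That would at least turn the race into a jump forward to whatever is still
retained, rather than a task failure, though it silently skips whatever got
deleted in between.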

Through sheer luck, after a few task failures the job managed to beat this
race condition, but it raises the question:

- How would I tell a consumer to start consuming from T-4 days? That would
totally solve the issue. I don't really need the full 7 days, but I have no
way to resolve a time to an offset; a sketch of the lookup I'm imagining is
below.
(This would also be useful for people tailing events, so they could tail
events from 3 days ago and grep for something.)
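
For concreteness, here's the kind of lookup I'm imagining: the same
getOffsetsBefore request as in the sketch above, but passing a real
timestamp instead of the earliest/latest sentinel values. Again this is only
a sketch against the SimpleConsumer javaapi as I understand it; the broker
host and topic name are placeholders, and as far as I can tell the
resolution would only be per log segment, which is fine for my purposes.

    import java.util.HashMap;
    import java.util.Map;

    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.OffsetResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class OffsetForTime {

        // Return an offset at or before the given wall-clock time for one partition.
        // timeMillis can also be kafka.api.OffsetRequest.EarliestTime() or LatestTime().
        static long offsetBefore(SimpleConsumer consumer, String topic, int partition,
                                 long timeMillis, String clientId) {
            TopicAndPartition tp = new TopicAndPartition(topic, partition);
            Map<TopicAndPartition, PartitionOffsetRequestInfo> info =
                    new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
            info.put(tp, new PartitionOffsetRequestInfo(timeMillis, 1));
            OffsetResponse response = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                    info, kafka.api.OffsetRequest.CurrentVersion(), clientId));
            if (response.hasError()) {
                throw new RuntimeException("offset lookup failed, error code "
                        + response.errorCode(topic, partition));
            }
            return response.offsets(topic, partition)[0];
        }

        public static void main(String[] args) {
            // host, port, and topic below are placeholders
            SimpleConsumer consumer = new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "offset-lookup");
            long fourDaysAgo = System.currentTimeMillis() - 4L * 24 * 60 * 60 * 1000;
            long start = offsetBefore(consumer, "events", 0, fourDaysAgo, "offset-lookup");
            System.out.println("would start consuming partition 0 at offset " + start);
            consumer.close();
        }
    }

That would cover the tailing use case too: resolve the timestamp once, then
consume forward from the returned offset.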

Any ideas? Anyone else experienced this?
-- 
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matt...@foursquare.com | @rathboma <http://twitter.com/rathboma> |
4sq <http://foursquare.com/rathboma>
