Awesome answers, that's perfect, thanks guys.

On Thu, Sep 20, 2012 at 12:26 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> Try using the getOffsetsBefore API in SimpleConsumer. (There is also a
> command-line tool - GetOffsetShell.)
>
> You can specify a topic, partition and time and it will give valid offsets
> prior to that time. It will be approximate though as it looks at the
> modtime of the log segments in each partition. If you are using
> SimpleConsumer directly you can just consume from those offsets.
>
> Joel
>
> On Thu, Sep 20, 2012 at 9:20 AM, Matthew Rathbone <matt...@foursquare.com> wrote:
>
> > Hey guys,
> >
> > I've come across this behavior with the hadoop-consumer, but it certainly
> > applies to any consumer.
> >
> > We've had our brokers up and running for about 9 days, with a 7-day
> > retention policy. (3 brokers with 3 partitions each)
> > I've just deployed a new hadoop consumer and wanted to read from the
> > beginning of time (7-days ago).
> >
> > Here's the behavior I'm seeing:
> > - I tell the consumer to start from 0
> > - It queries the partition, finds the minimum available is 2000000, so it
> >   starts there
> > - It starts consuming from 2000000+
> > - It throws an exception ("kafka.common.OffsetOutOfRangeException") because
> >   that message was deleted already
> >
> > Through sheer luck, after a few task failures the job managed to beat this
> > race condition, but it begs the question:
> >
> > - How would I tell a consumer to start querying from T-4days? That would
> >   totally solve the issue. I don't really need a full 7 days, but I have no
> >   way to resolve time -> offset
> >   (this is useful if people are tailing the events too, so they can tail
> >   events from 3 days ago grepping for something)
> >
> > Any ideas? Anyone else experienced this?
> >
> > --
> > Matthew Rathbone
> > Foursquare | Software Engineer | Server Engineering Team
> > matt...@foursquare.com | @rathboma <http://twitter.com/rathboma> |
> > 4sq<http://foursquare.com/rathboma>

--
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matt...@foursquare.com | @rathboma <http://twitter.com/rathboma> | 4sq<http://foursquare.com/rathboma>
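
For anyone reading the archive later: the approximation Joel describes (resolving a time to an offset from the modtime of the log segments) can be illustrated with a small Python sketch. This is a simplified model and not Kafka's code. The `Segment` type, the `offsets_before` function, and the sample offsets and timestamps are all hypothetical, and nothing here calls the real getOffsetsBefore API.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical stand-in for one log segment file of a partition."""
    base_offset: int  # offset of the first message stored in the segment
    mtime_ms: int     # last-modified time of the segment file, epoch millis


def offsets_before(segments: List[Segment], time_ms: int,
                   max_offsets: int = 1) -> List[int]:
    """Return up to max_offsets segment base offsets, newest first,
    for segments last modified at or before time_ms."""
    ordered = sorted(segments, key=lambda s: s.base_offset)
    eligible = [s.base_offset for s in ordered if s.mtime_ms <= time_ms]
    return eligible[::-1][:max_offsets]


DAY_MS = 86_400_000

# Made-up partition: four segments, last modified on days 3, 5, 7 and 9.
segments = [
    Segment(2_000_000, 3 * DAY_MS),
    Segment(3_000_000, 5 * DAY_MS),
    Segment(4_000_000, 7 * DAY_MS),
    Segment(5_000_000, 9 * DAY_MS),
]

now_ms = 9 * DAY_MS
four_days_ago = now_ms - 4 * DAY_MS

# Newest segment that was last written at or before T-4days.
print(offsets_before(segments, four_days_ago))  # [3000000]
```

The result is only as precise as the segment boundaries: the offset returned is the start of a whole segment, so consuming from it replays messages somewhat older than the requested time. In this model that is the safe direction for the T-4days case, since the starting point sits well inside the retention window instead of at its edge.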