Matthew,

The SimpleConsumer exposes a getOffsetsBefore() API that takes a
timestamp and returns the approximate offsets at or before that
timestamp.
However, the offset granularity is at the log segment level, so you
might receive up to ~1 log segment's worth of data more than what you
asked for.
The log segment size can be configured through the log.file.size
parameter on the servers.
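
As a rough sketch (against the 0.7-era Java SimpleConsumer API; the broker
host/port, topic name, and partition below are placeholders), resolving
"4 days ago" to a starting offset looks something like this:

    import kafka.javaapi.consumer.SimpleConsumer;

    public class OffsetLookup {
        public static void main(String[] args) {
            // Placeholder broker coordinates; soTimeout (ms) and bufferSize
            // are arbitrary.
            SimpleConsumer consumer =
                new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024);

            long fourDaysAgo =
                System.currentTimeMillis() - 4L * 24 * 60 * 60 * 1000;

            // Ask for the single latest segment-boundary offset at or before
            // that time. Because the granularity is one log segment, this can
            // point up to ~1 segment earlier than the requested timestamp.
            long[] offsets =
                consumer.getOffsetsBefore("mytopic", 0, fourDaysAgo, 1);
            long startOffset = (offsets.length > 0) ? offsets[0] : 0L;

            System.out.println("start offset: " + startOffset);
            consumer.close();
        }
    }

You would then hand startOffset to your FetchRequest and consume forward
from there.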

Thanks,
Neha

On Thu, Sep 20, 2012 at 9:20 AM, Matthew Rathbone
<matt...@foursquare.com> wrote:
> Hey guys,
>
> I've come across this behavior with the hadoop-consumer, but it certainly
> applies to any consumer.
>
> We've had our brokers up and running for about 9 days, with a 7-day
> retention policy. (3 brokers with 3 partitions each)
> I've just deployed a new hadoop consumer and wanted to read from the
> beginning of time (7-days ago).
>
> Here's the behavior I'm seeing:
> - I tell the consumer to start from 0
> - It queries the partition, finds the minimum available is 2000000, so it
> starts there
> - It starts consuming from 2000000+
> - It throws an exception ("kafka.common.OffsetOutOfRangeException") because
> that message was deleted already
>
> Through sheer luck, after a few task failures the job managed to beat this
> race condition, but it raises the question:
>
> - How would I tell a consumer to start querying from T-4days? That would
> totally solve the issue. I don't really need a full 7 days, but I have no
> way to resolve time -> offset
> (this is useful if people are tailing the events too, so they can tail
> events from 3 days ago grepping for something)
>
> Any ideas? Anyone else experienced this?
> --
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> matt...@foursquare.com | @rathboma <http://twitter.com/rathboma> |
> 4sq <http://foursquare.com/rathboma>
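
For the specific race Matthew describes above (the starting offset getting
deleted between the offset lookup and the first fetch), one workaround is to
catch the OffsetOutOfRangeException and ask the broker again for the oldest
offset it still has. A rough sketch of a hypothetical helper, again against
the 0.7-era Java API and reusing the consumer and placeholders from the
snippet above:

    import kafka.api.FetchRequest;
    import kafka.api.OffsetRequest;
    import kafka.common.OffsetOutOfRangeException;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.javaapi.message.ByteBufferMessageSet;
    import kafka.message.MessageAndOffset;

    // Fetch one batch starting at `offset` and return the offset to fetch
    // next; if `offset` has already been deleted, fall back to the earliest
    // offset still on the broker.
    static long fetchFrom(SimpleConsumer consumer, String topic,
                          int partition, long offset) {
        try {
            ByteBufferMessageSet messages = consumer.fetch(
                new FetchRequest(topic, partition, offset, 1024 * 1024));
            for (MessageAndOffset mo : messages) {
                // ... process mo.message() ...
                offset = mo.offset();  // in 0.7 this is the next fetch offset
            }
        } catch (OffsetOutOfRangeException e) {
            // The segment holding `offset` was deleted out from under us;
            // re-resolve the oldest available offset and retry from there.
            long[] earliest = consumer.getOffsetsBefore(
                topic, partition, OffsetRequest.EarliestTime(), 1);
            offset = earliest[0];
        }
        return offset;
    }

This only narrows the window rather than eliminating it, so starting from a
timestamp comfortably inside the retention period (T-4 days rather than the
full 7) is still the simpler fix.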
