You shouldn't need to change the way you create topics to get this kind of
Kafka-to-HDFS data load. At LinkedIn we use the InputFormat provided in
contrib/hadoop-consumer to load topic data into daily and hourly
partitions. These Hadoop jobs run every 10 minutes or so, so the maximum
delay from producer->Hadoop is around 10 minutes.
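
For reference, wiring that InputFormat into a plain old-API MapReduce job
looks roughly like the sketch below. This is a minimal sketch, not our
actual job: the kafka.etl.* class names follow the 0.7 contrib layout from
memory, and the offset-file input convention is how I recall the contrib
consumer tracking its read position, so verify both against your checkout.
The point is that the hourly segmentation comes from the output path, not
from the topic name:

import java.util.Date;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

import kafka.etl.KafkaETLInputFormat; // from contrib/hadoop-consumer
import kafka.etl.KafkaETLKey;

public class HourlyKafkaLoad {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(HourlyKafkaLoad.class);
    conf.setJobName("kafka-hdfs-hourly-load");

    // The input directory holds the offset files that tell the consumer
    // which topic/broker/offset to read from (as written by a prior run).
    conf.setInputFormat(KafkaETLInputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));

    // Segment by run time in the output path, not the topic:
    // e.g. /data/search_logs/2011/11/06/08 for the 8am run.
    String hourDir = String.format("/data/search_logs/%1$tY/%1$tm/%1$td/%1$tH",
                                   new Date());
    FileOutputFormat.setOutputPath(conf, new Path(hourDir));

    conf.setOutputKeyClass(KafkaETLKey.class);
    conf.setOutputValueClass(BytesWritable.class);

    JobClient.runJob(conf);
  }
}

Schedule a job like that every 10 minutes (cron, Azkaban, etc.) and you get
the producer->Hadoop delay described above.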

Thanks,
Neha

On Sun, Nov 6, 2011 at 8:45 AM, Mark <static.void....@gmail.com> wrote:
> This is more of a general design question, but what is the preferred way of
> importing logs from Kafka into HDFS when you want your data segmented by hour
> or day? Is there any way to say "Import only this {hour|day} of logs", or does
> one need to create topics around the way one would like to import
> them, i.e. Topic: "search_logs/2011/11/06"? If it's the latter, is there any
> documentation/best practices on topic/key design?
>
> Thanks
>
