It's ok, we're all busy and open source is essentially volunteer work. Besides, you guys didn't promise any time frame, as far as I remember, so technically there is no deadline at which you'll ever "break your promise" hehe...
Still looking forward to it though :)

--
Felix

On Tue, Jan 24, 2012 at 5:12 PM, Richard Park <richard.b.p...@gmail.com> wrote:

> Yeah, sorry about missing the promise to release code.
> I'll talk to someone about releasing what we have.
>
> On Tue, Jan 24, 2012 at 11:05 AM, Felix GV <fe...@mate1inc.com> wrote:
>
> > Hello :)
> >
> > For question 1:
> >
> > The hadoop consumer in the contrib directory has almost everything it
> > needs to do distributed incremental imports out of the box, but it
> > requires a bit of hand-holding.
> >
> > I've created two scripts to automate the process. One of them generates
> > initial offset files, and the other does incremental Hadoop consumption.
> >
> > I personally use a cron job to periodically call the incremental
> > consumer script with specific parameters (for topic and HDFS output path).
> >
> > You can find all of the required files in this gist:
> > https://gist.github.com/1671887
> >
> > The LinkedIn guys promised to eventually release their full Hadoop/Kafka
> > ETL code, but I don't think they've had time to get around to it yet.
> > When they do release it, it's probably going to be better than my
> > scripts, but for now, I think those scripts are the only publicly
> > available way to do this stuff without writing it yourself.
> >
> > I don't know about questions 2 and 3.
> >
> > I hope this helps :) !
> >
> > --
> > Felix
> >
> >
> > On Tue, Jan 24, 2012 at 3:24 AM, Paul Ingles <p...@forward.co.uk> wrote:
> >
> > > Hi,
> > >
> > > I'm investigating using Kafka and would really appreciate some more
> > > experienced opinions on the way things work together.
> > >
> > > Our application instances are creating Protocol Buffer serialized
> > > messages and pushing them to topics in Kafka:
> > >
> > > * Web log requests
> > > * Product details viewed
> > > * Search performed
> > > * Email registered
> > > etc...
> > >
> > > I would like to be able to perform incremental loads from these topics
> > > into HDFS and then into the rest of the batch processing. I guess I
> > > have 3 broad questions:
> > >
> > > 1) How do people trigger the batch loads? Do you just point your
> > > SimpleKafkaETLJob input to the previous run's outputted offset file?
> > > Or do you move files between runs of the SimpleKafkaETLJob, i.e. move
> > > the part-* files into one place and move the offsets into an input
> > > directory ready for the next run?
> > >
> > > 2) Yesterday I noticed that the hadoop-consumer's SimpleKafkaETLMapper
> > > outputs Long/Text writables and is marked as deprecated (this is in
> > > the 0.7 source). Is there an alternative class that should be used
> > > instead, or is the hadoop-consumer being deprecated overall?
> > >
> > > 3) Given that SimpleKafkaETLMapper reads bytes in but outputs Text
> > > lines, are most people using Kafka for passing text messages around,
> > > or JSON data, etc.?
> > >
> > > Thanks,
> > > Paul
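

(For question 1, a rough sketch of what a cron-driven incremental run can look like, just to make the offset-file handoff concrete. This is not the actual code from the gist above: the run-class.sh launcher and the kafka.etl.impl.SimpleKafkaETLJob / kafka.etl.impl.DataGenerator class names are from the 0.7 contrib/hadoop-consumer as best I recall, while the property names, offset file glob, paths, script name and cron schedule are only illustrative assumptions -- check the gist and the contrib README for the real details.)

    #!/bin/bash
    # incremental-consumer.sh -- hypothetical wrapper, for illustration only
    set -e

    TOPIC="$1"                          # e.g. web-log-requests
    BASE="$2"                           # HDFS base dir, e.g. /data/kafka/web-log-requests
    RUN="$BASE/run-$(date +%Y%m%d%H%M)"

    # Feed the offset files written by the previous run back in as this run's
    # input. (The very first set of offsets has to be bootstrapped separately,
    # e.g. with kafka.etl.impl.DataGenerator.)
    hadoop fs -mkdir "$RUN/input"
    hadoop fs -cp "$BASE/latest-offsets/*" "$RUN/input/"

    # SimpleKafkaETLJob is driven by a properties file naming the topic plus the
    # input (offsets) and output directories; the property names below are
    # guesses based on the contrib test properties, not gospel.
    cat > /tmp/kafka-etl-$TOPIC.properties <<EOF
    kafka.etl.topic=$TOPIC
    input=$RUN/input
    output=$RUN/output
    kafka.request.limit=-1
    EOF

    ./run-class.sh kafka.etl.impl.SimpleKafkaETLJob /tmp/kafka-etl-$TOPIC.properties

    # The job leaves part-* data files and updated offset files in the output
    # dir. Stash the offsets so the next run starts where this one stopped
    # (the "offset*" pattern is a guess -- adjust to whatever the job writes).
    hadoop fs -rmr "$BASE/latest-offsets" || true
    hadoop fs -mkdir "$BASE/latest-offsets"
    hadoop fs -cp "$RUN/output/offset*" "$BASE/latest-offsets/"

A crontab entry then drives it per topic (the 15-minute schedule is just an example):

    */15 * * * * /opt/kafka/incremental-consumer.sh web-log-requests /data/kafka/web-log-requests >> /var/log/kafka-etl.log 2>&1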