ConsumerConfig is in Kafka's main trunk. Since I used the same package
namespace, kafka.consumer (admittedly not a great approach), I didn't have
to import it explicitly.
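For example, a class placed in the same kafka.consumer package can construct
a ConsumerConfig without importing it, roughly like the sketch below (the
class name is made up, and the property names are the usual Kafka 0.7
zookeeper/group settings as I remember them, so treat them as assumptions):

package kafka.consumer;

import java.util.Properties;

// Lives in the same package as kafka.consumer.ConsumerConfig,
// so no explicit import of ConsumerConfig is needed.
public class ConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zk.connect", "localhost:2181"); // 0.7-style zookeeper setting (assumed)
        props.put("groupid", "hadoop-consumer");   // 0.7-style consumer group setting (assumed)

        // Resolved through the shared package, not through an import.
        ConsumerConfig config = new ConsumerConfig(props);
        System.out.println("created consumer config: " + config);
    }
}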
The kafka jar is not in the Maven repository, so you might have to register
it in your local Maven repository:

> mvn install:install-file -Dfile=kafka-0.7.0.jar -DgroupId=kafka
> -DartifactId=kafka -Dversion=0.7.0 -Dpackaging=jar

Thanks
Min

2012/7/13 Murtaza Doctor <murt...@richrelevance.com>:
> Hello Min,
>
> In your github project source code are you missing the ConsumerConfig
> class? I was trying to download and play with the source code.
>
> Thanks,
> murtaza
>
> On 7/3/12 6:29 PM, "Min" <mini...@gmail.com> wrote:
>
>> I've created another hadoop consumer which uses zookeeper.
>>
>> https://github.com/miniway/kafka-hadoop-consumer
>>
>> With a hadoop OutputFormat, I could add new files to the existing
>> target directory.
>> Hope this would help.
>>
>> Thanks
>> Min
>>
>> 2012/7/4 Murtaza Doctor <murt...@richrelevance.com>:
>>> +1 This surely sounds interesting.
>>>
>>> On 7/3/12 10:05 AM, "Felix GV" <fe...@mate1inc.com> wrote:
>>>
>>>> Hmm, that's surprising. I didn't know about that...!
>>>>
>>>> I wonder if it's a new feature... Judging from your email, I assume
>>>> you're using CDH? What version?
>>>>
>>>> Interesting :) ...
>>>>
>>>> --
>>>> Felix
>>>>
>>>> On Tue, Jul 3, 2012 at 12:34 PM, Sybrandy, Casey <
>>>> casey.sybra...@six3systems.com> wrote:
>>>>
>>>>> >> - Is there a version of the consumer which appends to an existing
>>>>> >> file on HDFS until it reaches a specific size?
>>>>> >
>>>>> > No there isn't, as far as I know. Potential solutions to this would be:
>>>>> >
>>>>> > 1. Leave the data in the broker long enough for it to reach the size
>>>>> > you want. Running the SimpleKafkaETLJob at those intervals would give
>>>>> > you the file size you want. This is the simplest thing to do, but the
>>>>> > drawback is that your data in HDFS will be less real-time.
>>>>> > 2. Run the SimpleKafkaETLJob as frequently as you want, and then roll
>>>>> > up / compact your small files into one bigger file. You would need to
>>>>> > come up with the hadoop job that does the roll-up, or find one somewhere.
>>>>> > 3. Don't use the SimpleKafkaETLJob at all and write a new job that
>>>>> > makes use of hadoop append instead...
>>>>> >
>>>>> > Also, you may be interested to take a look at these scripts
>>>>> > <http://felixgv.com/post/88/kafka-distributed-incremental-hadoop-consumer/>
>>>>> > I posted a while ago. If you follow the links in this post, you can
>>>>> > get more details about how the scripts work and why it was necessary
>>>>> > to do the things it does... or you can just use them without reading.
>>>>> > They should work pretty much out of the box...
>>>>>
>>>>> Where I work, we discovered that you can keep a file in HDFS open and
>>>>> still run MapReduce jobs against the data in that file. What you do is
>>>>> you flush the data periodically (every record for us), but you don't
>>>>> close the file right away. This allows us to have data files that
>>>>> contain 24 hours' worth of data, but not have to close the file to run
>>>>> the jobs or to schedule the jobs for after the file is closed. You can
>>>>> also check the file size periodically and rotate the files based on
>>>>> size. We use Avro files, but sequence files should work too according
>>>>> to Cloudera.
>>>>>
>>>>> It's a great compromise for when you want the latest and greatest data,
>>>>> but don't want to have to wait until all of the files are closed to get
>>>>> it.
>>>>>
>>>>> Casey
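For anyone wondering what the keep-the-file-open approach Casey describes
might look like on the writer side, here is a rough, untested sketch against
the plain HDFS API. It writes raw bytes for simplicity (Casey mentions Avro,
and sequence files should also work by wrapping the same stream); hflush()
is the newer name for what older Hadoop releases exposed as sync(), so the
exact call depends on your Hadoop/CDH version, and the class name, path
pattern, and 256 MB threshold are all made up for the example:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of "keep the file open, flush every record, rotate on size".
// Readers (e.g. MapReduce jobs) can see the flushed data while the file
// is still open; closed files behave like any other HDFS file.
public class RollingHdfsWriter {
    private static final long ROLL_BYTES = 256L * 1024 * 1024; // rotate at ~256 MB (arbitrary)

    private final FileSystem fs;
    private final Path dir;
    private FSDataOutputStream out;

    public RollingHdfsWriter(Configuration conf, Path dir) throws IOException {
        this.fs = FileSystem.get(conf);
        this.dir = dir;
        roll(); // open the first file
    }

    public void write(byte[] record) throws IOException {
        out.write(record);
        out.hflush();                     // sync() on older Hadoop versions
        if (out.getPos() >= ROLL_BYTES) { // check size and rotate when large enough
            roll();
        }
    }

    private void roll() throws IOException {
        if (out != null) {
            out.close();                  // the finished file is now an ordinary closed file
        }
        Path current = new Path(dir, "events-" + System.currentTimeMillis() + ".dat");
        out = fs.create(current);         // the new file stays open until the next roll
    }

    public void close() throws IOException {
        out.close();
    }
}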