> It really looks like your mapper tasks may be failing to connect to your
> Kafka server.
ok

> Here's a brief overview of what that demo job is doing so you can understand
> where the example may have gone wrong.
>
> DataGenerator:
>
> 1. When DataGenerator is run, it needs the properties 'kafka.etl.topic'
> and 'kafka.server.uri' set in the properties file. When you run
> ./run-class.sh kafka.etl.impl.DataGenerator test/test.properties, you can
> tell that they're properly set by the output 'topic=<blah>' and
> 'server uri=<kafka server url>'.

Seems ok:

    topics=SimpleTestEvent
    server uri:tcp://<ip_of_my_hostA>:9092
    send 1000 SimpleTestEvent count events to tcp://<ip_of_my_hostA>:9092
    11/09/01 05:04:05 INFO producer.SyncProducer: Connected to <ip_of_my_hostA>:9092 for producing
    11/09/01 05:04:05 INFO producer.SyncProducer: Disconnecting from <ip_of_my_hostA>:9092
    Dump tcp://<ip_of_my_hostA>:9092 SimpleTestEvent 0 -1 to /tmp/ben6/data/1.dat

> 2. The DataGenerator will create a bunch of dummy messages and pump them to
> that kafka server. Afterwards, it will write a file to HDFS at the path
> 'input', which you also set in the properties file. The file that is created
> will be named something like 1.dat.

Yes, I see it under the Hadoop directory specified as 'input' in
test.properties.

> 3. 1.dat is a sequence file, so if it isn't compressed, you should be
> able to see its contents in plain text. The contents will essentially list
> the kafka server url, the partition number and the topic as well as the
> offset.

Mine has only one line (the binary data in the middle of it is shown as is):

    SEQkafka.etl.KafkaETLKey"org.apache.hadoop.io.BytesWritable$?ړ??tl?3aC+tcp://<ip_of_my_hostA>:9092 SimpleTestEvent 0 -1

> 4. In a real scenario, you'll probably create several of these files for
> each broker and possibly partition, but for this example, you only need one
> file. Each file will spawn a mapper during the mapred step.
>
> CopyJars:
>
> 1. This should copy the necessary jars for kafka hadoop, and push them
> into HDFS for the distributed cache. If the jars are copied locally instead
> of to a remote cluster, most likely HADOOP_CONF_DIR hasn't been set up
> correctly. The environment should probably be set by the script, so someone
> can change that.

Yes, I've got the proper jars under the Hadoop directory now.

> SimpleKafkaETLJob:
>
> 1. This job will then set up the distributed classpath, and the input path
> should be the directory that 1.dat was written to.
> 2. Internally, the mappers will then load 1.dat and use the connection
> data contained in it to connect to kafka. If it's trying to connect to
> anything but your kafka server, then this file was incorrectly written.

Can we see a sample of a valid 1.dat file, please?

> 3. The RecordReader wraps all of this and hides all the connection stuff
> so that your Mapper should see a stream of Kafka messages rather than the
> contents of 1.dat.
>
> So please see if you can figure out what is wrong with your example, and
> feel free to beef up the README instructions to take your pitfalls into
> account.

OK, yes, I plan to do this once I've got all the steps right. Thanks for all
your support so far.
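For reference, a minimal test.properties along the lines described in step 1
of the DataGenerator section might look like the sketch below. The property
names kafka.etl.topic, kafka.server.uri and input are the ones mentioned in
the thread, and the values are taken from the run output; the real example
file has further properties (output path, number of events, and so on) whose
names aren't shown here.

    # Topic and broker URI the DataGenerator publishes the dummy events to
    kafka.etl.topic=SimpleTestEvent
    kafka.server.uri=tcp://<ip_of_my_hostA>:9092

    # HDFS directory where the offset file (1.dat) is written;
    # the SimpleKafkaETLJob mappers read their input from this same path
    input=/tmp/ben6/data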
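On the question of what a valid 1.dat should contain: since it is an ordinary
Hadoop sequence file, one quick check is

    hadoop fs -text /tmp/ben6/data/1.dat

with the Kafka ETL jar on the Hadoop classpath so kafka.etl.KafkaETLKey can be
deserialized. The Java sketch below does the same thing by hand. It assumes,
based on the dump above, that the value is a BytesWritable whose bytes spell
out the broker URI, topic, partition and starting offset; that layout, the
class name DumpOffsetFile and the default path are assumptions to verify
against the actual DataGenerator source rather than take as given.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    // Rough sketch: print the records of the offset file that DataGenerator
    // writes, so the broker URI / topic / partition / offset it contains can
    // be checked. Needs hadoop-core and the Kafka hadoop-consumer jar (for
    // kafka.etl.KafkaETLKey) on the classpath; the default path is a guess.
    public class DumpOffsetFile {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            Path path = new Path(args.length > 0 ? args[0] : "/tmp/ben6/data/1.dat");
            FileSystem fs = path.getFileSystem(conf);

            SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
            try {
                // Instantiate whatever key/value classes the file declares
                // (KafkaETLKey and BytesWritable in the dump above).
                Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

                while (reader.next(key, value)) {
                    String valueText = value instanceof BytesWritable
                            ? new String(((BytesWritable) value).getBytes(), 0,
                                         ((BytesWritable) value).getLength(), "UTF-8")
                            : value.toString();
                    // Expect something like: tcp://<broker>:9092 SimpleTestEvent 0 -1
                    System.out.println(key + "\t" + valueText);
                }
            } finally {
                reader.close();
            }
        }
    }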