rayu yuan created SPARK-20238:
---------------------------------
Summary: Is the JavaKafkaWordCount example correct for Spark version 2.1?
Key: SPARK-20238
URL: https://issues.apache.org/jira/browse/SPARK-20238
Project: Spark
Issue Type: Question
Components: Examples, ML
Affects Versions: 2.1.0
Reporter: rayu yuan
My question: is the example at
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
correct?
I'm pretty new to both Spark and Java. I wanted to find an example of Spark
Streaming using Java, streaming from Kafka. The JavaKafkaWordCount at
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
looked to be perfect.
However, when I tried running it, I found a couple of issues that I needed to
overcome.
1. This line was unnecessary:
{code}
StreamingExamples.setStreamingLogLevels();
{code}
Having this line in there (and the associated import) caused me to go looking
for a dependency, spark-examples_2.10, which is of no real use to me.
2. After running it, this line:
{code}
JavaPairReceiverInputDStream<String, String> messages =
KafkaUtils.createStream(jssc, args[0], args[1], topicMap);
{code}
appeared to throw an error around logging:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:91
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:66
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:11
at org.apache.spark.streaming.kafka.KafkaUtils.createStream(KafkaUtils.scala)
at main.java.com.cm.JavaKafkaWordCount.main(JavaKafkaWordCount.java:72)
{code}
To get around this, I found that the code sample in
https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
helped me to come up with the right lines to see streaming from Kafka in
action. Specifically, that sample calls createDirectStream instead of createStream.
So is the example in
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
incorrect, or is there something I could have done differently to get that example
working?
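For the NoClassDefFoundError above, one check that might help narrow things down is to see which Logging class is actually visible on the classpath. The sketch below rests on an assumption that nothing in this report confirms: that the error comes from a Kafka integration jar built against Spark 1.x (which expects org.apache.spark.Logging) being combined with Spark 2.x, where, as far as I understand, that trait lives under org.apache.spark.internal instead. The class name ClasspathCheck and the helper isPresent are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ClasspathCheck {

    /**
     * Returns true if the named class can be located by this class's
     * class loader. The class is looked up without being initialized.
     */
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the first name is the Spark 1.x location of the
        // Logging trait and the second is its Spark 2.x location.
        List<String> candidates = Arrays.asList(
                "org.apache.spark.Logging",
                "org.apache.spark.internal.Logging");
        for (String name : candidates) {
            System.out.println(name + " present: " + isPresent(name));
        }
    }
}
```

Run with the same classpath as the failing job. If only the second name is reported as present, that would be consistent with (though not proof of) a Kafka artifact compiled for an older Spark release.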