Shaofeng SHI created KYLIN-3679:
-----------------------------------
Summary: Fetch Kafka topic with Spark streaming
Key: KYLIN-3679
URL: https://issues.apache.org/jira/browse/KYLIN-3679
Project: Kylin
Issue Type: New Feature
Components: Spark Engine
Reporter: Shaofeng SHI
Currently Kylin uses an MR job to fetch Kafka messages in parallel and then persists them to
HDFS for subsequent processing. If the user selects the Spark engine, we can use the
Spark Streaming API to do this instead. Spark Streaming can read the Kafka messages in a
given offset range as an RDD, which then makes them easy to process:
https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
With Spark Streaming, Kylin can also easily connect to other data sources such as
Kinesis, Flume, etc.
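As a rough illustration of the offset-range idea, the sketch below plans which messages a build would read: one half-open range [from, until) per partition, between the offsets already consumed and the latest offsets. It is a minimal, dependency-free sketch under stated assumptions. The function name `plan_offset_ranges` and the topic name are hypothetical, and the `OffsetRange` tuple only mirrors the fields of Spark's `org.apache.spark.streaming.kafka010.OffsetRange`. In a Spark job using the Scala or Java API, ranges like these would be passed to `KafkaUtils.createRDD` to obtain the messages as an RDD.

```python
from typing import Dict, List, NamedTuple


class OffsetRange(NamedTuple):
    """Mirrors the fields of Spark's OffsetRange (illustrative only):
    topic, partition, inclusive start offset, exclusive end offset."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int

    def num_messages(self) -> int:
        # Half-open range, so the size is simply the difference.
        return self.until_offset - self.from_offset


def plan_offset_ranges(topic: str,
                       start_offsets: Dict[int, int],
                       end_offsets: Dict[int, int]) -> List[OffsetRange]:
    """Hypothetical helper: build one range per partition, skipping empty ones.

    start_offsets: per-partition offset where the previous build stopped (inclusive start).
    end_offsets:   per-partition latest offset to read up to (exclusive end).
    Assumption: a partition missing from start_offsets starts at offset 0; real
    code would instead query the earliest offset still retained by Kafka.
    """
    ranges = []
    for partition in sorted(end_offsets):
        start = start_offsets.get(partition, 0)
        end = end_offsets[partition]
        if end < start:
            raise ValueError(
                f"partition {partition}: end offset {end} < start offset {start}")
        if end > start:  # nothing new to read if end == start
            ranges.append(OffsetRange(topic, partition, start, end))
    return ranges


if __name__ == "__main__":
    plan = plan_offset_ranges("kylin_streaming_topic",
                              {0: 100, 1: 250},
                              {0: 180, 1: 250, 2: 40})
    for r in plan:
        print(r, r.num_messages())
```

Here partition 1 has no new messages and is skipped, while partition 2 (newly added) is read from offset 0. Using half-open ranges means the end offset of one build can be used directly as the start offset of the next, so no message is read twice or skipped.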