Saisai Shao created SPARK-3146:
----------------------------------

             Summary: Improve the flexibility of Spark Streaming Kafka API to offer user the ability to process message before storing into BM
                 Key: SPARK-3146
                 URL: https://issues.apache.org/jira/browse/SPARK-3146
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
    Affects Versions: 1.0.2
            Reporter: Saisai Shao

Currently, the Spark Streaming Kafka API stores only the key and value of each message into the BlockManager (BM) for processing. This limits flexibility for some requirements:

1. The topic/partition/offset information of each message is currently discarded by KafkaInputDStream. In some scenarios people need this information to filter messages more precisely, as described in SPARK-2388.
2. People may need to attach a timestamp to each message as it is fed into Spark Streaming, so that system latency can be measured more accurately.

So here we add a messageHandler to the interface, giving people the flexibility to preprocess each message before it is stored into BM. At the same time, this improvement remains compatible with the current API.
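
To make the messageHandler idea concrete, here is a minimal, self-contained sketch in plain Python. It does not use Spark or Kafka, and it is not the proposed patch: `KafkaMessage`, `receive`, `default_handler`, `with_metadata`, and `with_timestamp` are all hypothetical names invented for illustration, and a plain list stands in for the BM. It only shows the shape of the idea: the receiver applies a user-supplied function to each message before storing it, and the default function keeps today's (key, value) behaviour.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")


@dataclass(frozen=True)
class KafkaMessage:
    """Hypothetical stand-in for one consumed Kafka record plus its metadata."""
    topic: str
    partition: int
    offset: int
    key: str
    value: str


def default_handler(msg: KafkaMessage) -> Tuple[str, str]:
    # Mirrors the current behaviour: only the key and value are kept.
    return (msg.key, msg.value)


def with_metadata(msg: KafkaMessage) -> Tuple[str, int, int, str, str]:
    # Keeps topic/partition/offset so later stages can filter on them.
    return (msg.topic, msg.partition, msg.offset, msg.key, msg.value)


def with_timestamp(msg: KafkaMessage) -> Tuple[float, str, str]:
    # Tags the message with its ingestion time for latency measurement.
    return (time.time(), msg.key, msg.value)


def receive(messages: Iterable[KafkaMessage],
            handler: Callable[[KafkaMessage], R] = default_handler) -> List[R]:
    """Apply the handler to each message before storing it.

    The returned list is a stand-in for the block store (assumption).
    """
    store: List[R] = []
    for msg in messages:
        store.append(handler(msg))
    return store


messages = [
    KafkaMessage("events", 0, 41, "k1", "v1"),
    KafkaMessage("events", 1, 7, "k2", "v2"),
]

plain = receive(messages)                   # existing behaviour, no handler given
tagged = receive(messages, with_metadata)   # requirement 1: keep the metadata
stamped = receive(messages, with_timestamp) # requirement 2: add a timestamp
```

Because the handler argument has a default, callers that never pass one see the same (key, value) pairs as before, which is the kind of backward compatibility the description asks for.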