[ https://issues.apache.org/jira/browse/BAHIR-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770130#comment-16770130 ]
Lukasz Antoniak commented on BAHIR-183:
---------------------------------------

[~yanlin-Lynn], I am not sure whether offset management is implemented correctly here. We store many messages in one HDFS file and lose track of each single message's offset. Did you try implementing the `MqttClientPersistence` interface to store data in HDFS? The implementation would be a lot cleaner that way, as the user would only choose between memory, local file, or HDFS storage.

> Using HDFS for saving message for mqtt source
> ---------------------------------------------
>
>                 Key: BAHIR-183
>                 URL: https://issues.apache.org/jira/browse/BAHIR-183
>             Project: Bahir
>          Issue Type: Improvement
>          Components: Spark Structured Streaming Connectors
>    Affects Versions: Spark-2.2.0
>            Reporter: Wang Yanlin
>            Assignee: Wang Yanlin
>            Priority: Major
>             Fix For: Spark-2.4.0
>
>
> Currently in spark-sql-streaming-mqtt, the received mqtt message is saved in
> a local file by the driver. This risks losing data in cluster mode when an
> application master failover occurs. Saving incoming mqtt messages in a
> directory under the checkpoint location will solve this problem.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
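
As a rough illustration of the suggestion in the comment above, the sketch below stores each message under its own key instead of appending many messages to one file, so a single message can still be found or removed later. It is not Bahir's code: `MessageStore` is a hypothetical, simplified stand-in for the key/value part of Paho's `MqttClientPersistence` (the real interface also has `open`/`close` and stores `MqttPersistable` values), and a local directory accessed through `java.nio.file` stands in for an HDFS directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, simplified stand-in for the key/value contract of Paho's
// MqttClientPersistence. Payloads are raw bytes here (assumption).
interface MessageStore {
    void put(String key, byte[] payload);
    byte[] get(String key);          // returns null when the key is absent
    void remove(String key);
    boolean containsKey(String key);
    List<String> keys();
    void clear();
}

// Keeps one file per message key, so every message stays individually
// addressable. A local directory stands in for HDFS (assumption).
class DirectoryMessageStore implements MessageStore {
    private final Path dir;

    DirectoryMessageStore(Path dir) {
        try {
            this.dir = Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps a key to its file and rejects keys that could escape the directory.
    private Path fileFor(String key) {
        if (key.isEmpty() || key.contains("/") || key.contains("\\")
                || key.equals(".") || key.equals("..")) {
            throw new IllegalArgumentException("unsafe key: " + key);
        }
        return dir.resolve(key);
    }

    @Override
    public void put(String key, byte[] payload) {
        try {
            Files.write(fileFor(key), payload); // creates or overwrites the file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public byte[] get(String key) {
        Path file = fileFor(key);
        try {
            return Files.isRegularFile(file) ? Files.readAllBytes(file) : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void remove(String key) {
        try {
            Files.deleteIfExists(fileFor(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean containsKey(String key) {
        return Files.isRegularFile(fileFor(key));
    }

    @Override
    public List<String> keys() {
        // Files.list holds a directory handle, so close it with try-with-resources.
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(p -> p.getFileName().toString())
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void clear() {
        for (String key : keys()) {
            remove(key);
        }
    }
}
```

With this shape, choosing between memory, local file, or HDFS storage would come down to which implementation of the interface is plugged in. An HDFS-backed variant would presumably swap the `java.nio.file` calls for the Hadoop `FileSystem` API.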