[ https://issues.apache.org/jira/browse/BAHIR-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800091#comment-16800091 ]
Lukasz Antoniak commented on BAHIR-183: --------------------------------------- [~yanlin-Lynn], I suspect that you can loose messages while running the current code version. MQTT guarantees QoS 1 and 2 thanks to client persistence to local file system. Current implementation of MQTT source with HDFS back-end leverages memory persistence for in-flight messages. If Spark application terminates abnormally after picking upĀ message from the broker, but before writing it to HDFS, data can be lost. Did you explore the _MqttClientPersistence_ option? > Using HDFS for saving message for mqtt source > --------------------------------------------- > > Key: BAHIR-183 > URL: https://issues.apache.org/jira/browse/BAHIR-183 > Project: Bahir > Issue Type: Improvement > Components: Spark Structured Streaming Connectors > Affects Versions: Spark-2.2.0 > Reporter: Wang Yanlin > Assignee: Wang Yanlin > Priority: Major > Fix For: Spark-2.4.0 > > > Currently in spark-sql-streaming-mqtt, the received mqtt message is saved in > a local file by driver, this will have the risks of losing data for cluster > mode when application master failover occurs. So saving in-coming mqtt > messages using a director in checkpoint will solve this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)