Ayush Verma created BAHIR-223:
---------------------------------

             Summary: Concern around reliability of sql-streaming-sqs
                 Key: BAHIR-223
                 URL: https://issues.apache.org/jira/browse/BAHIR-223
             Project: Bahir
          Issue Type: Bug
          Components: Spark Structured Streaming Connectors
            Reporter: Ayush Verma

Looking at the source for the *sql-streaming-sqs* connector, it appears that messages are deleted from SQS on every fetchMaxOffset() call.

[https://github.com/apache/bahir/blob/3912360ca5bcca269a30ff42120cac46934693c4/sql-streaming-sqs/src/main/scala/org/apache/spark/sql/streaming/sqs/SqsSource.scala#L106]

My understanding of a Spark streaming source is that a call to the commit() method signals that Spark has completed processing up to the given offset. If messages are deleted before that point and the batch then fails, those messages cannot be fetched again.

Should we not delete the SQS messages on a call to commit() instead?
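
To make the proposal concrete, here is a minimal sketch of deferring deletion until commit(). It is not the connector's code: `Message`, `InMemoryQueue` and `DeferredDeleteSource` are hypothetical names, the queue is an in-memory stand-in rather than the AWS SDK, and offsets are plain `Long` values instead of Spark `Offset` objects. It also ignores the SQS visibility timeout, which a real implementation would need to handle.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an SQS message; not an AWS SDK type.
final case class Message(receiptHandle: String, body: String)

// Hypothetical in-memory queue. Received messages become "in flight"
// and are only gone for good once delete() is called.
final class InMemoryQueue(initial: Seq[Message]) {
  private val visible = mutable.Queue.empty[Message] ++= initial
  private val inFlight = mutable.Map.empty[String, Message]

  def receive(max: Int): Seq[Message] = {
    val batch = (1 to math.min(max, visible.size)).map(_ => visible.dequeue())
    batch.foreach(m => inFlight(m.receiptHandle) = m)
    batch
  }

  def delete(receiptHandle: String): Unit = inFlight.remove(receiptHandle)

  // Messages that have not been deleted yet (visible or in flight).
  def undeleted: Int = visible.size + inFlight.size
}

// Sketch of a source that fetches eagerly but deletes only on commit.
final class DeferredDeleteSource(queue: InMemoryQueue) {
  private var maxOffset = 0L
  // offset -> receipt handle of a fetched message awaiting commit
  private val pending = mutable.TreeMap.empty[Long, String]

  // Pulls messages and advances the offset, but deletes nothing.
  def fetchMaxOffset(batchSize: Int): Long = {
    queue.receive(batchSize).foreach { m =>
      maxOffset += 1
      pending(maxOffset) = m.receiptHandle
    }
    maxOffset
  }

  // Deletes only messages at or below the committed offset.
  def commit(end: Long): Unit = {
    val done = pending.keys.takeWhile(_ <= end).toList
    done.foreach { offset =>
      queue.delete(pending(offset))
      pending.remove(offset)
    }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val queue = new InMemoryQueue(
      Seq(Message("h1", "a"), Message("h2", "b"), Message("h3", "c")))
    val source = new DeferredDeleteSource(queue)

    val end = source.fetchMaxOffset(batchSize = 2)
    println(queue.undeleted) // 3: fetched messages are still in the queue

    source.commit(end)
    println(queue.undeleted) // 1: only committed messages were deleted
  }
}
```

With this shape, a failure between the fetch and the commit leaves the fetched messages undeleted, so they remain available for reprocessing instead of being lost.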