Hello,

This is about SPARK-3276. I want to make MIN_REMEMBER_DURATION (which is currently a constant) configurable, with a default value. Before spending effort on development and creating a pull request, I wanted to consult the core developers to see which approach makes the most sense and has the highest probability of being accepted.
The constant MIN_REMEMBER_DURATION can be seen at:
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L338

It is marked as a private member of the private[streaming] object FileInputDStream.

Approach 1: Make MIN_REMEMBER_DURATION a variable, renamed to minRememberDuration, and add a new fileStream method to JavaStreamingContext.scala:
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
The new fileStream method would accept an extra parameter, e.g. minRememberDuration: Int (in seconds), and use this value to set the private minRememberDuration.

Approach 2: Create a new, public Spark configuration property, e.g. named spark.rememberDuration.min (with a default value of 60 seconds), and set the private variable minRememberDuration to the value of this property (see the sketch below my signature).

Approach 1 would mean adding a new method to the public API, whereas Approach 2 would mean introducing a new public Spark property. Right now, Approach 2 seems more straightforward and simpler to me, but I still wanted to hear the opinions of developers who know the internals of Spark better than I do.

Kind regards,

Emre Sevinç
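P.S. To make Approach 2 a bit more concrete, here is a rough, untested sketch of the idea. The property name spark.rememberDuration.min and the 60-second default are only my proposal, nothing with that name exists in Spark today:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Duration, Seconds}

    object MinRememberDurationSketch {
      // Instead of the hard-coded MIN_REMEMBER_DURATION constant,
      // FileInputDStream could read the value from the SparkConf,
      // falling back to the current default of 60 seconds:
      def minRememberDuration(conf: SparkConf): Duration =
        Seconds(conf.getLong("spark.rememberDuration.min", 60L))
    }

With something like this, users would simply set spark.rememberDuration.min on their SparkConf (or via --conf on spark-submit), and the existing fileStream methods would pick it up without any change to the public API.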