etienne created SPARK-17606:
-------------------------------

             Summary: New batches are not created once 1000 batches have been recreated after restarting streaming from a checkpoint
                 Key: SPARK-17606
                 URL: https://issues.apache.org/jira/browse/SPARK-17606
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.6.1
            Reporter: etienne
When Spark restarts from a checkpoint after being down for a while, it recreates the batches missed during the downtime. When only a few batches are missing, Spark also keeps creating a new incoming batch every batch interval. But when the downtime is long enough that 1000 batches have to be recreated, no new batch is created at all, so once all the recreated batches have completed, the stream sits idle. I think there is a hard limit set somewhere.

I was expecting Spark to keep recreating the missed batches, maybe not all at once (since it looks like doing so causes driver memory problems), and then to go back to creating one batch per batch interval. Another solution would be not to recreate the missing batches at all, but still restart the direct input.

Right now, the only way I have to restart a stream after a long break is to delete the checkpoint so that a new stream is created, but that loses all my state.

PS: I'm speaking about the direct Kafka input because that is the source I'm currently using; I don't know what happens with other sources.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
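For context, a minimal sketch of the kind of setup the report describes: a checkpointed StreamingContext recovered with `StreamingContext.getOrCreate`, feeding a direct Kafka stream. The checkpoint directory, broker address, topic name, and batch interval below are illustrative placeholders, not values from the report.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object CheckpointedKafkaApp {
  // Hypothetical checkpoint location for illustration.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  // Called only when no checkpoint exists; otherwise the graph is
  // restored from the checkpoint, including pending batch times.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpoint-restart-demo")
    val ssc  = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    stream.map(_._2).count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart after downtime, this recovers from the checkpoint and
    // regenerates the batches missed while the driver was down.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

With this pattern, the only way to avoid the behavior described above (once more than 1000 batches are pending) is to delete `checkpointDir`, which is exactly the state-losing workaround the report complains about.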