Could anyone answer my question?
_
From: Chen, Yan I
Sent: 2016, June, 14 1:34 PM
To: 'user@spark.apache.org'
Subject: restarting of spark streaming
Hi,
I notice that when it restarts, Spark Streaming tries to recover and replay
all the batches it missed. During this replay, are the streams checkpointed
the same way they are during normal operation?
Does anyone know?
Sometimes our cluster goes down for maintenance, and our streaming process is
shut down for, e.g., 1 day and then restarted. If the batches from this period
are replayed without checkpointing, the RDD chain becomes very long, and memory
usage keeps going up until all the missed batches have been replayed.
That growth in memory usage is what we observe now.
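To give a rough sense of the scale involved, here is a toy model in plain
Python. It is not Spark code and does not claim to describe what Spark
actually does during recovery. It assumes a hypothetical 10-second batch
interval and a hypothetical checkpoint every 10 batches, and it assumes that
each replayed batch adds one link to the RDD chain while a checkpoint cuts the
chain. The function names are made up for illustration.

```python
def missed_batches(downtime_seconds, batch_interval_seconds):
    """Number of whole batches that fall inside the downtime window."""
    return downtime_seconds // batch_interval_seconds


def max_lineage_depth(num_batches, checkpoint_every=None):
    """Longest chain of dependent RDDs seen while replaying num_batches.

    Toy model (an assumption, not Spark internals): each replayed batch
    adds one link to the chain, and a checkpoint resets the chain to zero.
    """
    depth = 0
    longest = 0
    for batch in range(1, num_batches + 1):
        depth += 1
        longest = max(longest, depth)
        if checkpoint_every and batch % checkpoint_every == 0:
            depth = 0  # checkpoint truncates the chain
    return longest


# 1 day of downtime, hypothetical 10-second batch interval
batches = missed_batches(24 * 60 * 60, 10)
print(batches)                                          # 8640
print(max_lineage_depth(batches))                       # 8640 (no checkpointing)
print(max_lineage_depth(batches, checkpoint_every=10))  # 10
```

Under these assumptions, the chain stays bounded only if checkpointing also
happens while the missed batches are replayed, which is the behaviour I am
asking about.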
Thanks,
Yan Chen