Max Xu created SPARK-5147:
-----------------------------
Summary: Write ahead logs from streaming receiver are not purged
because cleanupOldBlocks in WriteAheadLogBasedBlockHandler is never called
Key: SPARK-5147
URL: https://issues.apache.org/jira/browse/SPARK-5147
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.2.0
Reporter: Max Xu
Hi all,
We are running a Spark Streaming application with ReliableKafkaReceiver. We
have "spark.streaming.receiver.writeAheadLog.enable" set to true, so write ahead
logs (WALs) for received data are created under the receivedData/streamId folder
in the checkpoint directory.
However, old WALs are never purged as they age, although receivedBlockMetadata
and checkpoint files are purged correctly. I went through the code: the
WriteAheadLogBasedBlockHandler class in ReceivedBlockHandler.scala is
responsible for cleaning up old blocks. It has a cleanupOldBlocks method, but
no class ever calls it. ReceiverSupervisorImpl holds a
WriteAheadLogBasedBlockHandler instance, but it only calls storeBlock to create
WALs and never calls cleanupOldBlocks to purge old ones.
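To make the effect concrete, here is a toy model in Python. It is not Spark
code: ToyWalBlockHandler, run_supervisor and retain_ms are hypothetical names
that only mirror the storeBlock / cleanupOldBlocks call pattern described above,
and I am assuming that cleanup takes a threshold time and drops everything
older than it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWalBlockHandler:
    """Toy stand-in for WriteAheadLogBasedBlockHandler (not Spark code)."""

    # Maps each WAL segment's creation time (ms) to the block ids stored in it.
    segments: dict = field(default_factory=dict)

    def store_block(self, block_id: str, now_ms: int) -> None:
        # Append the block to the segment keyed by the current time.
        self.segments.setdefault(now_ms, []).append(block_id)

    def cleanup_old_blocks(self, thresh_ms: int) -> int:
        # Drop every segment created before thresh_ms; return how many were dropped.
        old = [t for t in self.segments if t < thresh_ms]
        for t in old:
            del self.segments[t]
        return len(old)


def run_supervisor(handler, batches, batch_ms, retain_ms=None):
    """Store one block per batch; purge only when retain_ms is given.

    Returns the number of WAL segments left at the end of the run.
    """
    for i in range(batches):
        now_ms = i * batch_ms
        handler.store_block(f"block-{i}", now_ms)
        if retain_ms is not None:
            # The call that, as far as I can tell, nothing makes in 1.2.0.
            handler.cleanup_old_blocks(now_ms - retain_ms)
    return len(handler.segments)


if __name__ == "__main__":
    # Without cleanup the segment count grows with every batch.
    print(run_supervisor(ToyWalBlockHandler(), 100, 1000))
    # With cleanup it stays bounded by the retention window.
    print(run_supervisor(ToyWalBlockHandler(), 100, 1000, retain_ms=5000))
```

In this model the first run keeps all 100 segments, while the second keeps only
those inside the 5 second retention window. The first run is the behaviour we
see on HDFS.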
The size of the WAL folder increases constantly on HDFS. This is preventing us
from running the ReliableKafkaReceiver 24x7. Can somebody please take a look?
Thanks,
Max