[ https://issues.apache.org/jira/browse/SPARK-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278161#comment-14278161 ]

Saisai Shao commented on SPARK-5147:
------------------------------------

1. Currently, the check for whether to delete the WAL happens in 
clearMetadata(), which runs during checkpointing, after the batch has finished.

2. Actually, this is the point I am thinking of, and it could possibly be improved.

What I mean by throughput is that writing 2 copies to the BlockManager (BM) 
plus 3 copies to HDFS occupies more network bandwidth than 1 BM copy plus 3 
HDFS copies; I am not talking about response time. Yes, HDFS replication is 
much slower than BM replication, but replicating to both BM and HDFS 
concurrently increases network bandwidth contention and lowers the throughput 
of the whole system.
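To put rough numbers on that argument, here is a minimal sketch with a 
simplifying assumption (not measured figures): each replica beyond the 
receiver's local copy costs one network transfer of the block.

```python
def network_copies(bm_replicas: int, hdfs_replicas: int) -> int:
    """Network transfers per received block.

    Assumes the first BlockManager (BM) replica is local to the receiver,
    so only the extra BM replicas cross the network, while every HDFS
    replica is shipped over the network by the write pipeline.
    """
    return (bm_replicas - 1) + hdfs_replicas

# Replicating to 2 BM copies *and* 3 HDFS copies:
both = network_copies(2, 3)      # 4 transfers per block
# Relying on the WAL alone: 1 BM copy, 3 HDFS copies:
wal_only = network_copies(1, 3)  # 3 transfers per block

print(f"combined: {both}, WAL only: {wal_only}")
```

Under this assumption the combined scheme moves about a third more data over 
the network per block, which is the contention being described.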

> write ahead logs from streaming receiver are not purged because 
> cleanupOldBlocks in WriteAheadLogBasedBlockHandler is never called
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5147
>                 URL: https://issues.apache.org/jira/browse/SPARK-5147
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Streaming
>    Affects Versions: 1.2.0
>            Reporter: Max Xu
>            Priority: Blocker
>
> Hi all,
> We are running a Spark streaming application with ReliableKafkaReceiver. We 
> have "spark.streaming.receiver.writeAheadLog.enable" set to true so write 
> ahead logs (WALs) for received data are created under receivedData/streamId 
> folder in the checkpoint directory. 
> However, old WALs are never purged over time, even though 
> receivedBlockMetadata and checkpoint files are purged correctly. I went 
> through the code: the WriteAheadLogBasedBlockHandler class in 
> ReceivedBlockHandler.scala is responsible for cleaning up the old blocks. It 
> has a method, cleanupOldBlocks, which is never called by any class. The 
> ReceiverSupervisorImpl class holds a WriteAheadLogBasedBlockHandler instance; 
> however, it only calls the storeBlock method to create WALs and never calls 
> the cleanupOldBlocks method to purge old WALs.
> The size of the WAL folder grows constantly on HDFS, which is preventing us 
> from running the ReliableKafkaReceiver 24x7. Can somebody please take a look?
> Thanks,
> Max
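
The missing call path the report describes can be sketched as follows. This is 
a Python stand-in, not Spark's actual API; the class and method names mirror 
the Scala ones from the report, and the timestamp-based cleanup logic is a 
hypothetical simplification of what WAL segment purging does.

```python
class WriteAheadLogBasedBlockHandler:
    """Stand-in for the Scala handler: stores blocks and can purge old ones."""

    def __init__(self):
        # (timestamp, data) pairs standing in for WAL segment files on HDFS.
        self.wal_segments = []

    def store_block(self, timestamp, data):
        self.wal_segments.append((timestamp, data))

    def cleanup_old_blocks(self, thresh_time):
        # Purge every segment older than thresh_time.
        self.wal_segments = [(t, d) for t, d in self.wal_segments
                             if t >= thresh_time]


class ReceiverSupervisor:
    """Stand-in for ReceiverSupervisorImpl, which holds the handler."""

    def __init__(self):
        self.handler = WriteAheadLogBasedBlockHandler()

    def on_block(self, timestamp, data):
        self.handler.store_block(timestamp, data)

    def on_batch_cleanup(self, thresh_time):
        # This is the call that SPARK-5147 reports as never happening:
        # without it, the WAL directory grows without bound.
        self.handler.cleanup_old_blocks(thresh_time)


sup = ReceiverSupervisor()
for t in range(5):
    sup.on_block(t, b"payload")
sup.on_batch_cleanup(thresh_time=3)
print(len(sup.handler.wal_segments))  # 2 segments (t=3, t=4) remain
```

If on_batch_cleanup is never invoked, as in the report, wal_segments keeps all 
five entries, which is the unbounded growth observed on HDFS.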



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
