[ 
https://issues.apache.org/jira/browse/SPARK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157529#comment-16157529
 ] 

Ruslan Shestopalyuk commented on SPARK-21942:
---------------------------------------------

I can't comment on how common it is in general, but here's one very likely 
scenario:
* an online service is using a spark model to score the requests
* it loads the model on startup
* the model may get updated (retrained) periodically, let's say every two 
weeks, and the service picks up the new one

But yes, it's true that for this to happen, the same spark context has to stay 
alive for relatively long time, so it may not be that common at all.

> DiskBlockManager crashing when a root local folder has been externally 
> deleted by OS
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-21942
>                 URL: https://issues.apache.org/jira/browse/SPARK-21942
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1, 1.6.2, 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 
> 2.2.0, 2.2.1, 2.3.0, 3.0.0
>            Reporter: Ruslan Shestopalyuk
>            Priority: Minor
>              Labels: storage
>             Fix For: 2.3.0
>
>
> _DiskBlockManager_ has a notion of a "scratch" local folder(s), which can be 
> configured via _spark.local.dir_ option, and which defaults to the system's 
> _/tmp_. The hierarchy is two-level, e.g. _/blockmgr-XXX.../YY_, where the 
> _YY_ part is a hash bit, to spread files evenly.
> Function _DiskBlockManager.getFile_ expects the top level directories 
> (_blockmgr-XXX..._) to always exist (they get created once, when the spark 
> context is first created), otherwise it would fail with a message like:
> {code}
> ... java.io.IOException: Failed to create local dir in /tmp/blockmgr-XXX.../YY
> {code}
> However, this may not always be the case.
> In particular, *if it's the default _/tmp_ folder*, there can be different 
> strategies of automatically removing files from it, depending on the OS:
> * on the boot time
> * on a regular basis (e.g. once per day via a system cron job)
> * based on the file age
> The symptom is that after the process (in our case, a service) using spark is 
> running for a while (a few days), it may not be able to load files anymore, 
> since the top-level scratch directories are not there and 
> _DiskBlockManager.getFile_ crashes.
> Please note that this is different from people arbitrarily removing files 
> manually.
> We have both the facts that _/tmp_ is the default in the spark config and 
> that the system has the right to tamper with its contents, and will do it 
> with a high probability, after some period of time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to