[
https://issues.apache.org/jira/browse/SYSTEMML-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691648#comment-15691648
]
Felix Schüler commented on SYSTEMML-1127:
-----------------------------------------
So I see two ways to go here and would need some more info on what's going on
to decide which one to choose:
1) Give each thread its own cache directory
2) Synchronize the LocalFileUtils.createLocalFileIfNotExist() method and have
threads share the cache
It seems that the parfor workers use the folder created in
/tmp/systemml/pid_host as their cache. Is this a cache per process
or per thread? If a worker spawns multiple threads, they run in the same
process, so concurrent calls to create this directory will race and throw an
error. [~mboehm7] could you give me some advice on this?
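
To make the two options concrete, here is a rough sketch in plain Java. It is not SystemML code: the class CacheDirSketch and the methods perThreadId() and createSharedDir() are hypothetical names used only for illustration, and the pid_host_threadId layout is an assumption. Option 1 appends the thread ID to the ID, while option 2 keeps one shared directory and relies on java.nio's Files.createDirectories(), which does not fail if the directory already exists.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CacheDirSketch {

    // Option 1 (hypothetical): make the ID unique per thread by
    // appending the thread ID to the pid/host pair.
    static String perThreadId(long pid, String host, long threadId) {
        return pid + "_" + host + "_" + threadId;
    }

    // Option 2 (hypothetical): all threads share one directory.
    // Files.createDirectories does not throw if the directory already
    // exists, so concurrent callers can all invoke it safely.
    static Path createSharedDir(Path dir) throws IOException {
        Files.createDirectories(dir);
        return dir;
    }

    public static void main(String[] args) throws Exception {
        // Several threads try to create the same (assumed) cache directory.
        Path dir = Files.createTempDirectory("sketch").resolve("pid_host");
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    createSharedDir(dir);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("shared dir exists: " + Files.isDirectory(dir));
        System.out.println("per-thread id: "
            + perThreadId(1234L, "host", Thread.currentThread().getId()));
    }
}
```

With option 2 the directory creation itself no longer needs a lock, but whether the files inside a shared cache need further synchronization is a separate question that this sketch does not answer.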
> Distributed unique IDs are not unique
> -------------------------------------
>
> Key: SYSTEMML-1127
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1127
> Project: SystemML
> Issue Type: Bug
> Components: ParFor
> Reporter: Felix Schüler
>
> When executing a Spark parfor, the SparkParforWorker throws an exception
> which states that the localtmpdir could not be created. This is due to the
> fact that multiple executors are running multithreaded on the same worker.
> The createDistributedUniqueID() method in IDHandler.java creates unique
> IDs only per pid and host, not per thread. This could potentially be solved
> by adding the threadID to the unique ID. The question is if every thread
> should have its own cache or if the logic should be changed so that the first
> creation will be successful and then the threads share one cache.