[ https://issues.apache.org/jira/browse/SPARK-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177276#comment-16177276 ]

Chris Kanich commented on SPARK-19417:
--------------------------------------

This is my hacked up runtime library loader:
{code}
import glob

# Nonstandard imports: ship every library under ~/libs to the executors.
# addPyFile raises if a file with the same name was already registered,
# so swallow only that error and re-raise anything else.
for lib in glob.glob("/home/ckanich/libs/*"):
    try:
        sc.addPyFile(lib)
    except Exception as e:
        if "already registered" in str(e):
            continue
        raise
{code}

If I'm developing or debugging one of these libraries outside of Spark, I would 
like to be able to reload() the updated version without having to restart the 
context. My Python library-fu isn't amazing, but I believe the library file 
names need to stay the same so the rest of the code keeps working. Looking at 
that PR discussion, having spark.files.overwrite available as an option but not 
functioning was the maddening part that made me feel the need to write up a bug 
and test case.
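Concretely, the workflow I'd like to have work is something like this (a
minimal sketch; mylib is a made-up name standing in for one of the files under
/home/ckanich/libs):
{code}
import importlib

import mylib  # hypothetical module previously shipped via sc.addPyFile

# After editing the library on the driver, re-ship it under the SAME file
# name, then reload the module without restarting the context.
sc.addPyFile("/home/ckanich/libs/mylib.py")
importlib.reload(mylib)
{code}
Today the addPyFile call either raises "already registered" or, even with 
spark.files.overwrite=true, the executors keep the stale copy, so the reload 
picks up nothing new.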

> spark.files.overwrite is ignored
> --------------------------------
>
>                 Key: SPARK-19417
>                 URL: https://issues.apache.org/jira/browse/SPARK-19417
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Chris Kanich
>
> I have not been able to get Spark to actually overwrite a file after I have 
> changed it on the driver node, re-called addFile, and then used it on the 
> executors again. Here's a failing test.
> {code}
>   test("can overwrite files when spark.files.overwrite is true") {
>     val dir = Utils.createTempDir()
>     val file = new File(dir, "file")
>     try {
>       Files.write("one", file, StandardCharsets.UTF_8)
>       sc = new SparkContext(new SparkConf()
>         .setAppName("test")
>         .setMaster("local-cluster[1,1,1024]")
>         .set("spark.files.overwrite", "true"))
>       sc.addFile(file.getAbsolutePath)
>       def getAddedFileContents(): String = {
>         sc.parallelize(Seq(0)).map { _ =>
>           scala.io.Source.fromFile(SparkFiles.get("file")).mkString
>         }.first()
>       }
>       assert(getAddedFileContents() === "one")
>       Files.write("two", file, StandardCharsets.UTF_8)
>       sc.addFile(file.getAbsolutePath)
>       assert(getAddedFileContents() === "onetwo")
>     } finally {
>       Utils.deleteRecursively(dir)
>       sc.stop()
>     }
>   }
> {code}


