Daehan Kim created SPARK-3035:
---------------------------------

             Summary: Wrong example with SparkContext.addFile
                 Key: SPARK-3035
                 URL: https://issues.apache.org/jira/browse/SPARK-3035
             Project: Spark
          Issue Type: Documentation
          Components: PySpark
    Affects Versions: 1.0.2
            Reporter: Daehan Kim
            Priority: Trivial
             Fix For: 1.0.2


{code:title=context.py}
def addFile(self, path):
    """
    ...
    >>> from pyspark import SparkFiles
    >>> path = os.path.join(tempdir, "test.txt")
    >>> with open(path, "w") as testFile:
    ...    testFile.write("100")
    >>> sc.addFile(path)
    >>> def func(iterator):
    ...    with open(SparkFiles.get("test.txt")) as testFile:
    ...        fileVal = int(testFile.readline())
    ...        return [x * 100 for x in iterator]
    >>> sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect()
    [100, 200, 300, 400]
    """
{code}

This example writes 100 to a temp file, distributes it with addFile, and uses the file's value when multiplying the input (to check that worker nodes can read the distributed file).

But look at these lines: the result is never affected by the distributed file, because {{fileVal}} is read but never used:
{code}
    ...        fileVal = int(testFile.readline())
    ...        return [x * 100 for x in iterator]
{code}

The doctest still passes either way, since the file happens to contain 100 and {{x * 100}} equals {{x * fileVal}} here, which is presumably why the mistake went unnoticed. I'm sure this code was intended to be:
{code}
    ...        fileVal = int(testFile.readline())
    ...        return [x * fileVal for x in iterator]
{code}
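For reference, here is a minimal standalone sketch of the corrected example outside the doctest (assuming a local PySpark setup; the names {{func}} and {{test.txt}} just follow the doctest above):
{code}
import os
import tempfile

from pyspark import SparkContext, SparkFiles

sc = SparkContext("local", "addFile-example")

# Write the value 100 to a temp file and ship it to the executors.
tempdir = tempfile.mkdtemp()
path = os.path.join(tempdir, "test.txt")
with open(path, "w") as testFile:
    testFile.write("100")
sc.addFile(path)

def func(iterator):
    # SparkFiles.get resolves the local copy of the distributed file.
    with open(SparkFiles.get("test.txt")) as testFile:
        fileVal = int(testFile.readline())
    # fileVal is now actually used, so the result depends on the file.
    return [x * fileVal for x in iterator]

print(sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect())
# [100, 200, 300, 400]
sc.stop()
{code}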


