Pavel Laskov created SPARK-6282:
-----------------------------------

             Summary: Strange Python import error when using random() in a lambda function
                 Key: SPARK-6282
                 URL: https://issues.apache.org/jira/browse/SPARK-6282
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.2.0
         Environment: Kubuntu 14.04, Python 2.7.6
            Reporter: Pavel Laskov
            Priority: Minor


Consider the following example Python code:

from random import random
from pyspark.context import SparkContext
from xval_mllib import read_csv_file_as_list

if __name__ == "__main__": 
    sc = SparkContext(appName="Random() bug test")
    data = sc.parallelize(read_csv_file_as_list('data/malfease-xp.csv'))
    #data = sc.parallelize([1, 2, 3, 4, 5], 2)
    d = data.map(lambda x: (random(), x))
    print d.first()

Data is read from a large CSV file. Running this code results in a Python 
import error:

ImportError: No module named _winreg

If I use 'import random' and call 'random.random()' in the lambda function, no 
error occurs. With either import style, no error occurs for a small artificial 
data set such as the one in the commented-out line.
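
The variant that did not fail can be sketched as below. This is only an illustration of the workaround, not a diagnosis of the underlying bug. The helper name 'tag_with_random' is hypothetical, and a plain list stands in for the RDD so the snippet runs without Spark.

```python
import random  # import the module itself, not the random() function


def tag_with_random(x):
    # Resolve random() through the module at call time, matching the
    # 'import random' / 'random.random()' variant that raised no error.
    return (random.random(), x)


if __name__ == "__main__":
    # Local stand-in for the RDD (assumption for illustration only).
    # With Spark, the equivalent call would be data.map(tag_with_random).
    sample = [1, 2, 3, 4, 5]
    pairs = [tag_with_random(x) for x in sample]
    print(pairs[0])
```

Passing a named function instead of a lambda is a stylistic choice here; the point is only that random() is reached through the module rather than imported directly by name.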

I can provide the full error trace, the source code of the CSV-reading function 
('read_csv_file_as_list' is my own), and a sample dataset (the original dataset 
is about 8M in size).





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
