[ https://issues.apache.org/jira/browse/SPARK-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia updated SPARK-1917:
---------------------------------
    Assignee: Uri Laserson

> PySpark fails to import functions from {{scipy.special}}
> --------------------------------------------------------
>
>                 Key: SPARK-1917
>                 URL: https://issues.apache.org/jira/browse/SPARK-1917
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Uri Laserson
>            Assignee: Uri Laserson
>             Fix For: 0.9.2, 1.0.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> PySpark is able to load {{numpy}} functions, but not {{scipy.special}} functions. For example, take this snippet:
> {code}
> from numpy import exp
> from scipy.special import gammaln
> a = range(1, 11)
> b = sc.parallelize(a)
> c = b.map(exp)
> d = b.map(gammaln)
> {code}
> Calling {{c.collect()}} will return the expected result. However, calling {{d.collect()}} will fail with
> {code}
> KeyError: (('gammaln',), <function _getobject at 0x10c0879b0>, ('scipy.special', 'gammaln'))
> {code}
> in the {{_getobject}} function of the {{cloudpickle.py}} module.
> The reason is that {{_getobject}} executes {{__import__(modname)}}, which loads only the top-level package {{X}} when {{modname}} has the form {{X.Y}}. The lookup then fails because {{gammaln}} is not a member of {{scipy}}. The fix (for which I will shortly submit a PR) is to add {{fromlist=[attribute]}} to the {{__import__}} call, which loads the innermost module.
> See
> [https://docs.python.org/2/library/functions.html#__import__]
> and
> [http://stackoverflow.com/questions/9544331/from-a-b-import-x-using-import]

--
This message was sent by Atlassian JIRA
(v6.2#6252)