[
https://issues.apache.org/jira/browse/SYSTEMML-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Dusenberry reassigned SYSTEMML-650:
----------------------------------------
Assignee: Mike Dusenberry
> Error while trying to load data as a DataFrame in PySpark
> ---------------------------------------------------------
>
> Key: SYSTEMML-650
> URL: https://issues.apache.org/jira/browse/SYSTEMML-650
> Project: SystemML
> Issue Type: Bug
> Affects Versions: SystemML 0.9
> Environment: Cloudera Distribution CDH 5.5.0
> Hadoop 2.6.0
> Spark 1.5.0
> SystemML 0.9.0
> Python 2.7.6
> Reporter: Kartik Kannapur
> Assignee: Mike Dusenberry
> Labels: documentation, newbie
> Fix For: SystemML 0.10
>
>
> I tried to run the sample code for "Jupyter (PySpark) Notebook Example -
> Poisson Nonnegative Matrix Factorization" as provided in the documentation.
> The code fails at the line where we try to run the PNMF script on SystemML
> with Spark:
> {code:xml}
> outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10},
> ["W", "H", "losses"])
> {code}
> The script appears to fail at its very first line, where *X_train* is passed
> as a DataFrame into the variable *X*.
> The error message is below:
> {code:xml}
> /tmp/spark-e7974be5-4438-44b2-ae83-574b2c2bad21/userFiles-5a3c99c5-9bb7-46fe-af83-5119f9358e0f/SystemML.py
> in executeScript(self, dmlScript, nargs, outputs, configFilePath)
> 126
> 127 # Execute script
> --> 128 jml_out = self.ml.executeScript(dmlScript, nargs, configFilePath)
> 129 ml_out = MLOutput(jml_out, self.sc)
> 130 return ml_out
> /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py
> in __call__(self, *args)
> 536 answer = self.gateway_client.send_command(command)
> 537 return_value = get_return_value(answer, self.gateway_client,
> --> 538 self.target_id, self.name)
> 539
> 540 for temp_arg in temp_args:
> /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/pyspark/sql/utils.pyc
> in deco(*a, **kw)
> 34 def deco(*a, **kw):
> 35 try:
> ---> 36 return f(*a, **kw)
> 37 except py4j.protocol.Py4JJavaError as e:
> 38 s = e.java_exception.toString()
> /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py
> in get_return_value(answer, gateway_client, target_id, name)
> 302 raise Py4JError(
> 303 'An error occurred while calling {0}{1}{2}. Trace:\n{3}\n'.
> --> 304 format(target_id, '.', name, value))
> 305 else:
> 306 raise Py4JError(
> Py4JError: An error occurred while calling o79.executeScript. Trace:
> py4j.Py4JException: Method executeScript([class java.lang.String, class java.util.HashMap, null]) does not exist
> at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
> at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
> at py4j.Gateway.invoke(Gateway.java:252)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:207)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Is there any workaround for this?
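
As a reading aid for the trace above, here is a minimal sketch that pulls the method name and argument types out of the Py4J "does not exist" message. The helper name `parse_missing_method` is hypothetical and is not part of Py4J or SystemML. It only restates what the message already says: the lookup was made with a `String`, a `HashMap`, and a `null`. The `null` most likely corresponds to the `configFilePath` argument seen in the Python wrapper frame being left unset, but that is an inference from the trace, not something confirmed here.

```python
import re

# Message text copied from the trace above, joined onto one line.
MESSAGE = (
    "py4j.Py4JException: Method executeScript([class java.lang.String, "
    "class java.util.HashMap, null]) does not exist"
)


def parse_missing_method(message):
    """Hypothetical helper: return (method_name, [argument types]) from a
    Py4J 'Method ... does not exist' message, or None if it does not match."""
    match = re.search(r"Method (\w+)\(\[(.*?)\]\) does not exist", message)
    if match is None:
        return None
    name, raw_args = match.groups()
    if not raw_args.strip():
        return name, []
    # Drop the leading "class " that Py4J prints before each Java type name.
    args = [a.strip().replace("class ", "", 1) for a in raw_args.split(",")]
    return name, args


name, arg_types = parse_missing_method(MESSAGE)
print(name, arg_types)
```

Py4J resolves Java methods by name and by the runtime types of the arguments it receives, so a failure like this one points at a mismatch between the signature the Python wrapper calls and the overloads available on the Java object, rather than at the DML script or the DataFrame contents.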