GitHub user Leemoonsoo opened a pull request:
https://github.com/apache/incubator-zeppelin/pull/129
[ZEPPELIN-97][ZEPPELIN-134] pyspark issue with mllib api
There was an issue,
[ZEPPELIN-97](https://issues.apache.org/jira/browse/ZEPPELIN-97), with pyspark
1.4. The cause is that, starting with pyspark 1.4, the Java gateway is created
with the `auto_convert = True` option. This PR fixes the problem.
This PR also handles
[ZEPPELIN-134](https://issues.apache.org/jira/browse/ZEPPELIN-134) (inject
`sqlContext`).
Finally, it prints a more verbose stack trace when an error occurs. For
example, the output changes from
```
(<type 'exceptions.AttributeError'>, AttributeError("'list' object has no attribute '_get_object_id'",), <traceback object at 0x392b638>)
```
to
```
Traceback (most recent call last):
  File "/var/folders/zt/nd4j13y14jjg7_5pc4xgy7t80000gn/T//zeppelin_pyspark.py", line 110, in <module>
    eval(compiledCode)
  File "<string>", line 3, in <module>
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 1200, in withColumn
    return self.select('*', col.alias(colName))
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 738, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 630, in _jcols
    return self._jseq(cols, _to_java_column)
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 617, in _jseq
    return _to_seq(self.sql_ctx._sc, cols, converter)
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/column.py", line 60, in _to_seq
    return sc._jvm.PythonUtils.toSeq(cols)
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 529, in __call__
    [get_command_part(arg, self.pool) for arg in new_args])
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 265, in get_command_part
    command_part = REFERENCE_TYPE + parameter._get_object_id()
AttributeError: 'list' object has no attribute '_get_object_id'
```
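The difference between the two outputs above can be reproduced with the Python standard library alone. The following is only a rough sketch and not the code in this PR: `run_snippet` and its `verbose` flag are hypothetical names made up for illustration. It assumes the terse form comes from printing `sys.exc_info()` and the verbose form from `traceback.format_exc()`.

```python
import sys
import traceback


def run_snippet(source, verbose=True):
    """Hypothetical helper: run a code snippet and return an error report.

    Returns None when the snippet runs without raising.
    """
    try:
        compiled = compile(source, "<string>", "exec")
        eval(compiled, {})
        return None
    except Exception:
        if verbose:
            # Full stack trace, ending with the exception type and message.
            return traceback.format_exc()
        # Terse form: the (type, value, traceback object) tuple as a string.
        return str(sys.exc_info())


# A plain list has no _get_object_id attribute, so this raises AttributeError.
terse = run_snippet("[]._get_object_id()", verbose=False)
full = run_snippet("[]._get_object_id()", verbose=True)
print(terse)
print(full)
```

The terse report only names the exception, while the full report also shows which file and line raised it, which is what makes the py4j call site visible in the trace above.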
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Leemoonsoo/incubator-zeppelin ZEPPELIN-97
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-zeppelin/pull/129.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #129
----
commit bce3c1d33e5ab48146c2d70e81935e361fcff9c2
Author: Lee moon soo <[email protected]>
Date: 2015-06-29T19:53:10Z
Print more stacktrace
commit ab01a665781a9b1399eb000ec480ed1ed4d9b715
Author: Lee moon soo <[email protected]>
Date: 2015-06-29T20:20:36Z
Add testcase for auto_convert option
----