GitHub user Leemoonsoo opened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/129

    [ZEPPELIN-97][ZEPPELIN-134] pyspark issue with mllib api

    There was an issue, [ZEPPELIN-97](https://issues.apache.org/jira/browse/ZEPPELIN-97), with pyspark 1.4. The cause is that, starting with pyspark 1.4, the Java gateway is created with the `auto_convert = True` option. This PR fixes the problem.
    
    This PR also handles [ZEPPELIN-134](https://issues.apache.org/jira/browse/ZEPPELIN-134) by injecting sqlContext.
    
    Finally, it prints a more verbose stack trace when an error occurs. For example, the output changes
    
    from
    
    ```
    (<type 'exceptions.AttributeError'>, AttributeError("'list' object has no attribute '_get_object_id'",), <traceback object at 0x392b638>)
    ```
    
    to
    
    ```
    Traceback (most recent call last):
      File "/var/folders/zt/nd4j13y14jjg7_5pc4xgy7t80000gn/T//zeppelin_pyspark.py", line 110, in <module>
        eval(compiledCode)
      File "<string>", line 3, in <module>
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 1200, in withColumn
        return self.select('*', col.alias(colName))
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 738, in select
        jdf = self._jdf.select(self._jcols(*cols))
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 630, in _jcols
        return self._jseq(cols, _to_java_column)
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 617, in _jseq
        return _to_seq(self.sql_ctx._sc, cols, converter)
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/column.py", line 60, in _to_seq
        return sc._jvm.PythonUtils.toSeq(cols)
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 529, in __call__
        [get_command_part(arg, self.pool) for arg in new_args])
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 265, in get_command_part
        command_part = REFERENCE_TYPE + parameter._get_object_id()
    AttributeError: 'list' object has no attribute '_get_object_id'
    ```
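
The difference between the two outputs above can be reproduced with the standard library alone. The following is a minimal sketch, not the code from this PR: the helper names `run_terse` and `run_verbose` are hypothetical, and it assumes the terse form comes from printing `sys.exc_info()` directly while the verbose form comes from `traceback.format_exc()`.

```python
import sys
import traceback


def run_terse(code):
    """Run code; on failure return the raw exc_info tuple as a string."""
    try:
        exec(compile(code, "<string>", "exec"), {})
    except Exception:
        # Yields "(<type>, <exception>, <traceback object at 0x...>)",
        # which hides the call chain that led to the error.
        return str(sys.exc_info())
    return ""


def run_verbose(code):
    """Run code; on failure return the fully formatted stack trace."""
    try:
        exec(compile(code, "<string>", "exec"), {})
    except Exception:
        # Yields the familiar "Traceback (most recent call last): ..." text,
        # including the file name and line number of every frame.
        return traceback.format_exc()
    return ""


# Hypothetical failing snippet: a plain list has no _get_object_id method.
snippet = "cols = [1, 2]\ncols._get_object_id()\n"
terse = run_terse(snippet)
verbose = run_verbose(snippet)
print(terse)
print(verbose)
```

The verbose form points at the failing line inside the user's code (`File "<string>", line 2`), which the tuple form does not reveal.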

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Leemoonsoo/incubator-zeppelin ZEPPELIN-97

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #129
    
----
commit bce3c1d33e5ab48146c2d70e81935e361fcff9c2
Author: Lee moon soo <[email protected]>
Date:   2015-06-29T19:53:10Z

    Print more stacktrace

commit ab01a665781a9b1399eb000ec480ed1ed4d9b715
Author: Lee moon soo <[email protected]>
Date:   2015-06-29T20:20:36Z

    Add testcase for auto_convert option

----


