[jira] [Comment Edited] (TOREE-318) PySpark Interpreter Prints Are Results

poplav (JIRA) Wed, 01 Jun 2016 15:03:14 -0700

    [ 
https://issues.apache.org/jira/browse/TOREE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304687#comment-15304687
 ]


poplav edited comment on TOREE-318 at 6/1/16 10:01 PM:
-------------------------------------------------------

* [In ExecuteRequestHandler | 
https://github.com/apache/incubator-toree/blob/master/kernel/src/main/scala/org/apache/toree/kernel/protocol/v5/handler/ExecuteRequestHandler.scala#L94]
 we respond with an execute result whenever the ExecuteResult from the 
specific interpreter [.hasContent | 
https://github.com/apache/incubator-toree/blob/master/protocol/src/main/scala/org/apache/toree/kernel/protocol/v5/content/ExecuteResult.scala#L28].
* With the Scala Spark interpreter, the ExecuteResult for a print statement has 
no content, but with PySpark it does.  In [Toree's pyspark_runner 
|https://github.com/apache/incubator-toree/blob/master/pyspark-interpreter/src/main/resources/PySpark/pyspark_runner.py#L136]
 we always respond with the system output after we compile and evaluate the 
code.  The pyspark_runner logic is similar to that of [zeppelin_pyspark | 
https://github.com/apache/incubator-zeppelin/blob/master/spark/src/main/resources/python/zeppelin_pyspark.py].
* These are in contrast to [IPython's executor | 
https://github.com/ipython/ipython/blob/3c266c45cf8f7933cef9f7a5ccaab8f36f2e9a95/IPython/core/magics/execution.py#L1157]
 which checks whether the code is an expression and records the output only if 
it is.  The IPython executor uses Python's ast module, which can check whether 
the parsed code is an expression.  According to the [AST abstract grammar | 
https://docs.python.org/2/library/ast.html#abstract-grammar], Print is a 
statement, not an expression, so our Toree pyspark runner should not return 
the output there (this would be in line with Scala Spark, based on logging in 
the ExecuteRequestHandler, which gets invoked by all interpreters).
* Need to figure out how to port over IPython's expression-checking logic, as 
it has some additional [setup | 
https://github.com/ipython/ipython/blob/3c266c45cf8f7933cef9f7a5ccaab8f36f2e9a95/IPython/core/magics/execution.py#L1148].
  The kernel tester should work well for this, as we can assert on the specific 
messages produced by an execution.
* [Edit] Tried porting over the AST parse logic from IPython, but it only works 
in a Python 2 environment.  In Python 2 print is a statement; in Python 3 it is 
a function, so a print call parses as an expression like any other function 
call and the distinction cannot be made.  The workaround I am trying is to get 
the parse tree from the code and, if the last node/code segment of the parse 
tree is an expression, modify it to be assigned to a variable and execute the 
modified parse tree.  If that variable is None, the printed text is still 
caught in the output stream; if the variable is not None, return its value as 
an execute result.  This involved adding a sendOutput to BrokerState to flush 
the output stream that gets passed in and, on sendOutput, matching the code_id 
to the output stream to use.
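
The workaround in the [Edit] item can be sketched roughly as below.  This is 
only an illustration for Python 3, not the actual Toree change: the names 
run_cell and RESULT_NAME are hypothetical, and the BrokerState / sendOutput / 
code_id plumbing is left out entirely.

```python
import ast
import contextlib
import io

# Hypothetical variable name used to hold the value of a trailing expression.
RESULT_NAME = "__last_expression_value__"


def run_cell(source, namespace):
    """Execute source and return (captured stdout, trailing expression value).

    The second element is None when the code does not end in an expression
    or when that expression evaluates to None (as a print(...) call does).
    """
    tree = ast.parse(source, mode="exec")
    ends_in_expr = bool(tree.body) and isinstance(tree.body[-1], ast.Expr)

    if ends_in_expr:
        last = tree.body[-1]
        # Rewrite the trailing `expr` into `RESULT_NAME = expr`.
        assign = ast.Assign(
            targets=[ast.Name(id=RESULT_NAME, ctx=ast.Store())],
            value=last.value,
        )
        tree.body[-1] = ast.copy_location(assign, last)
        ast.fix_missing_locations(tree)

    # Anything printed while the code runs is collected as stream output.
    stdout = io.StringIO()
    with contextlib.redirect_stdout(stdout):
        exec(compile(tree, "<cell>", "exec"), namespace)

    result = namespace.pop(RESULT_NAME, None) if ends_in_expr else None
    return stdout.getvalue(), result
```

With this sketch, run_cell("print('hi')", {}) yields the printed text and a 
None result, so only stream output would be sent, while 
run_cell("x = 2\nx + 1", {}) yields an empty output and the value 3, which 
would be sent as the execute result.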


> PySpark Interpreter Prints Are Results
> --------------------------------------
>
>                 Key: TOREE-318
>                 URL: https://issues.apache.org/jira/browse/TOREE-318
>             Project: TOREE
>          Issue Type: Bug
>            Reporter: Corey A Stubbs
>             Fix For: 0.1.0
>
>
> When running any code which outputs a print statement, the statement is sent 
> back to the notebook as an execute result (see http://imgur.com/F0yO1nU). I 
> have only tested this in PySpark, so I assume this could be broken across all 
> interpreters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
