Yin Huai created SPARK-8573:
-------------------------------
Summary: For PySpark's DataFrame API, we need to throw exceptions
when users try to use and/or/not
Key: SPARK-8573
URL: https://issues.apache.org/jira/browse/SPARK-8573
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 1.3.0
Reporter: Yin Huai
Assignee: Davies Liu
Priority: Critical
In PySpark's DataFrame API, the {{Column}} class defines:
{code}
# `and`, `or`, `not` cannot be overloaded in Python,
# so use bitwise operators as boolean operators
__and__ = _bin_op('and')
__or__ = _bin_op('or')
__invert__ = _func_op('not')
__rand__ = _bin_op("and")
__ror__ = _bin_op("or")
{code}
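To illustrate how this operator mapping works, here is a minimal, self-contained sketch. It uses a hypothetical {{Expr}} class that builds expression strings; it is not PySpark's {{Column}}. Python dispatches {{&}}, {{|}} and {{~}} to {{__and__}}, {{__or__}} and {{__invert__}}. When the left operand does not handle the operation (for example a plain {{bool}}), Python falls back to the reflected {{__rand__}}/{{__ror__}} on the right operand.

```python
class Expr:
    """Hypothetical stand-in for a column expression; not PySpark's Column."""

    def __init__(self, text):
        self.text = text

    def _combine(self, op, other, reflected=False):
        # Non-Expr operands (e.g. literals) are rendered with repr().
        other_text = other.text if isinstance(other, Expr) else repr(other)
        # For reflected operators the other operand was on the left.
        left, right = (other_text, self.text) if reflected else (self.text, other_text)
        return Expr(f"({left} {op} {right})")

    # `and`/`or`/`not` cannot be overloaded, so bitwise operators stand in for them.
    def __and__(self, other):
        return self._combine("AND", other)

    def __or__(self, other):
        return self._combine("OR", other)

    def __rand__(self, other):
        return self._combine("AND", other, reflected=True)

    def __ror__(self, other):
        return self._combine("OR", other, reflected=True)

    def __invert__(self):
        return Expr(f"(NOT {self.text})")

    def __repr__(self):
        return f"Expr<{self.text}>"


a, b = Expr("id > 5"), Expr("id < 10")
print(a & b)  # Expr<(id > 5 AND id < 10)>
print(~a)     # Expr<(NOT id > 5)>
```

Because {{&}} and {{|}} bind more tightly than comparison operators in Python, each comparison has to be parenthesized when combining real columns, e.g. {{(df.id > 5) & (df.id < 10)}}.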
Right now, users can still apply Python's {{and}}, {{or}}, and {{not}} to columns, which silently produces very
confusing results. We need to throw an error when users try to use them and
tell them the right way to do it: use {{&}}, {{|}}, and {{~}} instead.
For example, each expression below silently returns only one of its two conditions:
{code}
df = sqlContext.range(1, 10)
df.id > 5 or df.id < 10
Out[30]: Column<(id > 5)>
df.id > 5 and df.id < 10
Out[31]: Column<(id < 10)>
{code}
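The outputs above follow from Python's short-circuit rules: {{x or y}} returns {{x}} if {{x}} is truthy, otherwise {{y}}; {{x and y}} returns {{y}} if {{x}} is truthy, otherwise {{x}}. An object that defines neither {{__bool__}} nor {{__len__}} is always truthy, so {{or}} hands back the first column and {{and}} the second. The sketch below reproduces this with a hypothetical {{PlainColumn}} class, then shows one possible guard: raising from {{__bool__}} ({{__nonzero__}} on Python 2), which {{and}}, {{or}}, {{not}} and {{if}} all consult. {{GuardedColumn}} and its error message are illustrative assumptions, not PySpark's actual code.

```python
class PlainColumn:
    """Hypothetical stand-in with no __bool__, so every instance is truthy."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"Column<{self.text}>"


class GuardedColumn(PlainColumn):
    """Same stand-in, but refuses to be used as a Python boolean."""

    def __bool__(self):
        # `and`, `or`, `not` and `if` all request a truth value via __bool__,
        # so raising here turns the silent misbehavior into an explicit error.
        raise ValueError(
            "Cannot convert column into bool: please use '&' for 'and', "
            "'|' for 'or', '~' for 'not' when building DataFrame boolean expressions."
        )

    __nonzero__ = __bool__  # Python 2 name for the same hook


left, right = PlainColumn("(id > 5)"), PlainColumn("(id < 10)")
print(left or right)   # Column<(id > 5)>   left is truthy, so `or` returns it
print(left and right)  # Column<(id < 10)>  left is truthy, so `and` returns right

guarded = GuardedColumn("(id > 5)")
try:
    guarded and right
except ValueError as err:
    print(err)  # the guard fires before `and` can pick an operand
```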
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)