Jan-Willem van der Sijp created SPARK-25072:
-----------------------------------------------

             Summary: PySpark custom Row class can be given extra parameters
                 Key: SPARK-25072
                 URL: https://issues.apache.org/jira/browse/SPARK-25072
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.2.0
         Environment: {noformat}
SPARK_MAJOR_VERSION is set to 2, using Spark2
Python 3.4.5 (default, Dec 11 2017, 16:57:19)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
18/08/01 04:49:16 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
18/08/01 04:49:17 WARN Utils: Service 'SparkUI' could not bind on port 4040. 
Attempting port 4041.
18/08/01 04:49:27 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Python version 3.4.5 (default, Dec 11 2017 16:57:19)
SparkSession available as 'spark'.
{noformat}

{{CentOS release 6.9 (Final)}}
{{Linux sandbox-hdp.hortonworks.com 4.14.0-1.el7.elrepo.x86_64 #1 SMP Sun Nov 
12 20:21:04 EST 2017 x86_64 x86_64 x86_64 GNU/Linux}}
{noformat}openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode){noformat}
            Reporter: Jan-Willem van der Sijp


When a custom Row class is created in PySpark, its constructor accepts more
arguments than there are columns. The extra values affect equality comparisons
of the Row but do not appear in the {{repr}}, {{str}}, or {{asDict()}} output,
which makes errors caused by these "invisible" values hard to debug. The hidden
values remain accessible through integer-based indexing.

Some examples:

{code:python}
In [69]: RowClass = Row("column1", "column2")

In [70]: RowClass(1, 2) == RowClass(1, 2)
Out[70]: True

In [71]: RowClass(1, 2) == RowClass(1, 2, 3)
Out[71]: False

In [75]: RowClass(1, 2, 3)
Out[75]: Row(column1=1, column2=2)

In [76]: RowClass(1, 2)
Out[76]: Row(column1=1, column2=2)

In [77]: RowClass(1, 2, 3).asDict()
Out[77]: {'column1': 1, 'column2': 2}

In [78]: RowClass(1, 2, 3)[2]
Out[78]: 3

In [79]: repr(RowClass(1, 2, 3))
Out[79]: 'Row(column1=1, column2=2)'

In [80]: str(RowClass(1, 2, 3))
Out[80]: 'Row(column1=1, column2=2)'
{code}
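
The behaviour above can be modelled without a Spark installation. The sketch
below is a simplified stand-in, not the PySpark source: SketchRow and
checked_row are hypothetical names introduced here for illustration. It assumes
the row is a tuple subclass whose repr pairs field names with values using
zip(), which would explain why surplus values are stored and compared but never
printed. checked_row shows one possible guard a caller could apply until the
constructor validates the argument count itself; rejecting too few values as
well as too many is a choice made for this sketch.

```python
class SketchRow(tuple):
    """Simplified, hypothetical model of a tuple-based row class."""

    def __new__(cls, *names):
        # Called with field names, this yields a "row class": a tuple of names.
        return tuple.__new__(cls, names)

    def __call__(self, *values):
        # Calling the row class builds a row of values. No length check is
        # performed here, mirroring the behaviour described in this report.
        row = tuple.__new__(SketchRow, values)
        row.fields = self
        return row

    def __repr__(self):
        if hasattr(self, "fields"):
            # zip() stops at the shorter sequence, so surplus values vanish
            # from the output even though the tuple still holds them.
            pairs = zip(self.fields, tuple(self))
            return "Row(%s)" % ", ".join("%s=%r" % (k, v) for k, v in pairs)
        return "<Row(%s)>" % ", ".join(self)


def checked_row(row_class, *values):
    """Build a row only if the number of values matches the number of fields."""
    if len(values) != len(row_class):
        raise ValueError(
            "expected %d values for fields %s, got %d"
            % (len(row_class), list(row_class), len(values))
        )
    return row_class(*values)


RowClass = SketchRow("column1", "column2")

hidden = RowClass(1, 2, 3)
print(repr(hidden))             # Row(column1=1, column2=2)
print(hidden == RowClass(1, 2)) # False: tuple equality sees the third value
print(hidden[2])                # 3

try:
    checked_row(RowClass, 1, 2, 3)
except ValueError as exc:
    print("rejected:", exc)
```

With an actual PySpark Row class, the same guard should apply unchanged,
assuming len() of the row class returns its number of fields.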



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

