[ https://issues.apache.org/jira/browse/SPARK-25072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575980#comment-16575980 ]
Hyukjin Kwon commented on SPARK-25072:
--------------------------------------

From a cursory look, we should add a check in

{code}
def _create_row(fields, values):
    row = Row(*values)
    row.__fields__ = fields
    return row
{code}

in {{types.py}}, although we should double-check that this does not break anything else.

> PySpark custom Row class can be given extra parameters
> ------------------------------------------------------
>
>                 Key: SPARK-25072
>                 URL: https://issues.apache.org/jira/browse/SPARK-25072
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>         Environment: {noformat}
> SPARK_MAJOR_VERSION is set to 2, using Spark2
> Python 3.4.5 (default, Dec 11 2017, 16:57:19)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 18/08/01 04:49:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 18/08/01 04:49:17 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
> 18/08/01 04:49:27 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
>       /_/
>
> Using Python version 3.4.5 (default, Dec 11 2017 16:57:19)
> SparkSession available as 'spark'.
> {noformat}
> {{CentOS release 6.9 (Final)}}
> {{Linux sandbox-hdp.hortonworks.com 4.14.0-1.el7.elrepo.x86_64 #1 SMP Sun Nov 12 20:21:04 EST 2017 x86_64 x86_64 x86_64 GNU/Linux}}
> {noformat}
> openjdk version "1.8.0_161"
> OpenJDK Runtime Environment (build 1.8.0_161-b14)
> OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
> {noformat}
>            Reporter: Jan-Willem van der Sijp
>            Priority: Minor
>
> When a custom Row class is made in PySpark, it is possible to pass its constructor more parameters than there are columns. These extra parameters affect the value of the Row, but are not part of the {{repr}} or {{str}} output, making it hard to debug errors caused by these "invisible" values. The hidden values can still be accessed through integer-based indexing, though.
> Some examples:
> {code:python}
> In [69]: RowClass = Row("column1", "column2")
>
> In [70]: RowClass(1, 2) == RowClass(1, 2)
> Out[70]: True
>
> In [71]: RowClass(1, 2) == RowClass(1, 2, 3)
> Out[71]: False
>
> In [75]: RowClass(1, 2, 3)
> Out[75]: Row(column1=1, column2=2)
>
> In [76]: RowClass(1, 2)
> Out[76]: Row(column1=1, column2=2)
>
> In [77]: RowClass(1, 2, 3).asDict()
> Out[77]: {'column1': 1, 'column2': 2}
>
> In [78]: RowClass(1, 2, 3)[2]
> Out[78]: 3
>
> In [79]: repr(RowClass(1, 2, 3))
> Out[79]: 'Row(column1=1, column2=2)'
>
> In [80]: str(RowClass(1, 2, 3))
> Out[80]: 'Row(column1=1, column2=2)'
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
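As a rough illustration of the kind of length check proposed for {{_create_row}}, the sketch below is a standalone approximation, not PySpark's actual code: {{Row}} here is a simplified stand-in (PySpark's real {{Row}} subclasses {{tuple}} in {{pyspark/sql/types.py}}), and the exact error message and placement in {{types.py}} would need to be settled in review.

{code:python}
# Standalone sketch of the proposed check. "Row" is a simplified
# stand-in mimicking only the tuple-subclass behaviour relevant here;
# it is NOT PySpark's actual Row implementation.

class Row(tuple):
    """Minimal stand-in: a tuple subclass that can carry __fields__."""
    def __new__(cls, *values):
        return super().__new__(cls, values)


def _create_row(fields, values):
    # Proposed fix: reject value tuples whose length does not match
    # the schema, so extra values fail loudly instead of becoming
    # "invisible" in repr()/str() output.
    if len(values) != len(fields):
        raise ValueError(
            "Expected %d values for fields %s, got %d"
            % (len(fields), fields, len(values)))
    row = Row(*values)
    row.__fields__ = fields
    return row


# A matching call succeeds; a mismatched one raises ValueError.
r = _create_row(["column1", "column2"], (1, 2))
# _create_row(["column1", "column2"], (1, 2, 3))  # would raise ValueError
{code}

With such a check in place, the {{RowClass(1, 2, 3)}} examples from the description would raise immediately rather than silently carrying a hidden third value.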