[ https://issues.apache.org/jira/browse/SPARK-25072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575980#comment-16575980 ]
Hyukjin Kwon commented on SPARK-25072:
--------------------------------------

From a cursory look, we should add a check in

{code}
def _create_row(fields, values):
    row = Row(*values)
    row.__fields__ = fields
    return row
{code}

in {{types.py}}, although we should double-check that this does not break anything else.

> PySpark custom Row class can be given extra parameters
> ------------------------------------------------------
>
>                 Key: SPARK-25072
>                 URL: https://issues.apache.org/jira/browse/SPARK-25072
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>         Environment: {noformat}
> SPARK_MAJOR_VERSION is set to 2, using Spark2
> Python 3.4.5 (default, Dec 11 2017, 16:57:19)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 18/08/01 04:49:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 18/08/01 04:49:17 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
> 18/08/01 04:49:27 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
>       /_/
>
> Using Python version 3.4.5 (default, Dec 11 2017 16:57:19)
> SparkSession available as 'spark'.
> {noformat}
> {{CentOS release 6.9 (Final)}}
> {{Linux sandbox-hdp.hortonworks.com 4.14.0-1.el7.elrepo.x86_64 #1 SMP Sun Nov 12 20:21:04 EST 2017 x86_64 x86_64 x86_64 GNU/Linux}}
> {noformat}
> openjdk version "1.8.0_161"
> OpenJDK Runtime Environment (build 1.8.0_161-b14)
> OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
> {noformat}
>            Reporter: Jan-Willem van der Sijp
>            Priority: Minor
>
> When a custom Row class is made in PySpark, it is possible to pass its constructor more parameters than there are columns. These extra parameters affect the value of the Row, but are not part of the {{repr}} or {{str}} output, making it hard to debug errors caused by these "invisible" values. The hidden values can still be accessed through integer-based indexing, though.
> Some examples:
> {code:python}
> In [69]: RowClass = Row("column1", "column2")
>
> In [70]: RowClass(1, 2) == RowClass(1, 2)
> Out[70]: True
>
> In [71]: RowClass(1, 2) == RowClass(1, 2, 3)
> Out[71]: False
>
> In [75]: RowClass(1, 2, 3)
> Out[75]: Row(column1=1, column2=2)
>
> In [76]: RowClass(1, 2)
> Out[76]: Row(column1=1, column2=2)
>
> In [77]: RowClass(1, 2, 3).asDict()
> Out[77]: {'column1': 1, 'column2': 2}
>
> In [78]: RowClass(1, 2, 3)[2]
> Out[78]: 3
>
> In [79]: repr(RowClass(1, 2, 3))
> Out[79]: 'Row(column1=1, column2=2)'
>
> In [80]: str(RowClass(1, 2, 3))
> Out[80]: 'Row(column1=1, column2=2)'
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
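As a rough illustration of the kind of length check proposed for {{_create_row}}, the sketch below is a standalone approximation, not PySpark's actual code: {{Row}} here is a simplified stand-in (PySpark's real {{Row}} subclasses {{tuple}} in {{pyspark/sql/types.py}}), and the exact error message and placement in {{types.py}} would need to be settled in review.

{code:python}
# Standalone sketch of the proposed check. "Row" is a simplified
# stand-in mimicking only the tuple-subclass behaviour relevant here;
# it is NOT PySpark's actual Row implementation.

class Row(tuple):
    """Minimal stand-in: a tuple subclass that can carry __fields__."""
    def __new__(cls, *values):
        return super().__new__(cls, values)


def _create_row(fields, values):
    # Proposed fix: reject value tuples whose length does not match
    # the schema, so extra values fail loudly instead of becoming
    # "invisible" in repr()/str() output.
    if len(values) != len(fields):
        raise ValueError(
            "Expected %d values for fields %s, got %d"
            % (len(fields), fields, len(values)))
    row = Row(*values)
    row.__fields__ = fields
    return row


# A matching call succeeds; a mismatched one raises ValueError.
r = _create_row(["column1", "column2"], (1, 2))
# _create_row(["column1", "column2"], (1, 2, 3))  # would raise ValueError
{code}

With such a check in place, the {{RowClass(1, 2, 3)}} examples from the description would raise immediately rather than silently carrying a hidden third value.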