[ https://issues.apache.org/jira/browse/SPARK-26911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771628#comment-16771628 ]
Hyukjin Kwon commented on SPARK-26911:
--------------------------------------

Can you make the reproducer self-runnable and narrow the problem down? This sounds like a request for investigation rather than a bug report. I am resolving this until sufficient information is provided for other people to investigate further. If you have a fix, reopen and make a PR right away.

> Spark does not see a column in a table
> --------------------------------------
>
> Key: SPARK-26911
> URL: https://issues.apache.org/jira/browse/SPARK-26911
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.1
> Environment: PySpark (Spark 2.3.1)
> Reporter: Vitaly Larchenkov
> Priority: Major
>
> Spark cannot resolve a column that actually exists in the input columns:
>
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`id`' given input
> columns: [flid.palfl_timestamp, flid.id, flid.pal_state, flid.prs_id,
> flid.bank_id, flid.wr_id, flid.link_id];
> {code}
>
> {code:java}
> ---------------------------------------------------------------------------
> Py4JJavaError                             Traceback (most recent call last)
> /usr/share/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
>      62     try:
> ---> 63         return f(*a, **kw)
>      64     except py4j.protocol.Py4JJavaError as e:
> /usr/share/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>     327             "An error occurred while calling {0}{1}{2}.\n".
> --> 328             format(target_id, ".", name), value)
>     329         else:
> Py4JJavaError: An error occurred while calling o35.sql.
> : org.apache.spark.sql.AnalysisException: cannot resolve '`id`' given input
> columns: [flid.palfl_timestamp, flid.id, flid.pal_state, flid.prs_id,
> flid.bank_id, flid.wr_id, flid.link_id]; line 10 pos 98;
> 'Project ['multiples.id, 'multiples.link_id]
> {code}
>
> Query:
>
> {code:java}
> q = f"""
> with flid as (
>     select * from flow_log_by_id
> )
> select multiples.id, multiples.link_id
> from (select fl.id, fl.link_id
>       from (select id from {flow_log_by_id} group by id having count(*) > 1) multiples
>       join {flow_log_by_id} fl on fl.id = multiples.id) multiples
> join {level_link} ll
>   on multiples.link_id = ll.link_id_old
>  and ll.link_id_new in (select link_id from flid where id = multiples.id)
> """
> flow_subset_test_result = spark.sql(q)
> {code}
>
> The `with flid` CTE is used because without it Spark does not find the
> `flow_log_by_id` table, which looks like a separate issue. In plain SQL
> the same query works without problems.
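For reference, the reporter's claim that this query shape works in plain SQL can be checked with a self-contained script. Below is a minimal sketch using Python's `sqlite3` standard library with hypothetical stand-in schemas and data for `flow_log_by_id` and `level_link` (the inner `multiples` alias is renamed `m` to avoid shadowing); it exercises the same pattern Spark 2.3.1 fails to resolve, namely an `IN` subquery inside a join condition that correlates on the outer alias `multiples.id`:

```python
import sqlite3

# In-memory database with hypothetical stand-ins for the tables in the report.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flow_log_by_id (id INTEGER, link_id INTEGER);
CREATE TABLE level_link (link_id_old INTEGER, link_id_new INTEGER);
INSERT INTO flow_log_by_id VALUES (1, 10), (1, 11), (2, 20);
INSERT INTO level_link VALUES (10, 11), (20, 21);
""")

# Same shape as the failing Spark query: the subquery in the join condition
# correlates on the outer alias (multiples.id).
rows = conn.execute("""
WITH flid AS (SELECT * FROM flow_log_by_id)
SELECT multiples.id, multiples.link_id
FROM (SELECT fl.id, fl.link_id
      FROM (SELECT id FROM flow_log_by_id GROUP BY id HAVING COUNT(*) > 1) m
      JOIN flow_log_by_id fl ON fl.id = m.id) multiples
JOIN level_link ll
  ON multiples.link_id = ll.link_id_old
 AND ll.link_id_new IN (SELECT link_id FROM flid WHERE id = multiples.id)
""").fetchall()

# Only id=1 is duplicated, and only its link_id=10 row has a matching
# level_link entry whose link_id_new appears among id=1's link_ids.
print(rows)
```

With this data the query resolves and returns `[(1, 10)]`, which supports the reporter's point that the correlated reference is valid SQL; whether Spark 2.3.1's analyzer is expected to support correlated subqueries in join conditions is the open question in this ticket.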