Re: Spark SQL 1.3 not finding attribute in DF

2015-12-07 Thread Davies Liu
Could you reproduce this problem in 1.5 or 1.6?

On Sun, Dec 6, 2015 at 12:29 AM, YaoPau  wrote:
> If anyone runs into the same issue, I found a workaround:
>
> >>> df.where('state_code = "NY"')
>
> works for me.
>
> >>> df.where(df.state_code == "NY").collect()
>
> fails with the error from the first post.
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-3-not-finding-attribute-in-DF-tp25599p25600.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>




Re: Spark SQL 1.3 not finding attribute in DF

2015-12-07 Thread Jon Gregg
I'm working with a Hadoop distribution that doesn't support 1.5 yet; we'll
probably be able to upgrade in about two months.  For now I'm seeing the same
issue, with Spark not recognizing an existing column name in many
Hive-table-to-DataFrame situations:

Py4JJavaError: An error occurred while calling o375.filter.
: org.apache.spark.sql.AnalysisException: resolved attributes *state_code*
missing from
latitude,country_code,tim_zone_desc,longitude,dma_durable_key,submarket,dma_code,dma_desc,county,city,zip_code,*state_code*;

On Mon, Dec 7, 2015 at 3:52 PM, Davies Liu  wrote:

> Could you reproduce this problem in 1.5 or 1.6?


Spark SQL 1.3 not finding attribute in DF

2015-12-06 Thread YaoPau
When I run df.printSchema() I get:

root
 |-- durable_key: string (nullable = true)
 |-- code: string (nullable = true)
 |-- desc: string (nullable = true)
 |-- city: string (nullable = true)
 |-- state_code: string (nullable = true)
 |-- zip_code: string (nullable = true)
 |-- county: string (nullable = true)
 |-- country_code: string (nullable = true)
 |-- latitude: double (nullable = true)
 |-- longitude: double (nullable = true)
 |-- market: string (nullable = true)
 |-- tim_zone_desc: string (nullable = true)

A series-like statement by itself seems to work:

>>> df.state_code == "NY"
Column<(state_code = NY)>

But all of these return the error below:

>>> df.where(df.state_code == "NY")
>>> df[df.state_code == "NY"]
>>> df.filter(df['state_code'] == "NY")

I'm guessing this is a bug.  Is there a workaround in 1.3?


Py4JJavaError Traceback (most recent call last)
 in ()
> 1 df.filter(df['state_code'] == "NY")

/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/dataframe.py in filter(self, condition)
627 jdf = self._jdf.filter(condition)
628 elif isinstance(condition, Column):
--> 629 jdf = self._jdf.filter(condition._jc)
630 else:
631 raise TypeError("condition should be string or Column")

/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
536 answer = self.gateway_client.send_command(command)
537 return_value = get_return_value(answer, self.gateway_client,
--> 538 self.target_id, self.name)
539 
540 for temp_arg in temp_args:

/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
298 raise Py4JJavaError(
299 'An error occurred while calling {0}{1}{2}.\n'.
--> 300 format(target_id, '.', name), value)
301 else:
302 raise Py4JError(

Py4JJavaError: An error occurred while calling o375.filter.
: org.apache.spark.sql.AnalysisException: resolved attributes state_code
missing from
latitude,country_code,tim_zone_desc,longitude,dma_durable_key,submarket,dma_code,dma_desc,county,city,zip_code,state_code;
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:37)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3.apply(CheckAnalysis.scala:93)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3.apply(CheckAnalysis.scala:43)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:88)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.apply(CheckAnalysis.scala:43)
at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:1069)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame.logicalPlanToDataFrame(DataFrame.scala:157)
at org.apache.spark.sql.DataFrame.filter(DataFrame.scala:508)
at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
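
The odd part of the message above is that state_code is reported missing
while also appearing in the list it is supposedly missing from. A minimal
sketch in plain Python (no Spark needed) that checks this mechanically;
parse_missing_attributes is a hypothetical helper, not part of PySpark, and
the regular expression assumes the message layout shown in this post:

```python
import re


def parse_missing_attributes(message):
    """Split a 'resolved attributes X missing from A,B,C;' message into
    (missing names, available names). Returns None if the text does not match.

    Hypothetical helper for reading the error text; not a PySpark API.
    """
    # Collapse line wraps introduced by the mail client before matching.
    flat = " ".join(message.split())
    match = re.search(r"resolved attributes (.+?) missing from ([^;\s]+)", flat)
    if match is None:
        return None
    # Strip stray '*' emphasis markers that some mail clients add.
    missing = [name.strip("* ") for name in match.group(1).split(",")]
    available = [name.strip("* ") for name in match.group(2).split(",")]
    return missing, available


message = (
    "org.apache.spark.sql.AnalysisException: resolved attributes state_code\n"
    "missing from\n"
    "latitude,country_code,tim_zone_desc,longitude,dma_durable_key,submarket,"
    "dma_code,dma_desc,county,city,zip_code,state_code;"
)

missing, available = parse_missing_attributes(message)
# Names reported as missing that are nevertheless present in the list.
present_by_name = [name for name in missing if name in available]
print(present_by_name)
```

If present_by_name is non-empty, the column exists by name, which suggests
the analyzer is matching on something other than the bare column name. That
reading is an assumption; nothing in this thread confirms the cause.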






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-3-not-finding-attribute-in-DF-tp25599.html



Re: Spark SQL 1.3 not finding attribute in DF

2015-12-06 Thread YaoPau
If anyone runs into the same issue, I found a workaround:

>>> df.where('state_code = "NY"')

works for me.

>>> df.where(df.state_code == "NY").collect()

fails with the error from the first post.
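
For anyone applying the workaround to more than one column, here is a small
sketch that builds the string predicate instead of typing it by hand.
sql_equals is a hypothetical helper (not part of PySpark), and it simply
rejects values containing quotes or backslashes rather than guessing at the
SQL parser's escaping rules:

```python
def sql_equals(column, value):
    """Build a string predicate such as: state_code = "NY".

    Hypothetical helper for the string-filter workaround; not a PySpark API.
    """
    # Allow only simple identifiers (letters, digits, underscores).
    if not column.replace("_", "").isalnum():
        raise ValueError("unexpected column name: %r" % (column,))
    # Refuse values that would need escaping instead of guessing the rules.
    if '"' in value or "\\" in value:
        raise ValueError("value needs escaping: %r" % (value,))
    return '%s = "%s"' % (column, value)


predicate = sql_equals("state_code", "NY")
print(predicate)  # state_code = "NY"

# With a live DataFrame `df` (not created in this sketch), the string form
# from the workaround above would then be:
#     df.where(predicate)
```

The string goes through the branch of filter() that accepts a string
condition, rather than the Column branch shown in the traceback in the
first post.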





