RE: [SQL] Self join with ArrayType columns problems

Cheng, Hao Tue, 27 Jan 2015 00:57:52 -0800

The root cause is probably that identical “exprId” values of the 
“AttributeReference” exist on both sides when doing a self-join with a 
“temp table” (temp table = resolved logical plan).
I will create a JIRA and fix the bug.
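
To illustrate the exprId point, here is a rough toy model in plain Python. It 
is not Catalyst code: Attr, new_attr, fresh_copy and conflicting_ids are 
hypothetical names made up for this sketch, and the id counter only stands in 
for however exprIds are really allocated. It shows why joining an already 
resolved plan with itself leaves both sides indistinguishable by id, and how 
giving one side fresh ids removes the clash.

```python
from dataclasses import dataclass, replace
from itertools import count

# Hypothetical id source, standing in for the real exprId generator.
_next_id = count(1)


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for an attribute reference: a column name plus an id."""
    name: str
    expr_id: int


def new_attr(name):
    """Create a column with a newly allocated id."""
    return Attr(name, next(_next_id))


def fresh_copy(plan):
    """Return the same columns, each with a newly allocated id."""
    return [replace(a, expr_id=next(_next_id)) for a in plan]


def conflicting_ids(left, right):
    """Ids present on both sides of a join, which cannot be told apart."""
    return {a.expr_id for a in left} & {a.expr_id for a in right}


# An already resolved plan, like a registered temp table.
table = [new_attr("id"), new_attr("values")]

# Naive self-join: both sides are the very same resolved attributes.
naive = conflicting_ids(table, table)

# Self-join after one side has been given fresh ids.
fixed = conflicting_ids(table, fresh_copy(table))

print("shared ids, naive self-join:", sorted(naive))
print("shared ids, after fresh copy:", sorted(fixed))
```

In this model every id is shared in the naive case and none is shared once 
one side is re-instanced, which is the kind of change I expect the fix to need.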

Cheng Hao

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Tuesday, January 27, 2015 12:05 AM
To: Dean Wampler
Cc: Pierre B; user@spark.apache.org; Cheng Hao
Subject: Re: [SQL] Self join with ArrayType columns problems

It seems likely that there is some sort of bug related to the reuse of array 
objects that are returned by UDFs.  Can you open a JIRA?

I'll also note that the sql method on HiveContext does run HiveQL (configured 
by spark.sql.dialect) and the hql method has been deprecated since 1.1 (and 
will probably be removed in 1.3).  The errors are probably because array and 
collect_set are Hive UDFs and thus not available in a SQLContext.

On Mon, Jan 26, 2015 at 5:44 AM, Dean Wampler <deanwamp...@gmail.com> wrote:
You are creating a HiveContext, then using the sql method instead of hql. Is 
that deliberate?

The code doesn't work if you replace HiveContext with SQLContext. Lots of 
exceptions are thrown, but I don't have time to investigate now.

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd 
Edition<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe<http://typesafe.com>
@deanwampler<http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Mon, Jan 26, 2015 at 7:17 AM, Pierre B 
<pierre.borckm...@realimpactanalytics.com> wrote:
Using Spark 1.2.0, we are facing some weird behaviour when performing a self
join on a table with an ArrayType field (potential bug?).

I have set up a minimal non-working example here:
https://gist.github.com/pierre-borckmans/4853cd6d0b2f2388bf4f
In a nutshell, if the ArrayType column used for the pivot is created
manually in the StructType definition, everything works as expected.
However, if the ArrayType pivot column is obtained from a SQL query (be it by
using an "array" wrapper or a collect_list operator, for instance),
then the results are completely off.

Could anyone have a look, as this really is a blocking issue?

Thanks!

Cheers

P.

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SQL-Self-join-with-ArrayType-columns-problems-tp21364.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
