Potential bugs in SparkSQL

2014-07-10 Thread Jerry Lam
Hi Spark developers, I have the following HQL queries for which Spark throws exceptions of this kind: 14/07/10 15:07:55 INFO TaskSetManager: Loss was due to org.apache.spark.TaskKilledException [duplicate 17] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:736 failed 4 times,

Re: Potential bugs in SparkSQL

2014-07-10 Thread Stephen Boesch
Hi Jerry, To add to your question: the following does work (from master) - notice the registerAsTable is commented out (I took the liberty of adding the order by clause): val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) import hiveContext._ hql("USE test") // hql(select id from

Re: Potential bugs in SparkSQL

2014-07-10 Thread Michael Armbrust
Hi Jerry, Thanks for reporting this. It would be helpful if you could provide the output of the following command: println(hql("select s.id from m join s on (s.id=m_id)").queryExecution) Michael On Thu, Jul 10, 2014 at 8:15 AM, Jerry Lam chiling...@gmail.com wrote: Hi Spark developers, I
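The queryExecution call above prints SparkSQL's parsed, analyzed, and physical plans so the resolution bug can be seen directly. As a rough analogy only (this is not SparkSQL), most SQL engines expose similar plan inspection; a minimal sqlite3 sketch in Python, using hypothetical stand-ins for the m and s tables from the thread, shows the same idea with EXPLAIN QUERY PLAN:

```python
import sqlite3

# Hypothetical stand-ins for the m and s tables in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m (id INTEGER);
    CREATE TABLE s (id INTEGER);
""")

# EXPLAIN QUERY PLAN is sqlite's analog of printing queryExecution:
# it reports how the engine will scan and join the two tables.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT s.id FROM m JOIN s ON (s.id = m.id)"
).fetchall()
for row in plan:
    print(row)
```

The exact plan text varies by sqlite version, but both tables should appear in it, just as both MetastoreRelations appear in Jerry's Spark plan below.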

Re: Potential bugs in SparkSQL

2014-07-10 Thread Jerry Lam
Hi Michael, I got the log you asked for. Note that I manually edited the table name and the field names to hide some sensitive information.

== Logical Plan ==
Project ['s.id]
 Join Inner, Some((id#106 = 'm.id))
  Project [id#96 AS id#62]
   MetastoreRelation test, m, None
  MetastoreRelation

Re: Potential bugs in SparkSQL

2014-07-10 Thread Michael Armbrust
Hmm, yeah, it looks like the table name is not getting applied to the attributes of m. You can work around this by rewriting your query as: hql("select s.id from (SELECT * FROM m) m join s on (s.id=m.id) order by s.id") This explicitly gives the alias m to the attributes of that table. You can also
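The workaround wraps m in a subquery and re-aliases it, which forces the attributes of m to carry the m qualifier during resolution. The bug itself is Spark-specific, but the rewritten query shape can be sketched against sqlite3 (the table and column names are hypothetical stand-ins based on the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m (id INTEGER);
    CREATE TABLE s (id INTEGER);
    INSERT INTO m (id) VALUES (1), (2), (3);
    INSERT INTO s (id) VALUES (2), (3), (4);
""")

# Michael's workaround shape: wrap m in a subquery and alias it, so
# the attributes of m are explicitly qualified by the alias m.
rows = conn.execute("""
    SELECT s.id
    FROM (SELECT * FROM m) m
    JOIN s ON (s.id = m.id)
    ORDER BY s.id
""").fetchall()

print(rows)  # the ids present in both tables: [(2,), (3,)]
```

In sqlite the aliased and unaliased forms behave identically; the subquery alias only matters in the SparkSQL versions affected by this resolution bug.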