RE: SparkSql - java.util.NoSuchElementException: key not found: node when access JSON Array
You can use the HiveContext instead of SQLContext, which should support all of HiveQL, including LATERAL VIEW explode. SQLContext does not support that yet. BTW, nice coding format in the email.

Yong

Date: Tue, 31 Mar 2015 18:18:19 -0400
Subject: Re: SparkSql - java.util.NoSuchElementException: key not found: node when access JSON Array
From: tsind...@gmail.com
To: user@spark.apache.org

So in looking at this a bit more, I gather the root cause is the fact that the nested fields are represented as rows within rows, is that correct? If I don't know the size of the JSON array (it varies), using x.getAs[Row](0).getString(0) is not really a valid solution. Is the solution to apply a lateral view + explode to this? I have attempted to change to a lateral view, but it looks like my syntax is off:

    sqlContext.sql(
      "SELECT path,`timestamp`, name, value, pe.value FROM metric lateral view explode(pathElements) a AS pe")
      .collect.foreach(println(_))

which results in:

    Exception in thread "main" java.lang.RuntimeException: [1.68] failure: ``UNION'' expected but identifier view found
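For concreteness, a minimal, self-contained sketch of what Yong is suggesting, written against the Spark 1.2.x API. The sample document, the local setup, and the ExplodeExample object name are made up for illustration; only the query and the metric/pathElements names come from the thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object ExplodeExample {
      def main(args: Array[String]): Unit = {
        // local[*] is only for a quick local test; drop it when launching via spark-submit.
        val sc = new SparkContext(new SparkConf().setAppName("explode-example").setMaster("local[*]"))

        // HiveContext uses the HiveQL parser, which understands LATERAL VIEW ... explode(...).
        val hiveContext = new HiveContext(sc)

        // Hypothetical document shaped like the "metric" records in the thread:
        // scalar fields plus a nested pathElements array of objects.
        val json = sc.parallelize(Seq(
          """{"path":"/a/b","timestamp":"2015-03-31T17:38:34","name":"cpu","value":0.97,"pathElements":[{"node":"a","value":"1"},{"node":"b","value":"2"}]}"""))

        hiveContext.jsonRDD(json).registerTempTable("metric")   // SchemaRDD in Spark 1.2

        // One output row per element of the pathElements array.
        hiveContext.sql(
          """SELECT path, `timestamp`, name, value, pe.value
            |FROM metric
            |LATERAL VIEW explode(pathElements) a AS pe""".stripMargin)
          .collect()
          .foreach(println)
      }
    }

The same switch to HiveContext should also be what the LATERAL VIEW json_tuple query further down in the thread needs, since json_tuple is a Hive UDTF that the plain SQLContext parser in 1.2.1 does not recognize.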
Re: SparkSql - java.util.NoSuchElementException: key not found: node when access JSON Array
So in looking at this a bit more, I gather the root cause is the fact that the nested fields are represented as rows within rows, is that correct? If I don't know the size of the JSON array (it varies), using x.getAs[Row](0).getString(0) is not really a valid solution. Is the solution to apply a lateral view + explode to this? I have attempted to change to a lateral view, but it looks like my syntax is off:

    sqlContext.sql(
      "SELECT path,`timestamp`, name, value, pe.value FROM metric lateral view explode(pathElements) a AS pe")
      .collect.foreach(println(_))

which results in:

    15/03/31 17:38:34 INFO ContextCleaner: Cleaned broadcast 0
    Exception in thread "main" java.lang.RuntimeException: [1.68] failure: ``UNION'' expected but identifier view found
    SELECT path,`timestamp`, name, value, pe.value FROM metric lateral view explode(pathElements) a AS pe
                                                                       ^
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:33)
        at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:79)
        at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:79)
        at org.apache.spark.sql.catalyst.SparkSQLParser$$anonfun$org$apache$spark$sql$catalyst$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:174)
        at org.apache.spark.sql.catalyst.SparkSQLParser$$anonfun$org$apache$spark$sql$catalyst$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:173)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
        at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
        at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:31)
        at org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:83)
        at org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:83)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:83)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:303)
        at com.opsdatastore.elasticsearch.spark.ElasticSearchReadWrite$.main(ElasticSearchReadWrite.scala:97)
        at com.opsdatastore.elasticsearch.spark.ElasticSearchReadWrite.main(ElasticSearchReadWrite.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Is this the right approach? Is this syntax available in 1.2.1:

    SELECT v1.name, v2.city, v2.state
    FROM people
      LATERAL VIEW json_tuple(people.jsonObject, 'name', 'address') v1 as name, address
      LATERAL VIEW json_tuple(v1.address, 'city', 'state') v2 as city, state;

-Todd

On Tue, Mar 31, 2015 at 3:26 PM, Todd Nist <tsind...@gmail.com> wrote:

I am accessing ElasticSearch via the elasticsearch-hadoop library and attempting to expose it via SparkSQL. I am using Spark 1.2.1, the latest supported by elasticsearch-hadoop, with "org.elasticsearch" % "elasticsearch-hadoop" % "2.1.0.BUILD-SNAPSHOT". I'm encountering an issue when I attempt to query the following JSON after creating a temporary table from it. The JSON looks like this:

    PUT /_template/device
    {
      "template": "dev*",
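On the point about not knowing the length of the JSON array: if you stay on the plain SQLContext in 1.2.1, one workaround is to flatten the nested rows programmatically instead of in SQL. A rough sketch, assuming the metric temp table from this thread is already registered on sqlContext, and assuming (the positions and types depend on the actual inferred schema, so treat them as placeholders) that pathElements sits at column 4 and that the element struct's value field is at index 1:

    import org.apache.spark.sql.Row

    // Select the scalar columns plus the whole nested array so the column
    // positions in each Row are known up front.
    val rows = sqlContext.sql(
      "SELECT path, `timestamp`, name, value, pathElements FROM metric")

    // One output tuple per element of pathElements, however long the array is.
    val flattened = rows.flatMap { row =>
      val path        = row.getString(0)
      val ts          = row.getString(1)
      val name        = row.getString(2)
      val metricValue = row(3)                     // keep whatever numeric type was inferred
      // With Spark 1.2's JSON support, an array of objects usually comes back as a Seq[Row];
      // guard against null in case some documents omit the field.
      val elems = Option(row.getAs[Seq[Row]](4)).getOrElse(Seq.empty[Row])
      elems.map(pe => (path, ts, name, metricValue, pe.getString(1)))   // index 1 assumed to be the element's "value"
    }

    flattened.collect().foreach(println)

Hard-coded column positions are brittle, which is part of why the LATERAL VIEW explode route via HiveContext, as in Yong's reply, is the tidier option.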