Hey Costin,

Thanks for the swift reply. I abandoned EC2 to take that out of the equation and managed to get everything working locally using the latest version of everything (though I realized just now I'm still on Hive 0.9). I'm guessing you're right about some port/connection issue, because I definitely had ES running on that machine.
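For what it's worth, this is the kind of quick connectivity check I ran before blaming es-hadoop (a minimal sketch — the `check_es` helper name is mine, and localhost:9200 is just the ES default, so swap in the EC2 host/port):

```shell
#!/usr/bin/env bash
# Probe an Elasticsearch HTTP endpoint the same way es-hadoop's REST client
# would reach it. Host and port default to ES's out-of-the-box localhost:9200.
check_es() {
    local host="${1:-localhost}" port="${2:-9200}"
    if curl -s --max-time 5 "http://${host}:${port}/" >/dev/null; then
        echo "Elasticsearch is reachable at ${host}:${port}"
    else
        echo "cannot connect to ${host}:${port} (ES down, or a firewall/security group in the way?)" >&2
        return 1
    fi
}

# Example: check_es localhost 9200
```

On EC2 the usual culprit is the security group not allowing inbound 9200, which shows up exactly as the "Connection refused" below.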
I changed hive-log4j.properties and added:

```
# custom logging levels
# log4j.logger.xxx=DEBUG
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
log4j.logger.org.elasticsearch.hadoop.mr=TRACE
```

but I didn't see any trace logging. Hopefully I can get it working on EC2 without issue, but, for future reference, is this the correct way to enable TRACE logging?

Oh, and for reference, I tried running without ES up and got the following exception:

```
2014-02-19 13:46:08,803 ERROR shark.SharkDriver (Logging.scala:logError(64)) - FAILED: Hive Internal Error: java.lang.IllegalStateException(Cannot discover Elasticsearch version)
java.lang.IllegalStateException: Cannot discover Elasticsearch version
	at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:101)
	at org.elasticsearch.hadoop.hive.EsStorageHandler.configureOutputJobProperties(EsStorageHandler.java:83)
	at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:706)
	at org.apache.hadoop.hive.ql.plan.PlanUtils.configureOutputJobPropertiesForStorageHandler(PlanUtils.java:675)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.augmentPlan(FileSinkOperator.java:764)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.putOpInsertMap(SemanticAnalyzer.java:1518)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4337)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6207)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6138)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6764)
	at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:149)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:244)
	at shark.SharkDriver.compile(SharkDriver.scala:215)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:895)
	at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:324)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
	at shark.SharkCliDriver$.main(SharkCliDriver.scala:232)
	at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.io.IOException: Out of nodes and retries; caught exception
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
	at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
	at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:274)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:84)
	at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:99)
	... 18 more
Caused by: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
	at java.net.Socket.connect(Socket.java:579)
	at java.net.Socket.connect(Socket.java:528)
	at java.net.Socket.<init>(Socket.java:425)
	at java.net.Socket.<init>(Socket.java:280)
	at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
	at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
	at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
	... 25 more
```

Let me know if there's anything in particular you'd like me to try on EC2.

(For posterity, the versions I used were: Hadoop 2.2.0, Hive 0.9.0, Shark 0.8.1, Spark 0.8.1, es-hadoop 1.3.0.M2, Java 1.7.0_15, Scala 2.9.3, Elasticsearch 1.0.0.)

Thanks again,
Max

On Tuesday, February 18, 2014 10:16:38 PM UTC-8, Costin Leau wrote:
> The error indicates a network error - namely es-hadoop cannot connect to Elasticsearch on the default (localhost:9200) HTTP port. Can you double check whether that's indeed the case (using curl or even telnet on that port) - maybe the firewall prevents any connections from being made...
> Also, you could try using the latest Hive, 0.12, and a more recent Hadoop such as 1.1.2 or 1.2.1.
>
> Additionally, can you enable TRACE logging in your job for the es-hadoop packages org.elasticsearch.hadoop.rest and org.elasticsearch.hadoop.mr and report back?
>
> Thanks,
>
> On 19/02/2014 4:03 AM, Max Lang wrote:
>> I set everything up using this guide: https://github.com/amplab/shark/wiki/Running-Shark-on-EC2 on an EC2 cluster. I've copied the elasticsearch-hadoop jars into the Hive lib directory and I have Elasticsearch running on localhost:9200. I'm running Shark in a screen session with --service screenserver and connecting to it at the same time using shark -h localhost.
>>
>> Unfortunately, when I attempt to write data into Elasticsearch, it fails.
>> Here's an example:
>>
>> [localhost:10000] shark> CREATE EXTERNAL TABLE wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3n://spark-data/wikipedia-sample/';
>> Time taken (including network latency): 0.159 seconds
>> 14/02/19 01:23:33 INFO CliDriver: Time taken (including network latency): 0.159 seconds
>>
>> [localhost:10000] shark> SELECT title FROM wiki LIMIT 1;
>> Alpokalja
>> Time taken (including network latency): 2.23 seconds
>> 14/02/19 01:23:48 INFO CliDriver: Time taken (including network latency): 2.23 seconds
>>
>> [localhost:10000] shark> CREATE EXTERNAL TABLE es_wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'wikipedia/article');
>> Time taken (including network latency): 0.061 seconds
>> 14/02/19 01:33:51 INFO CliDriver: Time taken (including network latency): 0.061 seconds
>>
>> [localhost:10000] shark> INSERT OVERWRITE TABLE es_wiki SELECT w.id, w.title, w.last_modified, w.xml, w.text FROM wiki w;
>> [Hive Error]: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
>> Time taken (including network latency): 3.575 seconds
>> 14/02/19 01:34:42 INFO CliDriver: Time taken (including network latency): 3.575 seconds
>>
>> The stack trace looks like this:
>>
>> org.apache.hadoop.hive.ql.metadata.HiveException (org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Out of nodes and retries; caught exception)
>> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:602)
>> shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:84)
>> shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:81)
>> scala.collection.Iterator$class.foreach(Iterator.scala:772)
>> scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
>> shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:81)
>> shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:207)
>> shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
>> shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
>> org.apache.spark.scheduler.Task.run(Task.scala:53)
>> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
>> org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> java.lang.Thread.run(Thread.java:744)
>>
>> I should be using Hive 0.9.0, Shark 0.8.1, Elasticsearch 1.0.0, Hadoop 1.0.4, and Java 1.7.0_51.
>>
>> Based on my cursory look at the Hadoop and elasticsearch-hadoop sources, it looks like Hive is just rethrowing an IOException it's getting from Spark, and elasticsearch-hadoop is just hitting those exceptions.
>>
>> I suppose my questions are: Does this look like an issue with my ES/elasticsearch-hadoop config? And has anyone gotten Elasticsearch working with Spark/Shark?
>>
>> Any ideas/insights are appreciated.
>>
>> Thanks,
>> Max
>>
>> --
>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9486faff-3eaf-4344-8931-3121bbc5d9c7%40googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>
> --
> Costin
