Hi,
I have been using the following tutorial to run Hive queries on an Amazon EC2
Hadoop cluster:
http://wiki.apache.org/hadoop/Hive/Hi...
I can SSH into the master node with the following command:
ssh -i /home/sumit/.ec2/id_rsa-gsg-keypair -D 2600
[email protected]...
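For context: the -D 2600 flag opens a local SOCKS proxy on port 2600, which is
what the hadoop.socks.server setting in my hive-site.xml (attached below)
points at. A quick sanity check that the tunnel is actually up while Hive
runs, assuming netcat is installed, is:

  # the SSH dynamic forward should be listening locally on port 2600
  nc -z localhost 2600 && echo "tunnel up" || echo "tunnel down"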
But when I run Hive (even as the root user) and set all the parameters as
mentioned in the tutorial above, Hive can't connect to the Amazon EC2 cluster,
and I get a connection-refused error.
Could someone help me with this?
Thanks a lot
Sumit
Joydeep mentioned that it might have something to do with my config settings,
so I am attaching my hive-site.xml file too.
Here is my error log:
FAILED: Unknown exception : java.io.IOException: Call to
ec2-67-202-7-98.compute-1.amazonaws.com/67.202.7.98:50001 failed on local
exception: java.net.SocketException: Connection refused
09/08/13 13:41:54 ERROR ql.Driver: FAILED: Unknown exception :
java.io.IOException: Call to
ec2-67-202-7-98.compute-1.amazonaws.com/67.202.7.98:50001 failed on local
exception: java.net.SocketException: Connection refused
java.lang.RuntimeException: java.io.IOException: Call to
ec2-67-202-7-98.compute-1.amazonaws.com/67.202.7.98:50001 failed on local
exception: java.net.SocketException: Connection refused
at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:179)
at org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:245)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:748)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:3643)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:177)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:209)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:176)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:306)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.io.IOException: Call to
ec2-67-202-7-98.compute-1.amazonaws.com/67.202.7.98:50001 failed on local
exception: java.net.SocketException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:700)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:176)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:75)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.hadoop.hive.ql.Context.makeMRScratchDir(Context.java:118)
at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:177)
... 18 more
Caused by: java.net.SocketException: Connection refused
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:405)
at java.net.Socket.connect(Socket.java:519)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:801)
at org.apache.hadoop.ipc.Client.call(Client.java:686)
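The java.net.SocksSocketImpl.connect frame at the bottom suggests the SOCKS
socket factory is in effect, so the refusal should be coming either from the
local proxy on port 2600 or from the NameNode port 50001 on the master.
Assuming a netcat build that supports the -x proxy option, the two hops can
be tested separately:

  # hop 1: is anything listening on the local SOCKS port?
  nc -z localhost 2600
  # hop 2: is the NameNode port reachable through the proxy?
  nc -x localhost:2600 -z ec2-67-202-7-98.compute-1.amazonaws.com 50001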
I would be glad if someone could suggest a solution.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->
<!-- Hive Execution Parameters -->
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive-${user.name}</value>
<description>Scratch space for Hive jobs</description>
</property>
<property>
<name>hive.metastore.local</name>
<value>true</value>
<description>Controls whether to connect to a remote metastore server or open a new metastore server in the Hive Client JVM</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.metastore.connect.retries</name>
<value>5</value>
<description>Number of retries while opening a connection to metastore</description>
</property>
<property>
<name>hive.metastore.rawstore.impl</name>
<value>org.apache.hadoop.hive.metastore.ObjectStore</value>
<description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases</description>
</property>
<property>
<name>hive.default.fileformat</name>
<value>TextFile</value>
<description>Default file format for CREATE TABLE statement. Options are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override</description>
</property>
<property>
<name>hive.map.aggr</name>
<value>true</value>
<description>Whether to use map-side aggregation in Hive Group By queries</description>
</property>
<property>
<name>hive.groupby.skewindata</name>
<value>false</value>
<description>Whether there is skew in data to optimize group by queries</description>
</property>
<property>
<name>hive.groupby.mapaggr.checkinterval</name>
<value>100000</value>
<description>Number of rows after which the size of the grouping keys/aggregation classes is checked</description>
</property>
<property>
<name>hive.mapred.local.mem</name>
<value>0</value>
<description>For local mode, memory of the mappers/reducers</description>
</property>
<property>
<name>hive.map.aggr.hash.percentmemory</name>
<value>0.5</value>
<description>Portion of total memory to be used by map-side group aggregation hash table</description>
</property>
<property>
<name>hive.optimize.ppd</name>
<value>false</value>
<description>Whether to enable predicate pushdown</description>
</property>
<property>
<name>hive.join.emit.interval</name>
<value>1000</value>
<description>How many rows in the right-most join operand Hive should buffer before emitting the join result. </description>
</property>
<property>
<name>hive.mapred.mode</name>
<value>nonstrict</value>
<description>The mode in which the hive operations are being performed. In strict mode, some risky queries are not allowed to run</description>
</property>
<property>
<name>hive.exec.script.maxerrsize</name>
<value>100000</value>
<description>Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity</description>
</property>
<property>
<name>hive.exec.compress.output</name>
<value>false</value>
<description>This controls whether the final outputs of a query (to a local/hdfs file or a hive table) are compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress*</description>
</property>
<property>
<name>hive.exec.compress.intermediate</name>
<value>false</value>
<description> This controls whether intermediate files produced by hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress* </description>
</property>
<property>
<name>hive.hwi.listen.host</name>
<value>0.0.0.0</value>
<description>This is the host address the Hive Web Interface will listen on</description>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
<description>This is the port the Hive Web Interface will listen on</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>${HIVE_HOME}/lib/hive-hwi.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>hive.exec.pre.hooks</name>
<value></value>
<description>Pre Execute Hook for Tests</description>
</property>
<property>
<name>hadoop.socks.server</name>
<value>localhost:2600</value>
<description>Custom Changes to use Hive with EC2</description>
</property>
<property>
<name>hadoop.rpc.socket.factory.class.default</name>
<value>org.apache.hadoop.net.SocksSocketFactory</value>
<description>Custom Changes to use Hive with EC2</description>
</property>
<property>
<name>hadoop.job.ugi</name>
<value>root,root</value>
<description>Custom Changes to use Hive with EC2</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>40</value>
<description>Custom Changes to use Hive with EC2</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>-1</value>
<description>Custom Changes to use Hive with EC2</description>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>[REDACTED]</value>
<description>Custom Changes to use Hive with EC2</description>
</property>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>[REDACTED]</value>
<description>Custom Changes to use Hive with EC2</description>
</property>
</configuration>
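One thing I am not sure about: the hadoop.socks.server port in the file above
has to match the -D port from my ssh command (it does, 2600), and
fs.default.name on the client side has to point at the master's NameNode RPC
port (50001, going by the error). I have not attached my Hadoop config, but
assuming a standard layout under $HADOOP_HOME/conf, this shows the NameNode
URI the Hive client will dial:

  # print the fs.default.name property and the value line that follows it
  grep -A 1 'fs.default.name' $HADOOP_HOME/conf/hadoop-site.xml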