Alok, HDFS is a FileSystem and S3 is also a FileSystem. So when you choose to use S3 on a node, do not attempt to start HDFS services such as the NameNode and DataNode; they have nothing to do with S3. S3 stands on its own, and its configuration describes where it runs, how it is accessed, and so on. For S3 to be usable, its jars have to be made available to every service you want to use it from.
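As a rough sketch of what that means in practice (the jar version and paths below are placeholders, not taken from this thread), the jets3t jar can either be copied into each service's lib directory or exported on the classpath via the env scripts:

  # Placeholders only -- substitute your actual jets3t jar and install paths.
  cp /path/to/jets3t-x.y.z.jar $HADOOP_HOME/lib/
  cp /path/to/jets3t-x.y.z.jar $HBASE_HOME/lib/

  # Or, instead of copying, extend the classpaths:
  # in hadoop-env.sh
  export HADOOP_CLASSPATH=/path/to/jets3t-x.y.z.jar:$HADOOP_CLASSPATH
  # in hbase-env.sh
  export HBASE_CLASSPATH=/path/to/jets3t-x.y.z.jar:$HBASE_CLASSPATH

The affected services then need a restart so they pick up the new classpath.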
Yes, you can make Hive/HBase work with S3 if the S3 URI is configured as fs.default.name (or fs.defaultFS in 2.x+). Configure your core-site.xml with the right FS, and run regular "hadoop fs -ls /", etc. commands against that FS. The library is jets3t (http://jets3t.s3.amazonaws.com/downloads.html), and you'll need its jar on the HBase/Hive/etc. classpaths. (A quick smoke test against the bucket is sketched at the end of this message.) Let us know if this clears it up!

On Thu, Aug 2, 2012 at 6:31 PM, Alok Kumar <alok...@gmail.com> wrote:
> Hi,
>
> Thank you for the reply.
>
> The requirement is that I need to set up a Hadoop cluster using S3 as a backup (performance won't be an issue).
>
> My architecture is: Hive has an external table mapped to HBase; HBase stores its data in HDFS; Hive uses Hadoop to access the HBase table data.
> Can I make this work using S3?
>
> The HBase regionserver is failing with the error "Caused by: java.lang.ClassNotFoundException: org.jets3t.service.S3ServiceException".
>
> The HBase master log has lots of "Unexpected response code 404, expected 200" entries.
>
> Do I need to start the DataNode with S3?
> The DataNode log says:
>
> 2012-08-02 17:50:20,021 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = datarpm-desktop/192.168.2.4
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.0.1
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
> ************************************************************/
> 2012-08-02 17:50:20,145 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 2012-08-02 17:50:20,156 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> 2012-08-02 17:50:20,157 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> 2012-08-02 17:50:20,157 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
> 2012-08-02 17:50:20,277 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
> 2012-08-02 17:50:20,281 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
> 2012-08-02 17:50:20,317 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2012-08-02 17:50:22,006 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to <bucket-name>/67.215.65.132:8020 failed on local exception: java.io.EOFException
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy5.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429)
>         at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331)
>         at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:356)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:800)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:745)
>
> 2012-08-02 17:50:22,007 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down DataNode at datarpm-desktop/192.168.2.4
>
> Thanks,
>
> On Thu, Aug 2, 2012 at 5:22 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> With S3 you do not need a NameNode. The NameNode is part of HDFS.
>>
>> On Thu, Aug 2, 2012 at 12:44 PM, Alok Kumar <alok...@gmail.com> wrote:
>> > Hi,
>> >
>> > I followed the instructions from this link for the setup:
>> > http://wiki.apache.org/hadoop/AmazonS3
>> >
>> > My "core-site.xml" contains only these 3 properties:
>> >
>> > <property>
>> >   <name>fs.default.name</name>
>> >   <value>s3://BUCKET</value>
>> > </property>
>> >
>> > <property>
>> >   <name>fs.s3.awsAccessKeyId</name>
>> >   <value>ID</value>
>> > </property>
>> >
>> > <property>
>> >   <name>fs.s3.awsSecretAccessKey</name>
>> >   <value>SECRET</value>
>> > </property>
>> >
>> > hdfs-site.xml is empty!
>> >
>> > The NameNode log says it's trying to connect to the local HDFS, not S3.
>> > Am I missing anything?
>> >
>> > Regards,
>> > Alok
>>
>> --
>> Harsh J
>
> --
> Alok

--
Harsh J
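A quick way to sanity-check the S3 setup described above, before starting HBase or Hive, is to run plain FsShell commands straight against the bucket (the bucket name and paths here are placeholders):

  hadoop fs -mkdir s3://BUCKET/smoke-test
  hadoop fs -put /etc/hosts s3://BUCKET/smoke-test/
  hadoop fs -ls s3://BUCKET/smoke-test

If these succeed, the jets3t jar and the fs.s3.* credentials are wired up; a ClassNotFoundException or a 404-style response at this stage likely points to a missing jar or a wrong bucket/key setting rather than to HBase or Hive themselves.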