Thank you, Harsh. That clears up my doubt about Hadoop with S3.

Q. Does HBase communicate with S3 directly, without going through Hadoop?
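[For context, based on Harsh's reply further down: HBase does not speak to S3 natively. It goes through Hadoop's FileSystem abstraction, where the s3:// scheme resolves to org.apache.hadoop.fs.s3.S3FileSystem, which in turn uses the jets3t library. A minimal sketch of that resolution, assuming Hadoop 1.x APIs and a hypothetical bucket name; in practice the properties live in core-site.xml:]

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3ViaHadoop {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical values, mirroring the core-site.xml quoted below.
            conf.set("fs.default.name", "s3://example-bucket");
            conf.set("fs.s3.awsAccessKeyId", "ID");
            conf.set("fs.s3.awsSecretAccessKey", "SECRET");

            // HBase does the equivalent of this internally: it asks Hadoop for
            // the FileSystem behind the configured URI. The s3:// scheme maps to
            // org.apache.hadoop.fs.s3.S3FileSystem, which needs the jets3t jar
            // on the classpath.
            FileSystem fs = FileSystem.get(URI.create("s3://example-bucket/"), conf);
            System.out.println("Resolved filesystem: " + fs.getClass().getName());

            // Equivalent of `hadoop fs -ls /` against the same filesystem.
            for (FileStatus stat : fs.listStatus(new Path("/"))) {
                System.out.println(stat.getPath());
            }
        }
    }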
I had put this task aside for a while; will post again. I've not got it working yet. The jets3t jar is present on the classpath.

Thanks,
Alok

HMaster is running. Regionserver log (the createNonRecursive failure it shows is sketched at the end of this thread):

2012-08-03 12:42:40,576 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fhbase%2F.logs%2Fslave-1%2C60020%2C1343977957962' - Unexpected response code 404, expected 200
2012-08-03 12:42:40,576 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fhbase%2F.logs%2Fslave-1%2C60020%2C1343977957962' - Received error response with XML message
2012-08-03 12:42:43,063 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fhbase%2F.logs%2Fslave-1%2C60020%2C1343977957962' - Unexpected response code 404, expected 200
2012-08-03 12:42:43,063 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fhbase%2F.logs%2Fslave-1%2C60020%2C1343977957962' - Received error response with XML message
2012-08-03 12:42:43,831 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: HLog configuration: blocksize=32 MB, rollsize=30.4 MB, enabled=true, optionallogflushinternal=1000ms
2012-08-03 12:42:43,840 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Failed initialization
2012-08-03 12:42:43,842 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed init
java.io.IOException: cannot get log writer
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:678)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:625)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:557)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:517)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:405)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:331)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1215)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1204)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:923)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:639)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.IOException: createNonRecursive unsupported for this filesystem class org.apache.hadoop.fs.s3.S3FileSystem
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:675)
        ... 10 more
Caused by: java.io.IOException: createNonRecursive unsupported for this filesystem class org.apache.hadoop.fs.s3.S3FileSystem
        at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:626)
        at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:601)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:442)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
        ... 11 more
2012-08-03 12:42:43,847 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server slave-1,60020,1343977957962: Unhandled exception: cannot get log writer
java.io.IOException: cannot get log writer
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:678)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:625)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:557)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:517)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:405)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:331)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1215)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1204)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:923)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:639)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.IOException: createNonRecursive unsupported for this filesystem class org.apache.hadoop.fs.s3.S3FileSystem
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:675)
        ... 10 more
Caused by: java.io.IOException: createNonRecursive unsupported for this filesystem class org.apache.hadoop.fs.s3.S3FileSystem
        at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:626)
        at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:601)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:442)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
        ... 11 more
2012-08-03 12:42:43,848 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2012-08-03 12:42:43,850 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled exception: cannot get log writer
2012-08-03 12:42:43,850 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2012-08-03 12:42:43,854 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server slave-1,60020,1343977957962
2012-08-03 12:42:43,856 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x138eb473ef00006
2012-08-03 12:42:43,920 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-08-03 12:42:43,921 INFO org.apache.zookeeper.ZooKeeper: Session: 0x138eb473ef00006 closed
2012-08-03 12:42:43,921 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server slave-1,60020,1343977957962; all regions closed.
2012-08-03 12:42:44,021 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closing leases
2012-08-03 12:42:44,022 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed leases
2012-08-03 12:42:44,045 INFO org.apache.zookeeper.ZooKeeper: Session: 0x138eb473ef00005 closed
2012-08-03 12:42:44,045 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-08-03 12:42:44,046 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server slave-1,60020,1343977957962; zookeeper connection closed.
2012-08-03 12:42:44,048 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
2012-08-03 12:42:44,050 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-4,5,main]
2012-08-03 12:42:44,050 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
2012-08-03 12:42:44,051 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2012-08-03 12:42:44,051 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.

----------------

On Thu, Aug 2, 2012 at 11:18 PM, Harsh J <ha...@cloudera.com> wrote:
> Alok,
>
> HDFS is a FileSystem. S3 is also a FileSystem. Hence, when you choose
> to use S3 on a node, do not attempt to start HDFS services such as
> the NameNode and DataNode; they have nothing to do with S3. S3 stands
> alone, and its configuration points to where it is running, how it is
> to be accessed, etc. For S3 to be available, its jars should be on
> the classpath of the services you wish to use it with.
>
> Yes, you can make Hive/HBase work with S3 if S3 is configured as the
> fs.default.name (or fs.defaultFS in 2.x+). You can configure your
> core-site.xml with the right FS and run regular "hadoop fs -ls /",
> etc. commands against that FS. The library is jets3t
> (http://jets3t.s3.amazonaws.com/downloads.html), and you'll need its
> jar on the HBase/Hive/etc. classpaths.
>
> Let us know if this clears it up!
>
> On Thu, Aug 2, 2012 at 6:31 PM, Alok Kumar <alok...@gmail.com> wrote:
> > Hi,
> >
> > Thank you for the reply.
> >
> > The requirement is that I need to set up a Hadoop cluster using S3
> > as a backup (performance won't be an issue).
> >
> > My architecture is like this:
> > Hive has an external table mapped to HBase, and HBase stores its
> > data to HDFS. Hive uses Hadoop to access the HBase table data.
> > Can I make this work using S3?
> >
> > The HBase regionserver is failing with the error "Caused by:
> > java.lang.ClassNotFoundException: org.jets3t.service.S3ServiceException"
> >
> > The HBase master log has lots of "Unexpected response code 404,
> > expected 200" entries.
> >
> > Do I need to start the DataNode with S3?
> > The Datanode log says:
> >
> > 2012-08-02 17:50:20,021 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> > /************************************************************
> > STARTUP_MSG: Starting DataNode
> > STARTUP_MSG:   host = desktop/192.168.2.4
> > STARTUP_MSG:   args = []
> > STARTUP_MSG:   version = 1.0.1
> > STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
> > ************************************************************/
> > 2012-08-02 17:50:20,145 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> > 2012-08-02 17:50:20,156 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> > 2012-08-02 17:50:20,157 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> > 2012-08-02 17:50:20,157 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
> > 2012-08-02 17:50:20,277 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
> > 2012-08-02 17:50:20,281 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
> > 2012-08-02 17:50:20,317 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> > 2012-08-02 17:50:22,006 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to <bucket-name>/67.215.65.132:8020 failed on local exception: java.io.EOFException
> >         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
> >         at org.apache.hadoop.ipc.Client.call(Client.java:1071)
> >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
> >         at $Proxy5.getProtocolVersion(Unknown Source)
> >         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
> >         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
> >         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429)
> >         at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331)
> >         at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:356)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
> > Caused by: java.io.EOFException
> >         at java.io.DataInputStream.readInt(DataInputStream.java:375)
> >         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:800)
> >         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:745)
> >
> > 2012-08-02 17:50:22,007 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> > /************************************************************
> > SHUTDOWN_MSG: Shutting down DataNode at desktop/192.168.2.4
> >
> > Thanks,
> >
> > On Thu, Aug 2, 2012 at 5:22 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> With S3 you do not need a NameNode. NameNode is part of HDFS.
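[A minimal illustration of Harsh's point, assuming Hadoop 1.x and a hypothetical bucket name: the HDFS daemons expect fs.default.name to be an hdfs:// NameNode address, so when it points at S3 there is no NameNode for a DataNode to connect to, which matches the EOFException in the DataNode log above.]

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class DefaultFsCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Hypothetical value, mirroring the core-site.xml quoted below.
            conf.set("fs.default.name", "s3://example-bucket");
            URI uri = FileSystem.getDefaultUri(conf);
            // Prints "s3" -- there is no hdfs:// NameNode host:port here, so
            // HDFS daemons (NameNode/DataNode) should simply not be started.
            System.out.println(uri.getScheme());
        }
    }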
> >>
> >> On Thu, Aug 2, 2012 at 12:44 PM, Alok Kumar <alok...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I followed the setup instructions from this link:
> >> > http://wiki.apache.org/hadoop/AmazonS3
> >> >
> >> > My core-site.xml contains only these three properties:
> >> >
> >> > <property>
> >> >   <name>fs.default.name</name>
> >> >   <value>s3://BUCKET</value>
> >> > </property>
> >> >
> >> > <property>
> >> >   <name>fs.s3.awsAccessKeyId</name>
> >> >   <value>ID</value>
> >> > </property>
> >> >
> >> > <property>
> >> >   <name>fs.s3.awsSecretAccessKey</name>
> >> >   <value>SECRET</value>
> >> > </property>
> >> >
> >> > hdfs-site.xml is empty!
> >> >
> >> > The namenode log says it's trying to connect to local HDFS, not S3.
> >> > Am I missing anything?
> >> >
> >> > Regards,
> >> > Alok
> >>
> >> --
> >> Harsh J
> >
> >
> > --
> > Alok
>
>
> --
> Harsh J

--
Alok Kumar
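[For reference on the regionserver abort earlier in the thread: HBase's WAL writer requires FileSystem.createNonRecursive(), and Hadoop's FileSystem base class implements that method by throwing unless a subclass overrides it. HDFS overrides it; S3FileSystem does not, which is exactly the exception in the log. A paraphrased sketch of the failing method, based on Hadoop 1.x and not verbatim source:]

    // Paraphrased from Hadoop 1.x org.apache.hadoop.fs.FileSystem: the base
    // implementation of createNonRecursive() rejects any filesystem that does
    // not explicitly override it. S3FileSystem never overrides it, so HBase's
    // SequenceFileLogWriter fails with the IOException seen in the log above.
    public FSDataOutputStream createNonRecursive(Path f, boolean overwrite,
        int bufferSize, short replication, long blockSize,
        Progressable progress) throws IOException {
      throw new IOException("createNonRecursive unsupported for this filesystem "
          + this.getClass());
    }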