Alok, HDFS is a FileSystem and S3 is also a FileSystem. So when you choose to use S3 on a node, do not attempt to start HDFS services such as the NameNode and DataNode; they have nothing to do with S3. S3 stands on its own, and its configuration describes where it runs, how it is accessed, and so on. For S3 to be usable, its jars have to be made available to every service you want to use it from.
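As a rough sketch of what that means in practice (the jar version and paths below are placeholders, not taken from this thread), the jets3t jar can either be copied into each service's lib directory or exported on the classpath via the env scripts:

  # Placeholders only -- substitute your actual jets3t jar and install paths.
  cp /path/to/jets3t-x.y.z.jar $HADOOP_HOME/lib/
  cp /path/to/jets3t-x.y.z.jar $HBASE_HOME/lib/

  # Or, instead of copying, extend the classpaths:
  # in hadoop-env.sh
  export HADOOP_CLASSPATH=/path/to/jets3t-x.y.z.jar:$HADOOP_CLASSPATH
  # in hbase-env.sh
  export HBASE_CLASSPATH=/path/to/jets3t-x.y.z.jar:$HBASE_CLASSPATH

The affected services then need a restart so they pick up the new classpath.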
Yes, you can make Hive/HBase work with S3 if the S3 URI is configured as fs.default.name (or fs.defaultFS in 2.x+). Configure your core-site.xml with the right FS, and run regular "hadoop fs -ls /", etc. commands against that FS. The library is jets3t (http://jets3t.s3.amazonaws.com/downloads.html), and you'll need its jar on the HBase/Hive/etc. classpaths. (A quick smoke test against the bucket is sketched at the end of this message.) Let us know if this clears it up!

On Thu, Aug 2, 2012 at 6:31 PM, Alok Kumar <alok...@gmail.com> wrote:
> Hi,
>
> Thank you for the reply.
>
> The requirement is that I need to set up a Hadoop cluster using S3 as a backup (performance won't be an issue).
>
> My architecture is: Hive has an external table mapped to HBase; HBase stores its data in HDFS; Hive uses Hadoop to access the HBase table data.
> Can I make this work using S3?
>
> The HBase regionserver is failing with the error "Caused by: java.lang.ClassNotFoundException: org.jets3t.service.S3ServiceException".
>
> The HBase master log has lots of "Unexpected response code 404, expected 200" entries.
>
> Do I need to start the DataNode with S3?
> The DataNode log says:
>
> 2012-08-02 17:50:20,021 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = datarpm-desktop/192.168.2.4
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.0.1
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
> ************************************************************/
> 2012-08-02 17:50:20,145 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 2012-08-02 17:50:20,156 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> 2012-08-02 17:50:20,157 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> 2012-08-02 17:50:20,157 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
> 2012-08-02 17:50:20,277 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
> 2012-08-02 17:50:20,281 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
> 2012-08-02 17:50:20,317 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2012-08-02 17:50:22,006 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to <bucket-name>/67.215.65.132:8020 failed on local exception: java.io.EOFException
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy5.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429)
>         at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331)
>         at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:356)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:800)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:745)
>
> 2012-08-02 17:50:22,007 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down DataNode at datarpm-desktop/192.168.2.4
>
> Thanks,
>
> On Thu, Aug 2, 2012 at 5:22 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> With S3 you do not need a NameNode. The NameNode is part of HDFS.
>>
>> On Thu, Aug 2, 2012 at 12:44 PM, Alok Kumar <alok...@gmail.com> wrote:
>> > Hi,
>> >
>> > I followed the instructions from this link for the setup:
>> > http://wiki.apache.org/hadoop/AmazonS3
>> >
>> > My "core-site.xml" contains only these 3 properties:
>> >
>> > <property>
>> >   <name>fs.default.name</name>
>> >   <value>s3://BUCKET</value>
>> > </property>
>> >
>> > <property>
>> >   <name>fs.s3.awsAccessKeyId</name>
>> >   <value>ID</value>
>> > </property>
>> >
>> > <property>
>> >   <name>fs.s3.awsSecretAccessKey</name>
>> >   <value>SECRET</value>
>> > </property>
>> >
>> > hdfs-site.xml is empty!
>> >
>> > The NameNode log says it's trying to connect to the local HDFS, not S3.
>> > Am I missing anything?
>> >
>> > Regards,
>> > Alok
>>
>> --
>> Harsh J
>
> --
> Alok

--
Harsh J
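A quick way to sanity-check the S3 setup described above, before starting HBase or Hive, is to run plain FsShell commands straight against the bucket (the bucket name and paths here are placeholders):

  hadoop fs -mkdir s3://BUCKET/smoke-test
  hadoop fs -put /etc/hosts s3://BUCKET/smoke-test/
  hadoop fs -ls s3://BUCKET/smoke-test

If these succeed, the jets3t jar and the fs.s3.* credentials are wired up; a ClassNotFoundException or a 404-style response at this stage likely points to a missing jar or a wrong bucket/key setting rather than to HBase or Hive themselves.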