HBase adds little latency on top of HDFS itself, since most of the time
spent getting data is in the datanode read. You'll want to use a CDN or
other reverse-proxy-caching mechanism in front no matter what you are
doing.

As for SPOF vs. not: at this point in time, HBase has backup masters,
obviating the SPOF there. Of course we are dependent on HDFS, which has
the namenode as a single point of failure, but the HDFS team is actively
working on that.

Now, why would one use HBase for live serving, SPOF and all... well, at
this time the SPOF is very manageable. RAID and dual power supplies
will protect you from the majority of problems you might encounter re:
node availability, and the HDFS team is working on longer-term
solutions. Aside from that, I personally like the greater flexibility
HBase gives you - with something like Voldemort, it's just one piece in
your systems architecture. HBase is flexible enough that you can go
farther with it, reducing the number of distinct components in your
systems architecture.

Good luck!
-ryan

On Mon, Oct 19, 2009 at 10:04 AM, Amandeep Khurana <[email protected]> wrote:
> comments inline
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Mon, Oct 19, 2009 at 6:58 AM, Fred Zappert <[email protected]> wrote:
>
>> Does anyone want to pick up on this?
>>
>> ---------- Forwarded message ----------
>> From: Luis Carlos Junges <[email protected]>
>> Date: Mon, Oct 19, 2009 at 4:14 AM
>> Subject: Store Large files/images HBase
>> To: [email protected]
>>
>>
>> Hi,
>>
>> I am currently doing some research on distributed databases that can
>> be scaled easily in terms of storage capacity.
>>
>> The reason is to use it in the Brazilian federal project called
>> "portal do aluno" (student portal), which will have around 10 million
>> kids accessing it monthly. The idea is to build a portal similar to
>> Facebook/Orkut with the main objective of spreading knowledge among
>> kids (6-13 years old).
>>
>> Well, now the problem:
>>
>> Those kids will generate a lot of data, including photos, videos,
>> presentations, and school tasks, among other things. In order to have
>> a 100% available system that also scales to this amount of data (the
>> initial estimate is 10 TB at full use of the portal), a distributed
>> storage engine seems to be the solution.
>>
>> Of the available solutions, I liked Voldemort because it seems not to
>> have a SPOF (single point of failure) compared to HBase. However,
>> HBase seems to integrate with more tools and sub-projects.
>>
>
> The HBase 0.20 release doesn't have an SPOF. We have the capability of
> running multiple master daemons on different nodes. The active master
> is elected from among them through ZooKeeper.
>
>>
>> My question concerns storing such big items (a 2 MB photo, for
>> example) in HBase. I read on blogs that HBase has high latency, which
>> makes it inappropriate for serving dynamic pages. Will the performance
>> of HBase decrease even more if large binary objects are stored in it?
>>
>
> Again, the 0.20 release has solved the high-latency problem to a great
> degree. Read speeds are comparable to a MySQL database. Of course,
> larger objects will take more time to read.
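>
> Since the question was about 2 MB photos specifically: as a minimal
> sketch, assuming a hypothetical "photos" table with a "content" family
> (the table, family, and row key names are made up for illustration), a
> blob write and read with the 0.20 Java client look roughly like this:
>
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.client.Get;
>   import org.apache.hadoop.hbase.client.HTable;
>   import org.apache.hadoop.hbase.client.Put;
>   import org.apache.hadoop.hbase.client.Result;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   public class PhotoStore {
>     public static void main(String[] args) throws Exception {
>       HTable table = new HTable(new HBaseConfiguration(), "photos");
>
>       // write: the row key is the photo id, the image bytes go in one cell
>       byte[] photoBytes = new byte[2 * 1024 * 1024]; // stand-in for a 2 MB image
>       Put put = new Put(Bytes.toBytes("photo123"));
>       put.add(Bytes.toBytes("content"), Bytes.toBytes("image"), photoBytes);
>       table.put(put);
>
>       // read it back with a single get
>       Result result = table.get(new Get(Bytes.toBytes("photo123")));
>       byte[] image = result.getValue(Bytes.toBytes("content"), Bytes.toBytes("image"));
>       System.out.println("read " + image.length + " bytes");
>     }
>   }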
>
>
>>
>> My other question is about modelling the data using the key/value
>> pattern. With a relational database you just follow the cookbook
>> recipe and it's done. Do we have such a recipe for key/value stores?
>> Currently a lot of code has been written against the relational
>> database PostgreSQL, using Hibernate to map the objects.
>>
>>
> The modelling will depend on the kind of queries you want to do. Post a
> little more about the kind of data you have and the queries you want to
> run on it, and you can get specific tips accordingly.
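>
> For example (a sketch only, with made-up table and family names): a
> relational table of (user_id, photo_id, data) rows could become one
> HBase row per user, with one column per photo under a single family:
>
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.HColumnDescriptor;
>   import org.apache.hadoop.hbase.HTableDescriptor;
>   import org.apache.hadoop.hbase.client.HBaseAdmin;
>   import org.apache.hadoop.hbase.client.HTable;
>   import org.apache.hadoop.hbase.client.Put;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   public class UserPhotosSchema {
>     public static void main(String[] args) throws Exception {
>       HBaseConfiguration conf = new HBaseConfiguration();
>
>       // one table, one family; rows are users, columns are their photos
>       HBaseAdmin admin = new HBaseAdmin(conf);
>       HTableDescriptor desc = new HTableDescriptor("user_photos");
>       desc.addFamily(new HColumnDescriptor("photos"));
>       admin.createTable(desc);
>
>       // relational (user_id, photo_id, data) maps to row=user_id,
>       // column=photos:<photo_id>, value=data
>       HTable table = new HTable(conf, "user_photos");
>       Put put = new Put(Bytes.toBytes("user42"));
>       put.add(Bytes.toBytes("photos"), Bytes.toBytes("photo123"), new byte[0]);
>       table.put(put);
>     }
>   }
>
> Whether the photo id lives in the row key or in the column qualifier
> depends on whether you mostly scan all of a user's photos or fetch one
> photo at a time, which is why the query patterns matter.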
>
>
>
>>
>> I would appreciate any comments.
>>
>>
>>
>>
>> --
>> "A realidade de cada lugar e de cada época é uma alucinação coletiva."
>>
>>
>> Bloom, Howard
>>
>
