Seref Arikan wrote:
I am very keen to work on hbase and hadoop, but especially in the case of
hbase, the material I've found so far does not look promising for using it
as the backend of an information system.
...
The question is: are there any practices, recommended methods etc. for
implementing an hbase-based backend for an information system like a
web-based app or a medium-sized enterprise application with a couple of
hundred users?
...
The original Bigtable paper by Google names some Google applications
which are very responsive and obviously require real-time query/insert etc.
Any ideas or suggestions would be appreciated a lot. Hybrid approaches with
an rdbms etc.?
Yes. Our random-access performance up to now has been lacking. Folks
are serving out of hbase in real time, but they usually put a
cache in front of hbase, in the same way as people run memcached
in front of their SQL db, as noted in the streamy.com paragraph up on the
powered-by page (also see the pigi project listed on the supporting
projects page).
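
For illustration only, here is a minimal sketch of that cache-in-front pattern in Python. It is not how streamy.com or anyone else actually does it. The class name ReadThroughCache and the fetch_row callable are made up for this example; fetch_row stands in for whatever call your client uses to read a row from hbase, and a real deployment would more likely use memcached than an in-process dict.

```python
from collections import OrderedDict
from typing import Callable, Optional


class ReadThroughCache:
    """Small in-process LRU cache sitting in front of a slower row store."""

    def __init__(self, fetch_row: Callable[[str], Optional[dict]],
                 capacity: int = 1024):
        # fetch_row is a hypothetical stand-in for a real hbase read.
        self._fetch_row = fetch_row
        self._capacity = capacity
        self._rows = OrderedDict()  # row_key -> row, oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, row_key: str) -> Optional[dict]:
        if row_key in self._rows:
            # Cache hit: mark the row as most recently used and return it.
            self._rows.move_to_end(row_key)
            self.hits += 1
            return self._rows[row_key]
        # Cache miss: go to the backing store.
        self.misses += 1
        row = self._fetch_row(row_key)
        if row is not None:
            self._rows[row_key] = row
            if len(self._rows) > self._capacity:
                # Evict the least recently used row.
                self._rows.popitem(last=False)
        return row

    def invalidate(self, row_key: str) -> None:
        # Call this after writing a row so readers do not see stale data.
        self._rows.pop(row_key, None)
```

The hits and misses counters are there so you can see what fraction of reads still fall through to hbase; that fraction, times the random-read latency you measure, is roughly what your users will feel.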
Up to now we've been focused on reliability and scaling. After our
0.19.0 release goes out, we're going to turn our focus to efficiency and
performance (though 0.19.0 itself promises big performance increases; we'll
post more on this around hbase 0.19.0 release time).
If you want to join us for the ride, we'd love to have you. What kind
of numbers would you need in order to commit to hbase?
If you want to set up a test cluster to do your own measurements, let us
know how we can help.
St.Ack