Mafish Liu wrote:
Hi, Stack:
I've just come back from my three-week vocation
Don't vocations usually last a lifetime (smile)?
and missed almost all the news about hadoop/hbase from the last 20 days.
Now we can continue our discussion here.
I'm designing a system to manage vector data (a type of GIS data) with
hbase. All vector data will be divided into several regions according to a
quad-tree algorithm. The main purpose of this design is to distribute
computation rather than storage, since vector data management is a
compute-intensive task. So I need a clear view of how data maps to nodes.
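One common way to make quad-tree cells line up with hbase's sorted row keys is a "quadkey": a string with one digit (0-3) per tree level naming the quadrant taken at that level. A parent cell's key is then a prefix of all its children's keys, so spatially nested cells sort next to each other. The sketch below is only an illustration under that assumption; the class and method names are hypothetical and nothing here is part of hbase.

```java
// Hypothetical sketch: compute a quad-tree cell key ("quadkey") for a point.
// Not part of hbase; QuadKey and encode() are illustrative names only.
final class QuadKey {

    /**
     * Returns `depth` digits in '0'..'3'. At each level the current cell is
     * split in half on both axes; +1 means "right half", +2 means "top half".
     * A parent cell's key is therefore a prefix of its children's keys.
     */
    static String encode(double x, double y,
                         double minX, double minY,
                         double maxX, double maxY,
                         int depth) {
        StringBuilder key = new StringBuilder(depth);
        for (int level = 0; level < depth; level++) {
            double midX = (minX + maxX) / 2.0;
            double midY = (minY + maxY) / 2.0;
            int quadrant = 0;
            if (x >= midX) { quadrant += 1; minX = midX; } else { maxX = midX; }
            if (y >= midY) { quadrant += 2; minY = midY; } else { maxY = midY; }
            key.append((char) ('0' + quadrant));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        // Unit square, three levels deep.
        System.out.println(encode(0.1, 0.1, 0, 0, 1, 1, 3)); // prints 000
        System.out.println(encode(0.9, 0.9, 0, 0, 1, 1, 3)); // prints 333
    }
}
```

Used as (a prefix of) the row key, this keeps each quad-tree subtree in one contiguous key range, which is the unit hbase splits tables on.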
Tables are divided into Regions. Simplistically, Regions are all of the
same size physically. If entries are of like-size, then each Region
will have roughly same number of keys. Regions are deployed over the
hbase cluster using a primitive -- some would say 'broken' -- algorithm
that tries to take into account the current load of the cluster
member ("RegionServer"). Any RegionServer may have 0 or more Regions.
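To make the key-range idea concrete, here is a simplified, hypothetical model: each Region covers the rows from its start key up to (but not including) the next Region's start key, so finding the Region for a row is a "greatest start key <= row key" lookup. This is not the hbase client API; the class, region start keys, and server names are invented for illustration, and the Region-to-server assignment it holds is whatever hbase happens to have chosen at that moment.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of row-key -> Region -> RegionServer lookup.
// NOT the hbase client API; names below are made up for illustration.
final class RegionMap {
    // Region start key -> RegionServer currently hosting that Region.
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        startKeyToServer.put(startKey, server);
    }

    /** Returns the server for the Region whose range [start, nextStart) holds rowKey. */
    String serverFor(String rowKey) {
        Map.Entry<String, String> e = startKeyToServer.floorEntry(rowKey);
        if (e == null) {
            throw new IllegalArgumentException("no region covers " + rowKey);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionMap map = new RegionMap();
        map.addRegion("", "server-a");   // the first Region starts at the empty key
        map.addRegion("2", "server-b");
        map.addRegion("3", "server-c");
        System.out.println(map.serverFor("012")); // server-a
        System.out.println(map.serverFor("231")); // server-b
        System.out.println(map.serverFor("333")); // server-c
    }
}
```

Because hbase may move Regions between RegionServers at any time, a real client would have to refresh such a mapping rather than cache it indefinitely.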
What do you need? Do you want to be able to influence the allocation
algorithm?
That way, I can send computing requests to specific nodes so the
computation runs locally.
How are you thinking of running the compute task next to the data?
There is no provision yet in hbase for running 'stored procedures' out
on the cluster nodes.
I'll start writing the HOWTO in the next couple of days, titled "Getting
started with hbase-based programs."
No problem. When you get a chance. We have a bit of work to do in this
area. Some is gated on our finishing our setup in hudson so we can
point to nightlies that folks can download: e.g. the 'hbase in ten
minutes' document.
St.Ack
On Thu, Feb 21, 2008 at 2:51 PM, stack <[EMAIL PROTECTED]> wrote:
Hey Mafish:
There is no such provision in hbase at the moment. Tables are
distributed across the hbase cluster as it sees fit (We'd be interested
in hearing more about why you need this facility).
Regarding helping us out w/ a HOWTO, that'd be great. Here's what we have
at the moment: http://hadoop.apache.org/hbase/docs/current/. Also, hbase
recently got its own mailing lists. You'll get a more prompt response if
you move your questions/comments there. See
http://hadoop.apache.org/hbase/mailing_lists.html.
Thanks,
St.Ack