Hi,
sorry for the delay in responding...

I already posted a mail about this issue.
What we may need is a Writer that can seek first on the row key and
then on the column keys.
In general I agree with the sparse structure.


What about this: don't store explicit "column" fields anywhere. Rather, each row is stored as a series of key-value pairs, where the key is the
column name.
I didn't get this.
How do you then want to associate a key-value pair (let's call it a cell) with a row key? As mentioned, I see either an object "rowKey - columnName - value" or one rowKey with a columnKey-Value[].
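
A rough sketch of the two shapes I have in mind (plain Java, all class names invented, nothing of this exists yet):

import java.util.TreeMap;

// (a) every cell carries its row key explicitly: rowKey - columnName - value
class Cell {
    final String rowKey;
    final String columnName;
    final byte[] value;
    Cell(String rowKey, String columnName, byte[] value) {
        this.rowKey = rowKey;
        this.columnName = columnName;
        this.value = value;
    }
}

// (b) one row key owns its column/value pairs: rowKey - columnKey-Value[]
class Row {
    final String rowKey;
    // sorted by column name so a reader can walk the columns in order
    final TreeMap<String, byte[]> columns = new TreeMap<String, byte[]>();
    Row(String rowKey) {
        this.rowKey = rowKey;
    }
}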

True, if there are a huge number of columns and you are interested in
just one, there will be unnecessary processing. This is especially bad
if one column is a 2-char string and another column is a video file.

So we should actually keep a family of files, segmented by object size. But in the general case, it shouldn't be possible to "seek to a column". Instead, you seek to a row and unpack all its key/val (col/cell) pairs.
Hmm, I'm not sure I like the idea of having column files separated by size. I don't think there are many use cases where people will store, say, locales and video files associated with the same URL row key.
In such a case it makes more sense to have separate tables.
From my point of view the best way would be a kind of column seek mechanism, which will require a different kind of sequence writer and reader. As far as I remember, the Google system keeps all columns of a row in one tablet. What do you think about being able to have one row spread across different tablets, with each tablet holding different columns of it?
So we would distribute not just the rows but also the columns.
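
To make the column seek idea more concrete, here is a rough interface sketch (again only invented names, nothing like this exists in the code) of what a column-aware reader on top of a sequence-file-like format could look like:

import java.io.IOException;
import java.util.TreeMap;

// Invented interface, only to illustrate the "seek row, then seek column" idea.
interface HRowReader {
    // position the reader at the record for this row key (via a row index)
    boolean seekToRow(String rowKey) throws IOException;

    // within the current row, skip forward to one column; assumes the writer
    // stored the columns in sorted order, so this is a forward scan
    byte[] getColumn(String columnName) throws IOException;

    // or unpack the whole row, as in the "seek to a row" model above
    TreeMap<String, byte[]> getAllColumns() throws IOException;
}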



My idea was to have the lock on the HRegionServer level; my idea was
that the client itself takes care of replication,
i.e., writes the value to the n servers that hold replicas of the same
HRegions.
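
In code terms I imagine something like this on the client side (invented names, just to show where the fan-out would live):

import java.io.IOException;
import java.util.List;

// Invented sketch: the client itself writes to every server that hosts a
// replica of the target HRegion, instead of the servers replicating.
class ReplicatingClient {

    interface HRegionConnection {
        void put(String rowKey, String column, byte[] value) throws IOException;
    }

    void put(List<HRegionConnection> replicas, String rowKey,
             String column, byte[] value) throws IOException {
        // the client, not the servers, fans the write out to all n replicas
        for (HRegionConnection server : replicas) {
            server.put(rowKey, column, value);
        }
    }
}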


Do you mean that a lock applies to an entire server at once?  Or
that an HRegionServer is responsible for all locks?  (I'd like to do
the latter, at least in the short-term.)
Yes, the latter is better from my point of view.
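
Just so we mean the same thing, a minimal sketch of "the HRegionServer is responsible for all locks" could be one lock table per server, keyed by row (invented code, only to illustrate):

import java.util.HashMap;
import java.util.Map;

// Invented sketch: one per-server lock table, keyed by row key.
class RowLockManager {
    // rowKey -> id of the client holding the lock
    private final Map<String, String> locks = new HashMap<String, String>();

    synchronized boolean lockRow(String rowKey, String clientId) {
        if (locks.containsKey(rowKey)) {
            return false;               // another client already holds this row
        }
        locks.put(rowKey, clientId);
        return true;
    }

    synchronized void unlockRow(String rowKey, String clientId) {
        if (clientId.equals(locks.get(rowKey))) {
            locks.remove(rowKey);
        }
    }
}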

I'd like to avoid having an HRegion that's hosted by multiple servers,
because then it's unclear which HRegionServer should own the lock.
I suppose the HRegionServers for a given HRegion could hold an
election, but this seems like a lot of work.

If there's a row that's really "hot" and wanted by a lot of clients, I could imagine starting a series of "read-only" HRegionServers that field read
requests.  That way you avoid having an election for the lock but can
still scale capacity if necessary.
That is a good idea.

> The HBase system can repartition an HTable at any time.  For
> example, many
> repeated inserts at a single location may cause a single HRegion to
> grow
> very large.  The HBase would then try to split that into multiple
> HRegions.
> Those HRegions may be served by the same HRegionServer as the
> original or may be served by a different one.
Would the node send out a message to request a split, or does the
master decide based on heartbeat messages?


There are two ways that an HRegionServer might offer brand-new
service for an HRegion:
1) The HRegion's old HRegionServer died.  A new HRegionServer
offers the exact same HRegion, loaded from a DFS file.  This will
have to be initiated by the HBaseMaster, because it is the only node that
knows about heartbeats.
Makes sense.

2) An HRegion is getting too big, and must be split into two.  I
imagine that this can be initiated by the local HRegionServer,
which then asks the master for various hints (like where there
is another lightly-loaded HRegionServer that could take a new
Region).
Maybe the local RegionServer should just request to be split and the master should handle the split itself. My concern is that just using heartbeats to announce regions to the master is not fast enough. That means that while a region is being split, all its rows need to be read-only during the process. The master needs to know about the two new regions before we remove the write lock.
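
Just to illustrate the ordering I have in mind for the split (all names invented, this is not a proposal for the real API):

import java.io.IOException;

// Invented sketch of the split handshake: the master learns about the two
// new regions through an explicit call, not a heartbeat, before the write
// lock on the parent is released.
class SplitSketch {

    interface Region {
        void setReadOnly(boolean readOnly);
        Region[] split() throws IOException;      // writes the two daughter regions
    }

    interface Master {
        void registerRegions(Region[] daughters) throws IOException;
    }

    void splitRegion(Region parent, Master master) throws IOException {
        parent.setReadOnly(true);                 // rows are read-only during the split
        Region[] daughters = parent.split();
        master.registerRegions(daughters);        // synchronous, so the master knows them
        parent.setReadOnly(false);                // only now do writes resume (served by the daughters)
    }
}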


My idea was to simply download the data to the node and always read it
locally, but write into the DFS, since in my case write access can be
slower but I need very fast read access.


You mean just keep a local cache of the DFS file?  That might be
a good idea for a feature we add into DFS as a performance enhancement.

Yes, reading files from DFS is too slow;
we ran into the same performance problem too often in several projects.

For example, reading a Lucene index file from the DFS, as Nutch does, is just useless. But loading a copy onto the local HDD during startup is fast enough. In general I don't think disk space is an issue these days, so I have no problem having the data replicated both in the DFS and on a local HDD.
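
For what it's worth, the startup copy itself is already easy with the current FileSystem API; roughly something like this (the paths are only examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough sketch of the "copy to local disk at startup" idea.
public class LocalCacheSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem dfs = FileSystem.get(conf);

        Path remote = new Path("/data/index/part-00000");     // example DFS path
        Path local = new Path("/tmp/local-cache/part-00000"); // example local path

        // one-time copy at startup; afterwards all reads hit the local HDD,
        // while writes still go to the DFS
        dfs.copyToLocalFile(remote, local);
    }
}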

