Hi,
sorry for the delay in responding...

I already posted a mail about this issue.
What we may need is a Writer that can seek first on the row key and
then on the column keys.
In general I agree with the sparse structure.


What about this: don't store explicit "column" fields anywhere. Rather, each row is stored as a series of key-value pairs, where the key is the
column name.
I didn't get this.
How do you then want to associate a key-value pair (let's call it a cell) with a row key? As mentioned, I see either an object "rowKey - columnName - value" or one rowKey with a columnKey-Value[].
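
A rough sketch of the two shapes I have in mind (plain Java, all class names invented, nothing of this exists yet):

import java.util.TreeMap;

// (a) every cell carries its row key explicitly: rowKey - columnName - value
class Cell {
    final String rowKey;
    final String columnName;
    final byte[] value;
    Cell(String rowKey, String columnName, byte[] value) {
        this.rowKey = rowKey;
        this.columnName = columnName;
        this.value = value;
    }
}

// (b) one row key owns its column/value pairs: rowKey - columnKey-Value[]
class Row {
    final String rowKey;
    // sorted by column name so a reader can walk the columns in order
    final TreeMap<String, byte[]> columns = new TreeMap<String, byte[]>();
    Row(String rowKey) {
        this.rowKey = rowKey;
    }
}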

True, if there are a huge number of columns and you are interested in
just one, there will be unnecessary processing. This is especially bad
if one column is a 2-char string and another column is a video file.

So we should actually keep a family of files, segmented by object size. But in the general case, it shouldn't be possible to "seek to a column". Instead, you seek to a row and unpack all its key/val (col/cell) pairs.
Hmm, I'm not sure I like the idea of having column files separated by size. I don't think there are many use cases where people will store, say, locales and video files associated with the same URL row key.
In such a case it makes more sense to have separate tables.
From my point of view the best way would be a kind of column seek mechanism, which will require a different kind of sequence writer and reader. As far as I remember, the Google system keeps all columns of a row in one tablet. What do you think about being able to have one row spread across different tablets, with each tablet holding different columns of it?
So we would distribute not just the rows but also the columns.
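
To make the column seek idea more concrete, here is a rough interface sketch (again only invented names, nothing like this exists in the code) of what a column-aware reader on top of a sequence-file-like format could look like:

import java.io.IOException;
import java.util.TreeMap;

// Invented interface, only to illustrate the "seek row, then seek column" idea.
interface HRowReader {
    // position the reader at the record for this row key (via a row index)
    boolean seekToRow(String rowKey) throws IOException;

    // within the current row, skip forward to one column; assumes the writer
    // stored the columns in sorted order, so this is a forward scan
    byte[] getColumn(String columnName) throws IOException;

    // or unpack the whole row, as in the "seek to a row" model above
    TreeMap<String, byte[]> getAllColumns() throws IOException;
}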



My idea was to have the lock on the HRegionServer level; my idea was
that the client itself takes care of replication,
i.e., writes the value to the n servers that hold replicas of the same
HRegions.
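
In code terms I imagine something like this on the client side (invented names, just to show where the fan-out would live):

import java.io.IOException;
import java.util.List;

// Invented sketch: the client itself writes to every server that hosts a
// replica of the target HRegion, instead of the servers replicating.
class ReplicatingClient {

    interface HRegionConnection {
        void put(String rowKey, String column, byte[] value) throws IOException;
    }

    void put(List<HRegionConnection> replicas, String rowKey,
             String column, byte[] value) throws IOException {
        // the client, not the servers, fans the write out to all n replicas
        for (HRegionConnection server : replicas) {
            server.put(rowKey, column, value);
        }
    }
}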


Do you mean that a lock applies to an entire server at once?  Or
that an HRegionServer is responsible for all locks?  (I'd like to do
the latter, at least in the short-term.)
Yes, the latter is better from my point of view.
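
Just so we mean the same thing, a minimal sketch of "the HRegionServer is responsible for all locks" could be one lock table per server, keyed by row (invented code, only to illustrate):

import java.util.HashMap;
import java.util.Map;

// Invented sketch: one per-server lock table, keyed by row key.
class RowLockManager {
    // rowKey -> id of the client holding the lock
    private final Map<String, String> locks = new HashMap<String, String>();

    synchronized boolean lockRow(String rowKey, String clientId) {
        if (locks.containsKey(rowKey)) {
            return false;               // another client already holds this row
        }
        locks.put(rowKey, clientId);
        return true;
    }

    synchronized void unlockRow(String rowKey, String clientId) {
        if (clientId.equals(locks.get(rowKey))) {
            locks.remove(rowKey);
        }
    }
}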

I'd like to avoid having an HRegion that's hosted by multiple servers,
because then it's unclear which HRegionServer should own the lock.
I suppose the HRegionServers for a given HRegion could hold an
election, but this seems like a lot of work.

If there's a row that's really "hot" and wanted by a lot of clients, I could imagine starting a series of "read-only" HRegionServers that field read
requests.  That way you avoid having an election for the lock but can
still scale capacity if necessary.
That is a good idea.

> The HBase system can repartition an HTable at any time.  For
> example, many
> repeated inserts at a single location may cause a single HRegion to
> grow
> very large.  The HBase would then try to split that into multiple
> HRegions.
> Those HRegions may be served by the same HRegionServer as the
> original or may be served by a different one.
Would the node send out a message to request a split, or does the
master decide based on heartbeat messages?


There are two ways that an HRegionServer might offer brand-new
service for an HRegion:
1) The HRegion's old HRegionServer died.  A new HRegionServer
offers the exact same HRegion, loaded from a DFS file.  This will
have to be initiated by the HBaseMaster, because it is the only node that
knows about heartbeats.
Makes sense.

2) An HRegion is getting too big, and must be split into two.  I
imagine that this can be initiated by the local HRegionServer,
which then asks the master for various hints (like where there
is another lightly-loaded HRegionServer that could take a new
Region).
Maybe the local RegionServer should just request to be split and the master should handle the split itself. My concern is that just using heartbeats to announce regions to the master is not fast enough. That means that while a region is being split, all its rows need to be read-only during the process. The master needs to know about the two new regions before we remove the write lock.
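
Just to illustrate the ordering I have in mind for the split (all names invented, this is not a proposal for the real API):

import java.io.IOException;

// Invented sketch of the split handshake: the master learns about the two
// new regions through an explicit call, not a heartbeat, before the write
// lock on the parent is released.
class SplitSketch {

    interface Region {
        void setReadOnly(boolean readOnly);
        Region[] split() throws IOException;      // writes the two daughter regions
    }

    interface Master {
        void registerRegions(Region[] daughters) throws IOException;
    }

    void splitRegion(Region parent, Master master) throws IOException {
        parent.setReadOnly(true);                 // rows are read-only during the split
        Region[] daughters = parent.split();
        master.registerRegions(daughters);        // synchronous, so the master knows them
        parent.setReadOnly(false);                // only now do writes resume (served by the daughters)
    }
}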


My idea was to simply download the data to the node and always read it
locally, but write into the DFS, since in my case write access can be
slower but I need very fast read access.


You mean just keep a local cache of the DFS file?  That might be
a good idea for a feature we add into DFS as a performance enhancement.

Yes, reading files from DFS is too slow;
we ran into the same performance problem too often in several projects.

For example, reading a Lucene index file from the DFS, as Nutch does, is just useless. But loading a copy onto the local HDD during startup is fast enough. In general I don't think disk space is an issue these days, so I have no problem having the data replicated both in the DFS and on a local HDD.
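
For what it's worth, the startup copy itself is already easy with the current FileSystem API; roughly something like this (the paths are only examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough sketch of the "copy to local disk at startup" idea.
public class LocalCacheSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem dfs = FileSystem.get(conf);

        Path remote = new Path("/data/index/part-00000");     // example DFS path
        Path local = new Path("/tmp/local-cache/part-00000"); // example local path

        // one-time copy at startup; afterwards all reads hit the local HDD,
        // while writes still go to the DFS
        dfs.copyToLocalFile(remote, local);
    }
}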

