Nothing in HBase is designed to handle an eventually consistent data
store underneath.
In general, if a file that HBase thinks exists is not accessible on the
file system, HBase will become unstable, and you will probably lose
access to that region until the system is restarted or the region is
moved.
Andrew Hitchcock wrote:
Are you trying to run HBASE on an S3 filesystem? An HBasista tried it in
the past and, FYI, found it insufferably slow. Let us know how it goes for
you.
Hi HBasers,
I'm a little late to this conversation, but I thought I should add my 2¢.
I would recommend NOT writing directly to Hadoop's S3 file systems
from HBase. Not for speed reasons (I don't know how it would perform),
but because S3 is eventually consistent. Hadoop tends to assume that
its underlying distributed file system is consistent. HDFS is
consistent, so it works for most users, but this assumption breaks
down when you are using one of the S3 file systems (s3:// or s3n://).
There are places in Hadoop which write a file and then immediately go
to read it again. Normally S3 reaches consistency quickly enough for
this to not be a problem, but there are times it can take a little bit
longer. In most of these cases, Hadoop assumes that if the file isn't
there now it'll never be there (since HDFS is consistent), so it
either ignores the missing file or throws an error.
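That write-then-immediately-read pattern is easy to demonstrate with a toy model. The sketch below is not Hadoop code; it's an invented, minimal simulation of a store whose listings lag its writes by one "tick", just to show why a read straight after a successful write can come back empty:

```python
class EventuallyConsistentStore:
    """Toy store: writes become visible to readers only after a delay."""

    def __init__(self, delay_ticks=1):
        self._visible = {}    # what readers can see right now
        self._pending = []    # [ticks_remaining, key, value] awaiting propagation
        self._delay = delay_ticks

    def put(self, key, value):
        # The write succeeds, but readers won't see it immediately.
        self._pending.append([self._delay, key, value])

    def get(self, key):
        # Returns None for a key that exists but hasn't propagated yet --
        # the case Hadoop treats as "this file will never exist".
        return self._visible.get(key)

    def tick(self):
        # Advance simulated time; propagate writes whose delay has elapsed.
        still_pending = []
        for entry in self._pending:
            entry[0] -= 1
            if entry[0] <= 0:
                self._visible[entry[1]] = entry[2]
            else:
                still_pending.append(entry)
        self._pending = still_pending


store = EventuallyConsistentStore(delay_ticks=1)
store.put("hlog.1234", b"log data")
immediate = store.get("hlog.1234")  # read-after-write: not visible yet
store.tick()                        # time passes; the write propagates
later = store.get("hlog.1234")      # now the read succeeds
```

On a consistent file system like HDFS, `immediate` would already hold the data; here it is `None`, which is exactly the window in which Hadoop concludes the file is missing.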
Unless HBase was specifically architected to tolerate eventually
consistent datastores, I imagine problems will crop up in
production.
I'll admit I'm not familiar with HBase's internals, but I can imagine
a situation like this: HBase decides a log file has gotten too large
and wants to split it. It finishes writing and then closes the file.
(With S3N, the file is actually uploaded to S3 during the close, so
this takes longer than it would with HDFS). As soon as it is finished
calling close(), HBase opens the file for reading but the file might
not have appeared yet. What does HBase do then? I don't know.
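Handling that case properly would mean treating "file not found" as possibly transient rather than permanent. This is just a sketch of what such a reader could look like; the function, retry counts, and backoff values are all invented for illustration, not anything HBase actually does:

```python
import time

def open_when_visible(lookup, key, retries=5, backoff_s=0.01):
    """Poll for a key that may not have propagated yet, instead of
    assuming (as a consistent FS like HDFS would justify) that a
    missing file is gone forever.

    `lookup` is any callable returning the value or None.
    """
    delay = backoff_s
    for _ in range(retries):
        value = lookup(key)
        if value is not None:
            return value
        time.sleep(delay)
        delay *= 2  # exponential backoff between polls
    raise FileNotFoundError(f"{key} never became visible")


# Simulate a file that only appears on the third lookup.
calls = {"n": 0}
def flaky_lookup(key):
    calls["n"] += 1
    return b"contents" if calls["n"] >= 3 else None

data = open_when_visible(flaky_lookup, "split-log")
```

The point is that every read-after-write path would need this kind of retry discipline, which plain Hadoop code paths don't have.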
Before I trusted HBase on S3 with important data, I'd first want to
verify that it handles eventual consistency properly. Also, S3N
doesn't support append, which I believe HBase uses in the newer
versions (or will soon).
Again, I'm not intimately familiar with the HBase internals, I'm just
presenting my worries. Stack and others, please correct me if I'm
wrong and HBase already takes this into account.
My suggestion would be to run HDFS on your cluster, tell HBase to
write to HDFS, and then make periodic snapshots of your data to S3.
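The snapshot step could be a periodic `distcp` job from HDFS to S3. This is only a sketch: the namenode address, bucket name, and credentials below are placeholders you would replace with your own.

```shell
#!/bin/sh
# Hypothetical nightly backup: copy the HBase root dir from HDFS to S3.
# Host, port, bucket, and credentials are placeholders.
hadoop distcp \
  hdfs://namenode:9000/hbase \
  s3n://my-backup-bucket/hbase-snapshot-$(date +%Y%m%d)
```

Run from cron (or similar), this keeps the live cluster on a consistent file system while still getting durable copies into S3.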
Regards,
Andrew
On Wed, Oct 7, 2009 at 9:47 AM, stack <[email protected]> wrote:
HBase or HDFS is in safe mode. My guess is that it's the latter. Can you
figure from HDFS logs why it won't leave safe mode? Usually
under-replication or a loss of a large swath of the cluster will flip on the
safe-mode switch.
Are you trying to run HBASE on an S3 filesystem? An HBasista tried it in
the past and, FYI, found it insufferably slow. Let us know how it goes for
you.
Thanks,
St.Ack
On Wed, Oct 7, 2009 at 9:33 AM, Ananth T. Sarathy <
[email protected]> wrote:
my regionserver has been stuck in safe mode. What can I do to get it out
of safe mode?
Ananth T Sarathy