IMO, they are very similar. Lots of smart people on both sides making
really good changes.
I think HBase has a lot more instrumentation for, and understanding of,
the resources a cluster will use. For example, I think it's much clearer
which resources and threads the RPC server will use. In Accumulo this is
much more opaque (and it grows/shrinks on its own). I think this also
carries over into HBase API usage -- I have a much better understanding
of what needs to be managed with HBase. This is a little more opaque in
Accumulo for new users.
I also think the HBase read path is much better understood and tuned. I
would trust consistently performing (SLA-bound) workloads on HBase much
more than on Accumulo, simply because there hasn't been comparable
(public) work on this in Accumulo.
On the other side, it's been years since I've seen data loss or
assignment bugs in Accumulo. Around HBase 1.1.0, the bugs that Enis and
Stack fixed shocked me. I was rather surprised to see these kinds of bugs
crop up, and they really worried me after I spent quite a lot of time
trying to understand them. Personally, I would trust Accumulo to be thrown off
a cliff and still keep chugging (again, because I've done this myself).
I don't have this confidence with HBase (yet).
Specifically WRT security, since you brought it up: last time I tried to
play with the cell-level security APIs in HBase, they seemed very opaque
to me. Perhaps I was just dense and didn't find the right instructions.
Where security is critical, I would trust Accumulo more because its
security has been fleshed out over many years and has been part of the
core model since the start. I felt that HBase was still in a shake-down
phase. (Again, I don't want to be argumentative -- it's just my personal
experience to date using the code and watching JIRA issues.)
The difference between HBase coprocessors and Accumulo iterators will
still stand (they are not equivalent features and solve different
problems, IMO). Coprocessors enable quite a few interesting things
(notably, Phoenix). At the same time, I like the functional conciseness
with which I can represent some problems using Accumulo iterators.
Ultimately, consider the use cases, evaluate the solutions, and base your
decision on empirical evidence. That's the only way to really make a
decision :)
Jerry He wrote:
Hi, folks
We have people that are evaluating HBase vs Accumulo.
Security is an important factor.
But I think that since cell-level security was added to HBase, there is
no longer a real gap compared to Accumulo.
I know we have both HBase and Accumulo experts on this list.
Could someone shed more light?
I am looking for any real gaps between HBase and Accumulo, so that I can
be prepared to address them. This is not limited to the security area.
There are differences in some features and implementations. But they don't
seem like real 'gaps'.
Any comments and feedback are welcome.
Thanks,
Jerry