[jira] [Commented] (ACCUMULO-3842) [UMBRELLA] Remove non-transient data from ZooKeeper

Josh Elser (JIRA) Tue, 26 May 2015 14:58:44 -0700

    [ https://issues.apache.org/jira/browse/ACCUMULO-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559952#comment-14559952 ]


Josh Elser commented on ACCUMULO-3842:
--------------------------------------

{quote}
Do you have a strong reason you can express for changing the default behavior 
of storing this in ZK? Or is it just the lack of good comprehensive 
backup/restore tools which you've already mentioned (which seems to me to be an 
easier problem to solve)?
{quote}

One side of it is that I think the more we can store in tables, the easier it 
is for users to reason about. Having as much data as possible stored in 
Accumulo itself would make it easier for users to answer "how is X stored?".

Another aspect is the scalability side. We know that making a problem fit 
within ZK's constraints is often an exercise in itself when interacting with it 
at scale. We can likely do better scalability-wise in Accumulo itself. We 
probably aren't at the point where we _need_ to do this yet, but it's something 
to keep in mind.
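
For a rough sense of scale, the watcher estimate from the issue description 
(watchers=50*num_tables*num_tservers) can be worked through for a few cluster 
sizes. This is only a sketch: the factor of 50 is the description's own 
assumption, and the function name and cluster sizes below are hypothetical, 
not measurements.

```python
def estimated_watchers(num_tables, num_tservers, watchers_per_table=50):
    """Apply the issue description's estimate:
    watchers = 50 * num_tables * num_tservers.

    The default of 50 is the assumed figure from the description,
    not a measured value.
    """
    if num_tables < 0 or num_tservers < 0:
        raise ValueError("counts must be non-negative")
    return watchers_per_table * num_tables * num_tservers


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen only to show how the product grows.
    for tables, tservers in [(10, 10), (100, 100), (1000, 500)]:
        total = estimated_watchers(tables, tservers)
        print(f"{tables} tables x {tservers} tservers -> {total} watchers")
```

Because the estimate is a product, adding tables and adding tservers compound 
each other, which is why the watcher count is called out as a benefit of 
moving this data out of ZK.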

{quote}
From my perspective, ZK seems to be a relatively solid component. Because of 
that, it seems to me that burden is on any alternative to demonstrate a 
greater degree of reliability, scalability, or other benefit.
{quote}

I'm pretty sure we have already solved the reliability and scalability problems 
in Accumulo itself. Leveraging a table would be a bit of eating our own 
dogfood. We're better suited to storing a larger category of data than 
ZooKeeper is (e.g. we can store larger blobs of data).

Just to be clear: I think this has already broken down into two separate 
discussions. One is the configuration consistency issue, which I think we all 
agree needs to be addressed. The other is my general opinion that we should 
start moving away from ZK as our "general not-table-related-data" store, which 
deserves continued discussion about what the actual observed benefits would 
be, short term and long term (and how, if at all, the two are related).

> [UMBRELLA] Remove non-transient data from ZooKeeper
> ---------------------------------------------------
>
>                 Key: ACCUMULO-3842
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3842
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client, tserver
>            Reporter: Josh Elser
>             Fix For: 1.8.0
>
>
> Wanted to start brainstorming about this.
> We store a lot of persistent data in ZooKeeper that would be better stored in 
> something backed by HDFS. ZooKeeper can be a very convenient place to store 
> persisted data so that it's available to all nodes, but it comes at a price 
> and often must be asynchronously accessed to achieve good performance.
> * Table/Namespace configuration
> * Users/Authorizations
> * Problem reports (maybe?)
> * System configuration overrides (maybe?)
> Some benefits we'd see from this:
> * Loss of ZooKeeper doesn't lose table configuration and users.
> * Greatly reduce ZooKeeper watchers (assume 
> watchers=50*num_tables*num_tservers)
> * Consistent updates of table constraints and all other table properties
> The last note is the most important one IMO. The test issues alone that 
> we've had with constraints not being seen on all servers suggest this is 
> bound to affect users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
