[ 
https://issues.apache.org/jira/browse/ACCUMULO-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated ACCUMULO-1719:
---------------------------------
    Fix Version/s:     (was: 1.7.0)
                   1.8.0

> Convenient instanceName to instanceID mapping is unnecessary
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-1719
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1719
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>            Reporter: Christopher Tubbs
>             Fix For: 1.8.0
>
>
> The ZooKeeperInstance constructor typically takes two parameters: an
> instanceName and a comma-separated list of ZooKeeper host[:port] entries
> (other overloads take a UUID and/or a timeout setting).
> Initialize generates a UUID and associates a user-provided instanceName with
> it, using the following mapping in ZooKeeper:
> /accumulo/instances/instanceName, which contains a UUID, which points to 
> /accumulo/UUID
> Since the introduction of instance.secret, there are potential problems with 
> this mapping.
> If /accumulo (and /accumulo/instances and /accumulo/instances/instanceName) 
> is created by Initialize in a write-protected way (using instance.secret), 
> then re-initializing with a new generated instanceID but the same 
> instanceName will not work unless the new instance has the same instance 
> secret. This is very limiting and can be a nightmare for system 
> administrators and developers trying to re-initialize.
> If it is not created in a write-protected way, there's an even bigger 
> problem, because anybody with access to ZooKeeper can overwrite the old 
> mapping to point to a new instance (and we expect all clients to be able to 
> access ZooKeeper). While the old data is still protected, any clients 
> connecting with the instanceName will connect (and ingest to) the new 
> instanceID that the instanceName currently maps to.
> The current implementation appears to use the former (the instanceName node
> itself is protected by the same secret as the instanceId and child nodes).
> This means that at least the mapping is protected from being overwritten,
> but it also means that it doesn't provide us with any added value. Even if
> we count the ability to reinitialize the same instanceName (generating a new
> instanceID) while leaving the old instance data around for inspection as
> added value, we still have the problems of ZK filling up and, because the
> mapping was rewritten, of not being able to tell which old instanceID was
> the previous one to inspect.
> A better solution:
> Drop the mapping. It is unnecessarily complex with no added value. Allow the
> instanceName that users create in new versions to represent the unique ID. 
> Don't generate/use UUIDs anymore... use the provided instanceName. Keep the 
> API for UUID... but just for convenience (treat it like a string internally). 
> We can still prompt to overwrite the old instance... if it exists AND we have 
> the same secret... but when we "overwrite it", we can optionally rename the 
> old instanceName to instanceName_backup_date.
> Dropping the mapping has the benefit of reduced complexity, and is (mostly)
> backwards-compatible (instances can't have the name "instances"). It is
> easier on developers to debug their instances, because there's no obscure 
> UUID to deal with (unless they want to use that as the name) and they can 
> find the old versions of their instances if they choose to back up the old 
> data when re-initializing. If not, they can avoid ZK filling up (esp. in dev
> environments where instanceNames get reused often). And, with a backup naming 
> convention, it's easy for admins to decide which old instance data to keep 
> and which to throw away... without the need of a mapping. The scope for the 
> instance.secret is also well-defined to just the /accumulo/instanceName that 
> created it, and there's no possibility of overwriting the instanceName to 
> instanceID mapping.
> Instance names work best when unique. Instance IDs are guaranteed to be 
> unique. There's no good reason these should be separate things.
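
To make the two layouts in the description concrete, here is a minimal Java sketch. It is not Accumulo or ZooKeeper code: ZooKeeper is stood in for by an in-memory Map from node path to node data, and the class and method names (InstanceLookupSketch, resolveViaMapping, resolveDirect) are hypothetical. Only the paths /accumulo/instances/instanceName and /accumulo/UUID, and the reserved name "instances", come from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a Map stands in for ZooKeeper (node path -> node data).
class InstanceLookupSketch {
    static final String ROOT = "/accumulo";

    // Current layout: /accumulo/instances/<name> holds a UUID,
    // and the instance data lives under /accumulo/<UUID>.
    static String resolveViaMapping(Map<String, String> zk, String instanceName) {
        String uuid = zk.get(ROOT + "/instances/" + instanceName);
        if (uuid == null) {
            throw new IllegalArgumentException("No instance named " + instanceName);
        }
        return ROOT + "/" + uuid;
    }

    // Proposed layout: the name is the ID, so there is no second lookup.
    // "instances" must be rejected because the old layout already uses
    // /accumulo/instances as the parent of the mapping nodes.
    static String resolveDirect(String instanceName) {
        if ("instances".equals(instanceName)) {
            throw new IllegalArgumentException("\"instances\" is reserved by the old layout");
        }
        return ROOT + "/" + instanceName;
    }

    public static void main(String[] args) {
        Map<String, String> zk = new HashMap<>();
        // Hypothetical UUID, used only for the demonstration.
        zk.put(ROOT + "/instances/dev", "123e4567-e89b-12d3-a456-426614174000");

        System.out.println(resolveViaMapping(zk, "dev"));
        System.out.println(resolveDirect("dev"));
    }
}
```

In the mapped layout, whoever can rewrite the /accumulo/instances/dev node decides where clients using the name "dev" end up; in the direct layout there is no such node to rewrite.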

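The proposed re-initialization behaviour (overwrite only if the instance exists AND the secret matches, optionally keeping the old data as instanceName_backup_date) could look roughly like the sketch below. Again, this is an assumption-laden illustration rather than Accumulo code: the Map stores instance root path -> the secret protecting it, the yyyyMMdd date format is a guess (the issue does not specify one), and ReinitializeSketch and its methods are hypothetical names. ZooKeeper has no rename operation, so the map move here stands in for whatever copy-and-delete a real implementation would need.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: the Map stands in for ZooKeeper
// (instance root path -> secret protecting that subtree).
class ReinitializeSketch {
    static final String ROOT = "/accumulo";

    // Backup naming convention from the issue; the date format is an assumption.
    static String backupName(String instanceName, LocalDate date) {
        return instanceName + "_backup_" + date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    static void reinitialize(Map<String, String> zk, String instanceName,
                             String secret, boolean keepBackup, LocalDate today) {
        String path = ROOT + "/" + instanceName;
        String existingSecret = zk.get(path);
        if (existingSecret != null) {
            // Only the holder of the original secret may replace the instance.
            if (!existingSecret.equals(secret)) {
                throw new IllegalStateException("Secret does not match existing instance");
            }
            if (keepBackup) {
                // Keep the old data reachable under a dated name.
                zk.put(ROOT + "/" + backupName(instanceName, today), existingSecret);
            }
            zk.remove(path);
        }
        zk.put(path, secret);
    }

    public static void main(String[] args) {
        Map<String, String> zk = new HashMap<>();
        reinitialize(zk, "dev", "s3cret", true, LocalDate.of(2013, 9, 18));
        reinitialize(zk, "dev", "s3cret", true, LocalDate.of(2013, 9, 19));
        zk.keySet().forEach(System.out::println);
    }
}
```

With keepBackup set to false the old entry is simply dropped, which is the case the issue describes for dev environments where instance names are reused often.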


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
