ctubbsii commented on issue #5414:
URL: https://github.com/apache/accumulo/issues/5414#issuecomment-2813889150

   I think the biggest risk here is that a direct table lookup from the TableId 
alone will be a little more work: in the worst case, one has to iterate over 
all namespaces to find the TableId. This isn't really a problem for API 
operations, where the namespace name is part of the table name that gets 
resolved to a TableId internally. It would just mean tracking more table 
information than the TableId alone, in order to look things up efficiently.
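
   As a rough illustration of that worst-case scan, here is a minimal sketch. 
It assumes the per-namespace table mappings are available as a plain in-memory 
map; `NamespaceScanLookup` and `findNamespace` are hypothetical names and not 
existing Accumulo code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: plain Strings stand in for NamespaceId and TableId.
final class NamespaceScanLookup {

  // Scans every namespace's table set until one contains the given table ID.
  // Worst case: every namespace is visited once.
  static Optional<String> findNamespace(Map<String, Set<String>> tablesByNamespace,
      String tableId) {
    for (Map.Entry<String, Set<String>> entry : tablesByNamespace.entrySet()) {
      if (entry.getValue().contains(tableId)) {
        return Optional.of(entry.getKey());
      }
    }
    return Optional.empty(); // no namespace maps this table ID
  }
}
```

   If the map is backed by data that is already cached, the scan costs only 
in-memory work.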
   
   Where this is more of a problem (but still not much of one, since looking up 
each namespace isn't that bad) is when retrieving table information using 
TableIds from the metadata table, or when reading them out of the write-ahead 
logs. There, we don't have the namespace information serialized, so we would 
have to look it up. We could store namespaceIds in the WAL, though, or even in 
the metadata table.
   
   We could also structure TableIds with a NamespaceId prefix, so we always 
have a namespace when we have a TableId. That would require a metadata upgrade 
step or some backwards compatibility code.
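
   To make the prefix idea concrete, here is a minimal sketch. The `:` 
separator, the `QualifiedTableId` class, and its methods are all assumptions 
for illustration; nothing here reflects an existing or agreed-upon format.

```java
// Hypothetical sketch of a TableId that carries its NamespaceId as a prefix.
final class QualifiedTableId {
  private static final char SEPARATOR = ':'; // assumed separator, not a real format

  final String namespaceId;
  final String tableId;

  QualifiedTableId(String namespaceId, String tableId) {
    this.namespaceId = namespaceId;
    this.tableId = tableId;
  }

  // Splits "<namespaceId>:<tableId>" at the first separator.
  static QualifiedTableId parse(String canonical) {
    int idx = canonical.indexOf(SEPARATOR);
    if (idx <= 0 || idx == canonical.length() - 1) {
      throw new IllegalArgumentException("Not a prefixed table ID: " + canonical);
    }
    return new QualifiedTableId(canonical.substring(0, idx), canonical.substring(idx + 1));
  }

  // Serialized form that could be written wherever a bare TableId is stored today.
  String canonical() {
    return namespaceId + SEPARATOR + tableId;
  }
}
```

   Existing unprefixed IDs would fail `parse` in this sketch, which is where 
the upgrade step or backwards compatibility code would come in.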
   
   Another idea would be to keep the `/tables` list, but only as a reverse 
lookup from tableId to namespaceId. But, it would be an extra thing to keep 
consistent. Since these would only be fixed values (fixed IDs, not names), this 
would only get updated on table creation and deletion, and might not be a bad 
option.
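
   A minimal in-memory sketch of such a reverse lookup follows. 
`ReverseTableIndex` and its method names are hypothetical, and the real thing 
would live in ZooKeeper rather than in a local map; the point is only that 
writes happen on create and delete, never on rename.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a tableId -> namespaceId index keyed by fixed IDs only.
final class ReverseTableIndex {
  private final Map<String, String> namespaceByTable = new ConcurrentHashMap<>();

  // Called once when a table is created.
  void onTableCreated(String tableId, String namespaceId) {
    namespaceByTable.put(tableId, namespaceId);
  }

  // Called once when a table is deleted.
  void onTableDeleted(String tableId) {
    namespaceByTable.remove(tableId);
  }

  // Direct lookup, with no iteration over namespaces.
  Optional<String> namespaceOf(String tableId) {
    return Optional.ofNullable(namespaceByTable.get(tableId));
  }
}
```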
   
   Implementing any of these ideas could be a follow-on to this, but I think 
that iterating over all namespaces to compose a full list of TableIds to look 
up is not so bad. In the worst case (each table in its own unique namespace), 
it's one lookup per table, and this change would already save more lookups 
than that, because we would no longer need to look up the `/namespace` node on 
each table. Since we're caching the table mappings for each namespace, these 
lookups might not even result in any new RPCs... just synthesizing data 
already cached.

