ctubbsii opened a new issue, #4537:
URL: https://github.com/apache/accumulo/issues/4537

   The configuration hierarchy currently allows table configuration (`table.*` 
properties) to be set in the `conf/accumulo.properties` file (or on the 
command-line) and also in ZooKeeper at the system level. A property set at 
either of these levels applies to all tables in Accumulo, in all namespaces.
   
   We have special exceptions for the metadata tables (or maybe all system 
tables?) that prevent certain table configuration from affecting them (I don't 
remember the details right now... maybe related to constraints?) because we 
know that configuration can be a problem. However, other table configuration 
can also be a problem, including iterators/filters, classloader contexts, etc. 
This can result in unexpected behavior when a user clones a table and copies 
its configuration as well (raising the question: is it copying the effective 
configuration from the whole configuration hierarchy, or just the 
configuration set on the table itself?).
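
   To make the "effective vs. set on the table" distinction concrete, here is a 
minimal sketch of a layered lookup. It is not Accumulo's implementation: the 
class `LayeredConfig` and its methods are hypothetical, the layers are plain 
maps, and the iterator name and class are made up for illustration. It does not 
say which view a clone actually copies; it only shows that the two views 
differ as soon as a `table.*` property is set at the system level.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a configuration hierarchy; not an Accumulo class.
class LayeredConfig {
  // Layers ordered from most specific (table) to least specific (site).
  private final List<Map<String, String>> layers;

  LayeredConfig(List<Map<String, String>> layers) {
    this.layers = layers;
  }

  // The first layer that defines the key wins.
  String get(String key) {
    for (Map<String, String> layer : layers) {
      String value = layer.get(key);
      if (value != null) {
        return value;
      }
    }
    return null;
  }

  // Effective view: everything visible through the whole hierarchy.
  Map<String, String> effective() {
    Map<String, String> merged = new LinkedHashMap<>();
    // Apply the least specific layer first so more specific layers override it.
    for (int i = layers.size() - 1; i >= 0; i--) {
      merged.putAll(layers.get(i));
    }
    return merged;
  }

  // Table-only view: just what was set directly on the table.
  Map<String, String> tableOnly() {
    return layers.get(0);
  }

  public static void main(String[] args) {
    Map<String, String> table = Map.of("table.file.replication", "2");
    Map<String, String> namespace = Map.of();
    // A table.* property set at the system level leaks into every table.
    Map<String, String> system =
        Map.of("table.iterator.scan.example", "10,com.example.ExampleFilter");
    Map<String, String> site = Map.of("table.file.replication", "3");

    LayeredConfig conf = new LayeredConfig(List.of(table, namespace, system, site));
    System.out.println("effective:  " + conf.effective().keySet());
    System.out.println("table only: " + conf.tableOnly().keySet());
  }
}
```

   In this model the effective view contains the system-level iterator even 
though nobody set it on the table, which is the kind of surprise described 
above.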
   
   To avoid many of these issues, we should disallow setting table 
configuration in a way that affects all tables. Namespaces, not the system 
level, are the appropriate place to set table configuration that affects many 
tables at once.
   
   Table configuration should be ignored/disallowed from the SiteConfiguration 
(`accumulo.properties` and command-line) and the SystemConfiguration (in ZK). 
The shell should error when a user tries to set these, and any existing table 
configuration at those levels should result in an ERROR or WARN message saying 
that it has no effect.
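
   A rough sketch of the check this implies, assuming the properties from one 
system-level source have already been loaded into a map. The class 
`SystemLevelTablePropCheck`, the method `findTableProps`, and the message 
wording are all hypothetical and not existing Accumulo code; a real version 
would go through the logger and the existing property validation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Accumulo.
class SystemLevelTablePropCheck {
  static final String TABLE_PREFIX = "table.";

  // Returns one message per table.* key found in a system-level source
  // (for example accumulo.properties or the system config in ZooKeeper).
  static List<String> findTableProps(String source, Map<String, String> props) {
    List<String> messages = new ArrayList<>();
    for (String key : props.keySet()) {
      if (key.startsWith(TABLE_PREFIX)) {
        messages.add("WARN: table property '" + key + "' is set in " + source
            + "; set it on a namespace or table instead");
      }
    }
    return messages;
  }

  public static void main(String[] args) {
    // Sample keys chosen only for illustration.
    Map<String, String> site = new LinkedHashMap<>();
    site.put("instance.volumes", "hdfs://localhost:8020/accumulo");
    site.put("table.file.replication", "3");
    findTableProps("accumulo.properties", site).forEach(System.out::println);
  }
}
```

   The same scan could back both the startup message and the shell error, so 
the two stay consistent about which keys are rejected.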
   
   We can add a warning about this in 2.1, and change the behavior in 3.1.
   
   If there is still a use case for setting table properties in the 
`accumulo.properties` file, then we can consider adding that as a new feature 
to explicitly configure namespaces and/or tables, something along the lines of 
`table[mynamespace.mytable].someProp=someValue` to affect a specific table, and 
`table[mynamespace].someProp=someValue` to affect all tables in a namespace 
(`table[].someProp=someValue` to affect the default namespace). This is just 
one possibility. `commons-configuration2`, which we use for parsing the 
configuration, may have some useful features and a natural syntax for this, 
instead of the syntax I suggested here. It would be better to omit this feature 
entirely if we don't actually need it. However, I think the main place where 
we might need it is for system tables on initial startup, to set things like 
the per-table volume chooser, context, balancer, block cache configuration, 
HDFS replication factor, etc., which are useful to have set at system 
initialization, before any user tables are created. But such things should be 
set on a specific namespace or table, and should not be settable globally. So, 
maybe there's a way we can do that in the configuration instead of what we do 
now.
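
   For discussion, a minimal sketch of how keys in the suggested 
`table[...]` syntax could be classified. Everything here is an assumption: the 
class `ScopedTableKey`, the regular expression, and the choice to map the key 
back to a plain `table.someProp` name. It also treats any scope without a dot 
as a namespace, so a table in the default namespace cannot be addressed; that 
ambiguity would need to be resolved in a real design.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the proposed syntax; not part of Accumulo.
class ScopedTableKey {
  enum Kind { DEFAULT_NAMESPACE, NAMESPACE, TABLE }

  final Kind kind;
  final String scope;     // "", "mynamespace", or "mynamespace.mytable"
  final String property;  // plain property name, e.g. "table.someProp"

  ScopedTableKey(Kind kind, String scope, String property) {
    this.kind = kind;
    this.scope = scope;
    this.property = property;
  }

  // Matches table[<scope>].<suffix>, where <scope> may be empty.
  private static final Pattern SCOPED =
      Pattern.compile("table\\[([^\\]]*)\\]\\.(.+)");

  static Optional<ScopedTableKey> parse(String key) {
    Matcher m = SCOPED.matcher(key);
    if (!m.matches()) {
      return Optional.empty(); // not in the scoped syntax, e.g. "table.someProp"
    }
    String scope = m.group(1);
    Kind kind;
    if (scope.isEmpty()) {
      kind = Kind.DEFAULT_NAMESPACE;
    } else if (scope.contains(".")) {
      kind = Kind.TABLE;
    } else {
      kind = Kind.NAMESPACE; // assumption: no dot means a namespace
    }
    return Optional.of(new ScopedTableKey(kind, scope, "table." + m.group(2)));
  }

  public static void main(String[] args) {
    String[] keys = {
      "table[mynamespace.mytable].someProp",
      "table[mynamespace].someProp",
      "table[].someProp",
      "table.someProp"
    };
    for (String key : keys) {
      System.out.println(key + " -> "
          + parse(key).map(k -> k.kind + " " + k.property).orElse("unscoped"));
    }
  }
}
```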
   
   Another possibility is to change the behavior so that such things *only* 
affect the accumulo system namespace (or *only* affect the default namespace), 
but I don't think that option is a good idea: it would be a very different 
behavior for existing configuration, and it could lead to a lot of confusion 
about what is affected. We should be very explicit in the ERROR/WARN messages 
about what is allowed, and the configuration should be very explicit about 
what we want it to affect, if we support it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
