Hi folks! I'm trying to reason through our "set a storage policy for WALs" feature and having some difficulty. I want to get some feedback before I fix our docs or submit a patch to change behavior.
Here's the history of the feature as I understand it:

1) Starting in HBase 1.1 you can change the setting "hbase.wal.storage.policy", and if the underlying Hadoop installation supports storage policies[1] we'll call the needed APIs to set a policy as we create WALs. The main use case is telling HDFS that you want the HBase WAL on SSDs in a mixed-hardware deployment.

2) In HBase 1.1 - 1.4, the above setting defaulted to the value "NONE". Our utility code for setting storage policies expressly checks any configured value against the default and, when it matches, opts to log a message rather than call the actual Hadoop API[2]. This matters because "NONE" isn't actually a valid storage policy, so passing it to the Hadoop API would generate a bunch of log noise.

3) In HBase 2 and 1.5+, the setting defaults to "HOT" as of HBASE-18118. With that value, passing it through to the Hadoop API wouldn't produce log noise, but the utility code does the same check against our default. Since the Hadoop default storage policy is also "HOT", presumably we save an RPC call by not setting it again.

If the above is correct, how do I specify that I want WALs to have a storage policy of HOT when HDFS already has some other policy in place for a parent directory? For example, in HBase 1.1 - 1.4 I can set the storage policy (via Hadoop admin tools) for "/hbase" to COLD and change "hbase.wal.storage.policy" to HOT. In HBase 2 and 1.5+, AFAICT my WALs will still end up with the COLD policy.

Related, but a different problem: I can use Hadoop admin tools to set the storage policy for "/hbase" to "ALL_SSD", and if I leave the HBase configs at their defaults I end up with WALs having "ALL_SSD" as their policy in all versions. But in HBase 2 and 1.5+ the HBase configs claim the policy is HOT.

Should we always set the policy if the API is available? To avoid having to double-configure in something like the second case, do we still need a way to say "please do not expressly set a storage policy"?
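To make the second (double-configure) case concrete, here's roughly what it looks like today on HBase 2 / 1.5+. The property name and the `hdfs storagepolicies` admin command are real; the path and policy values are just the example from above:

```xml
<!-- hbase-site.xml: mirrors the policy already set on the parent dir via
       hdfs storagepolicies -setStoragePolicy -path /hbase -policy ALL_SSD
     Without this, the HBase configs claim the WAL policy is HOT even though
     the WAL files actually inherit ALL_SSD from /hbase. -->
<property>
  <name>hbase.wal.storage.policy</name>
  <value>ALL_SSD</value>
</property>
```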
(As an alternative, we could just call out "be sure to update your WAL config" in the docs.)

[1]: "Storage Policy" gets called several things in Hadoop, like Archival Storage, Heterogeneous Storage, HSM, and "Hierarchical Storage". In all cases I'm talking about the feature documented here:

http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
http://hadoop.apache.org/docs/r3.0.2/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html

I think it's available in Hadoop 2.6.0+ and 3.0.0+.

[2]: In rel/1.2.0 you can see the default check by tracing from FSHLog: https://s.apache.org/BqAk The constants referred to in that code are in HConstants: https://s.apache.org/OJyR And in FSUtils we exit the function early when the default matches what we pull out of the configs: https://s.apache.org/A4GA

In rel/2.0.0 the code works essentially the same but has moved around. The starting point is now AbstractFSWAL: https://s.apache.org/pp6T The constants now use HOT instead of NONE as the default: https://s.apache.org/7K2J And in CommonFSUtils we do the same early return: https://s.apache.org/fYKr
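For anyone who doesn't want to trace the links in [2], the early-return check amounts to something like the following. This is a minimal sketch, not the actual FSUtils/CommonFSUtils code; the class and method names are made up, and the constant shown is the 2.0 / 1.5+ default:

```java
// Sketch of the "skip when configured value matches our default" behavior
// described in [2]. StoragePolicySketch and policyToApply are hypothetical
// names for illustration only.
public class StoragePolicySketch {
  // Stands in for HConstants.DEFAULT_WAL_STORAGE_POLICY
  // ("NONE" in 1.1 - 1.4, "HOT" in 2.0 and 1.5+).
  static final String DEFAULT_WAL_STORAGE_POLICY = "HOT";

  /** Returns the policy we'd pass to HDFS, or null when the call is skipped. */
  static String policyToApply(String configuredPolicy) {
    if (configuredPolicy == null
        || configuredPolicy.equals(DEFAULT_WAL_STORAGE_POLICY)) {
      // Matches the default: log and return without calling setStoragePolicy.
      // This is why a parent directory's policy (e.g. COLD on /hbase) stays
      // in effect even when configs say HOT.
      return null;
    }
    return configuredPolicy;
  }
}
```

So on 2.0 there's currently no configured value that means "expressly set HOT on the WAL directory"; HOT and unset both take the early return.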