ctubbsii commented on a change in pull request #2221:
URL: https://github.com/apache/accumulo/pull/2221#discussion_r682810271



##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -250,33 +277,71 @@ public void initializeDefaultCodec() {
        */
       @Override
       protected CompressionCodec createNewCodec(final int bufferSize) {
-        DefaultCodec myCodec = new DefaultCodec();
-        Configuration myConf = new Configuration(conf);
-        // only use the buffersize if > 0, otherwise we'll use
-        // the default defined within the codec
-        if (bufferSize > 0)
-          myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
-        myCodec.setConf(myConf);
-        return myCodec;
+        if (USE_QAT) {
+          String extClazz =
+              (conf.get(CONF_QAT_CLASS) == null ? 
System.getProperty(CONF_QAT_CLASS) : null);
+          String clazz = (extClazz != null) ? extClazz : DEFAULT_QAT_CLASS;
+          try {
+            LOG.info("Trying to load qat codec class: " + clazz);
+
+            Configuration myConf = new Configuration(conf);
+            // only use the buffersize if > 0, otherwise we'll use
+            // the default defined within the codec
+            if (bufferSize > 0)
+              myConf.setInt(QAT_BUFFER_SIZE_OPT, bufferSize);
+
+            CompressionCodec c =
+                (CompressionCodec) 
ReflectionUtils.newInstance(Class.forName(clazz), myConf);

Review comment:
       Is `Class.forName` the right way to load this class for the VFS 
Classloader in Accumulo in 1.x? What about 2.1 with the new classloader changes 
there?

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -219,7 +220,21 @@ public OutputStream createCompressionStream(OutputStream 
downStream, Compressor
 
     GZ(COMPRESSION_GZ) {
 
-      private transient DefaultCodec codec = null;
+      private static final String USE_QAT_PROPERTY = "use.qat";

Review comment:
       `use.qat` is succinct, but it might be better to have a more explicit 
name in the `accumulo.` namespace because this is for affecting Accumulo 
behavior, something like:
   
   ```suggestion
         private static final String USE_QAT_PROPERTY = 
"accumulo.rfile.compression.quickassist.enabled";
   ```

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -231,14 +246,26 @@ public OutputStream createCompressionStream(OutputStream 
downStream, Compressor
        */
       private static final int DEFAULT_BUFFER_SIZE = 32 * 1024;
 
+      private final Boolean USE_QAT =
+          Boolean.valueOf(System.getProperty(USE_QAT_PROPERTY, "false"));

Review comment:
       Since this will return something non-null, you can assign it to a 
primitive `boolean`. Also, to support `-Duse.qat` in addition to 
`-Duse.qat=true`, this logic will need to be adjusted a bit. Could see how 
Maven does it, because it supports similar flags like `-DskipTests` where you 
don't have to specify `-DskipTests=true`.

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -250,33 +277,71 @@ public void initializeDefaultCodec() {
        */
       @Override
       protected CompressionCodec createNewCodec(final int bufferSize) {
-        DefaultCodec myCodec = new DefaultCodec();
-        Configuration myConf = new Configuration(conf);
-        // only use the buffersize if > 0, otherwise we'll use
-        // the default defined within the codec
-        if (bufferSize > 0)
-          myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
-        myCodec.setConf(myConf);
-        return myCodec;
+        if (USE_QAT) {
+          String extClazz =
+              (conf.get(CONF_QAT_CLASS) == null ? 
System.getProperty(CONF_QAT_CLASS) : null);
+          String clazz = (extClazz != null) ? extClazz : DEFAULT_QAT_CLASS;
+          try {
+            LOG.info("Trying to load qat codec class: " + clazz);
+
+            Configuration myConf = new Configuration(conf);
+            // only use the buffersize if > 0, otherwise we'll use
+            // the default defined within the codec
+            if (bufferSize > 0)
+              myConf.setInt(QAT_BUFFER_SIZE_OPT, bufferSize);
+
+            CompressionCodec c =
+                (CompressionCodec) 
ReflectionUtils.newInstance(Class.forName(clazz), myConf);
+            if (c instanceof Configurable) {
+              ((Configurable) c).setConf(myConf);
+            }
+            return c;
+          } catch (ClassNotFoundException e) {
+            LOG.error("Unable to create QAT codec", e);
+          }
+          return null;

Review comment:
       If this is null, should we fall back to the default impl? This seems 
like it would result in NPEs and break things in weird ways. The QuickAssist 
impl should probably be an optional optimization that works when available, but 
is ignored when not available.

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -250,33 +277,71 @@ public void initializeDefaultCodec() {
        */
       @Override
       protected CompressionCodec createNewCodec(final int bufferSize) {
-        DefaultCodec myCodec = new DefaultCodec();
-        Configuration myConf = new Configuration(conf);
-        // only use the buffersize if > 0, otherwise we'll use
-        // the default defined within the codec
-        if (bufferSize > 0)
-          myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
-        myCodec.setConf(myConf);
-        return myCodec;
+        if (USE_QAT) {
+          String extClazz =
+              (conf.get(CONF_QAT_CLASS) == null ? 
System.getProperty(CONF_QAT_CLASS) : null);
+          String clazz = (extClazz != null) ? extClazz : DEFAULT_QAT_CLASS;

Review comment:
       I'm a little confused by this logic. This reads:
   
   If `io.compression.codec.qat.class` is not set in the Hadoop config file, 
then use the class name defined by 
`-Dorg.apache.hadoop.io.compress.QatCodec.class=????`. This part makes sense, 
although I don't know why we'd want to be able to override the QAT codec class 
name, since it seems like it would be a well-known value.
   
   If it **is** set in the Hadoop config file, then just set the class name to 
`null`, which will result in us using the default class name we've hard-coded. 
This part doesn't make sense to me. Why wouldn't we use the class name from the 
Hadoop config file in this case?
   
   Also, shouldn't we prioritize our explicit `-D` override instead of only 
using it when the Hadoop config doesn't set it? The implementation here will 
prioritize what's in the the default value if the Hadoop config is set, and 
ignore the override.
   
   I think the logic should be:
   
   1. Check if system property is set to non-null. Use it if set, otherwise
   2. check if Hadoop config is set to non-null. Use it if set, otherwise
   3. use the default class name that we've hard-coded.
   
   You could also use the first one in that order where the class isn't just 
non-null, but also the class loads.
   
   Also, minor nit: avoiding unnecessary negations in ternary statements helps 
readability:
   ```java
   String clazz = extClazz == null ? DEFAULT_QAT_CLASS : extClazz;
   ```

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -250,33 +277,71 @@ public void initializeDefaultCodec() {
        */
       @Override
       protected CompressionCodec createNewCodec(final int bufferSize) {
-        DefaultCodec myCodec = new DefaultCodec();
-        Configuration myConf = new Configuration(conf);
-        // only use the buffersize if > 0, otherwise we'll use
-        // the default defined within the codec
-        if (bufferSize > 0)
-          myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
-        myCodec.setConf(myConf);
-        return myCodec;
+        if (USE_QAT) {
+          String extClazz =
+              (conf.get(CONF_QAT_CLASS) == null ? 
System.getProperty(CONF_QAT_CLASS) : null);
+          String clazz = (extClazz != null) ? extClazz : DEFAULT_QAT_CLASS;
+          try {
+            LOG.info("Trying to load qat codec class: " + clazz);
+
+            Configuration myConf = new Configuration(conf);
+            // only use the buffersize if > 0, otherwise we'll use
+            // the default defined within the codec
+            if (bufferSize > 0)
+              myConf.setInt(QAT_BUFFER_SIZE_OPT, bufferSize);
+
+            CompressionCodec c =
+                (CompressionCodec) 
ReflectionUtils.newInstance(Class.forName(clazz), myConf);
+            if (c instanceof Configurable) {
+              ((Configurable) c).setConf(myConf);
+            }
+            return c;
+          } catch (ClassNotFoundException e) {

Review comment:
       Is CNFE is the only ReflectiveOperationException that can be thrown 
here? I'm surprised if that's the case, but perhaps this is because the rest 
are caught in Hadoop's ReflectionUtils? (Is that really public API for Hadoop? 
That seems like an internal utility class that we should probably avoid using 
ourselves... but it's outside the scope of this PR.)

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -250,33 +277,71 @@ public void initializeDefaultCodec() {
        */
       @Override
       protected CompressionCodec createNewCodec(final int bufferSize) {
-        DefaultCodec myCodec = new DefaultCodec();
-        Configuration myConf = new Configuration(conf);
-        // only use the buffersize if > 0, otherwise we'll use
-        // the default defined within the codec
-        if (bufferSize > 0)
-          myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
-        myCodec.setConf(myConf);
-        return myCodec;
+        if (USE_QAT) {
+          String extClazz =
+              (conf.get(CONF_QAT_CLASS) == null ? 
System.getProperty(CONF_QAT_CLASS) : null);
+          String clazz = (extClazz != null) ? extClazz : DEFAULT_QAT_CLASS;
+          try {
+            LOG.info("Trying to load qat codec class: " + clazz);
+
+            Configuration myConf = new Configuration(conf);
+            // only use the buffersize if > 0, otherwise we'll use
+            // the default defined within the codec
+            if (bufferSize > 0)
+              myConf.setInt(QAT_BUFFER_SIZE_OPT, bufferSize);
+
+            CompressionCodec c =
+                (CompressionCodec) 
ReflectionUtils.newInstance(Class.forName(clazz), myConf);
+            if (c instanceof Configurable) {
+              ((Configurable) c).setConf(myConf);
+            }
+            return c;
+          } catch (ClassNotFoundException e) {
+            LOG.error("Unable to create QAT codec", e);
+          }
+          return null;
+        } else {
+          DefaultCodec myCodec = new DefaultCodec();
+          Configuration myConf = new Configuration(conf);
+          // only use the buffersize if > 0, otherwise we'll use
+          // the default defined within the codec
+          if (bufferSize > 0)
+            myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
+          myCodec.setConf(myConf);
+          return myCodec;
+        }
       }
 
       @Override
       public InputStream createDecompressionStream(InputStream downStream,
           Decompressor decompressor, int downStreamBufferSize) throws 
IOException {
-        // Set the internal buffer size to read from down stream.
-        CompressionCodec decomCodec = codec;
-        // if we're not using the default, let's pull from the loading cache
-        if (DEFAULT_BUFFER_SIZE != downStreamBufferSize) {
-          Entry<Algorithm,Integer> sizeOpt = Maps.immutableEntry(GZ, 
downStreamBufferSize);
-          try {
-            decomCodec = codecCache.get(sizeOpt);
-          } catch (ExecutionException e) {
-            throw new IOException(e);
+
+        if (USE_QAT) {

Review comment:
       This conditional is checking that 'use.qat' system property is set to 
true, but not whether the QAT codec successfully loaded. The conditional should 
probably depend on the success of loading the QAT library, especially if we 
fall back to the regular codec if it fails to load, as I suggested elsewhere.

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -297,7 +362,7 @@ public OutputStream createCompressionStream(OutputStream 
downStream, Compressor
 
       @Override
       public boolean isSupported() {
-        return true;
+        return codec != null;

Review comment:
       We should always support GZ, even if the QAT library fails to load.

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -250,33 +277,71 @@ public void initializeDefaultCodec() {
        */
       @Override
       protected CompressionCodec createNewCodec(final int bufferSize) {
-        DefaultCodec myCodec = new DefaultCodec();
-        Configuration myConf = new Configuration(conf);
-        // only use the buffersize if > 0, otherwise we'll use
-        // the default defined within the codec
-        if (bufferSize > 0)
-          myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
-        myCodec.setConf(myConf);
-        return myCodec;
+        if (USE_QAT) {
+          String extClazz =
+              (conf.get(CONF_QAT_CLASS) == null ? 
System.getProperty(CONF_QAT_CLASS) : null);
+          String clazz = (extClazz != null) ? extClazz : DEFAULT_QAT_CLASS;
+          try {
+            LOG.info("Trying to load qat codec class: " + clazz);
+
+            Configuration myConf = new Configuration(conf);
+            // only use the buffersize if > 0, otherwise we'll use
+            // the default defined within the codec
+            if (bufferSize > 0)
+              myConf.setInt(QAT_BUFFER_SIZE_OPT, bufferSize);
+
+            CompressionCodec c =
+                (CompressionCodec) 
ReflectionUtils.newInstance(Class.forName(clazz), myConf);
+            if (c instanceof Configurable) {
+              ((Configurable) c).setConf(myConf);
+            }
+            return c;
+          } catch (ClassNotFoundException e) {
+            LOG.error("Unable to create QAT codec", e);
+          }
+          return null;

Review comment:
       Seems reasonable. Perhaps we can throw an exception instead of `return 
null`, so this error is explicit? Would want to make sure it is thrown early.

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -250,33 +277,71 @@ public void initializeDefaultCodec() {
        */
       @Override
       protected CompressionCodec createNewCodec(final int bufferSize) {
-        DefaultCodec myCodec = new DefaultCodec();
-        Configuration myConf = new Configuration(conf);
-        // only use the buffersize if > 0, otherwise we'll use
-        // the default defined within the codec
-        if (bufferSize > 0)
-          myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
-        myCodec.setConf(myConf);
-        return myCodec;
+        if (USE_QAT) {
+          String extClazz =
+              (conf.get(CONF_QAT_CLASS) == null ? 
System.getProperty(CONF_QAT_CLASS) : null);
+          String clazz = (extClazz != null) ? extClazz : DEFAULT_QAT_CLASS;
+          try {
+            LOG.info("Trying to load qat codec class: " + clazz);
+
+            Configuration myConf = new Configuration(conf);
+            // only use the buffersize if > 0, otherwise we'll use
+            // the default defined within the codec
+            if (bufferSize > 0)
+              myConf.setInt(QAT_BUFFER_SIZE_OPT, bufferSize);
+
+            CompressionCodec c =
+                (CompressionCodec) 
ReflectionUtils.newInstance(Class.forName(clazz), myConf);

Review comment:
       That's fine. I just wasn't sure what the "right way" to load a class 
anymore is. :smiley_cat: I haven't thought about classloaders in a bit. 2.1 
code looks similar.

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -250,33 +277,71 @@ public void initializeDefaultCodec() {
        */
       @Override
       protected CompressionCodec createNewCodec(final int bufferSize) {
-        DefaultCodec myCodec = new DefaultCodec();
-        Configuration myConf = new Configuration(conf);
-        // only use the buffersize if > 0, otherwise we'll use
-        // the default defined within the codec
-        if (bufferSize > 0)
-          myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
-        myCodec.setConf(myConf);
-        return myCodec;
+        if (USE_QAT) {
+          String extClazz =
+              (conf.get(CONF_QAT_CLASS) == null ? 
System.getProperty(CONF_QAT_CLASS) : null);
+          String clazz = (extClazz != null) ? extClazz : DEFAULT_QAT_CLASS;
+          try {
+            LOG.info("Trying to load qat codec class: " + clazz);
+
+            Configuration myConf = new Configuration(conf);
+            // only use the buffersize if > 0, otherwise we'll use
+            // the default defined within the codec
+            if (bufferSize > 0)
+              myConf.setInt(QAT_BUFFER_SIZE_OPT, bufferSize);
+
+            CompressionCodec c =
+                (CompressionCodec) 
ReflectionUtils.newInstance(Class.forName(clazz), myConf);
+            if (c instanceof Configurable) {
+              ((Configurable) c).setConf(myConf);
+            }
+            return c;
+          } catch (ClassNotFoundException e) {
+            LOG.error("Unable to create QAT codec", e);
+          }
+          return null;

Review comment:
       On the other hand, we should weigh the user's intent to use GZ, whether 
or not the optimization is available. The user has two intents, to use GZ, and 
to use QAT. The current behavior says if QAT isn't available, then GZ isn't 
available at all. I'm not sure that's a good experience for users either.

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -250,33 +277,71 @@ public void initializeDefaultCodec() {
        */
       @Override
       protected CompressionCodec createNewCodec(final int bufferSize) {
-        DefaultCodec myCodec = new DefaultCodec();
-        Configuration myConf = new Configuration(conf);
-        // only use the buffersize if > 0, otherwise we'll use
-        // the default defined within the codec
-        if (bufferSize > 0)
-          myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
-        myCodec.setConf(myConf);
-        return myCodec;
+        if (USE_QAT) {
+          String extClazz =
+              (conf.get(CONF_QAT_CLASS) == null ? 
System.getProperty(CONF_QAT_CLASS) : null);
+          String clazz = (extClazz != null) ? extClazz : DEFAULT_QAT_CLASS;

Review comment:
       Oh wow. That existing code seems very broken. I'm fine with following 
suit with what the rest of the code does and fixing it later. It's especially 
concerning that we completely ignore the value in the Hadoop config when it's 
set there, and use our property when it's not set there. I'm not sure why we 
even bothered to read the value from the Hadoop config at all in that case, if 
it has no bearing on what we're going to do.

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -250,33 +277,71 @@ public void initializeDefaultCodec() {
        */
       @Override
       protected CompressionCodec createNewCodec(final int bufferSize) {
-        DefaultCodec myCodec = new DefaultCodec();
-        Configuration myConf = new Configuration(conf);
-        // only use the buffersize if > 0, otherwise we'll use
-        // the default defined within the codec
-        if (bufferSize > 0)
-          myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
-        myCodec.setConf(myConf);
-        return myCodec;
+        if (USE_QAT) {
+          String extClazz =
+              (conf.get(CONF_QAT_CLASS) == null ? 
System.getProperty(CONF_QAT_CLASS) : null);
+          String clazz = (extClazz != null) ? extClazz : DEFAULT_QAT_CLASS;
+          try {
+            LOG.info("Trying to load qat codec class: " + clazz);
+
+            Configuration myConf = new Configuration(conf);
+            // only use the buffersize if > 0, otherwise we'll use
+            // the default defined within the codec
+            if (bufferSize > 0)
+              myConf.setInt(QAT_BUFFER_SIZE_OPT, bufferSize);
+
+            CompressionCodec c =
+                (CompressionCodec) 
ReflectionUtils.newInstance(Class.forName(clazz), myConf);
+            if (c instanceof Configurable) {
+              ((Configurable) c).setConf(myConf);
+            }
+            return c;
+          } catch (ClassNotFoundException e) {
+            LOG.error("Unable to create QAT codec", e);
+          }
+          return null;

Review comment:
       Yeah, I understand the desire to fail fast instead of diagnose when 
things aren't working as expected. I'm just thinking we have to infer what the 
user is expecting. I'm sure some users might expect "I want QAT's impl of GZ 
only". But other users might expect "I want GZ, QAT to make it faster, if 
available". Personally, I'd be somebody in the latter camp.
   
   Could support both sets of user expectations with values of 
`.enabled=[true,false,required/force]`.

##########
File path: 
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
##########
@@ -250,33 +277,71 @@ public void initializeDefaultCodec() {
        */
       @Override
       protected CompressionCodec createNewCodec(final int bufferSize) {
-        DefaultCodec myCodec = new DefaultCodec();
-        Configuration myConf = new Configuration(conf);
-        // only use the buffersize if > 0, otherwise we'll use
-        // the default defined within the codec
-        if (bufferSize > 0)
-          myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
-        myCodec.setConf(myConf);
-        return myCodec;
+        if (USE_QAT) {
+          String extClazz =
+              (conf.get(CONF_QAT_CLASS) == null ? 
System.getProperty(CONF_QAT_CLASS) : null);
+          String clazz = (extClazz != null) ? extClazz : DEFAULT_QAT_CLASS;
+          try {
+            LOG.info("Trying to load qat codec class: " + clazz);
+
+            Configuration myConf = new Configuration(conf);
+            // only use the buffersize if > 0, otherwise we'll use
+            // the default defined within the codec
+            if (bufferSize > 0)
+              myConf.setInt(QAT_BUFFER_SIZE_OPT, bufferSize);
+
+            CompressionCodec c =
+                (CompressionCodec) 
ReflectionUtils.newInstance(Class.forName(clazz), myConf);
+            if (c instanceof Configurable) {
+              ((Configurable) c).setConf(myConf);
+            }
+            return c;
+          } catch (ClassNotFoundException e) {
+            LOG.error("Unable to create QAT codec", e);
+          }
+          return null;

Review comment:
       Yeah, I understand the desire to fail fast instead of diagnose when 
things aren't working as expected. I'm just thinking we have to infer what the 
user is expecting. I'm sure some users might expect "I want QAT's impl of GZ 
only". But other users might expect "I want GZ, QAT to make it faster, if 
available". Personally, I'd be somebody in the latter camp.
   
   Could support both sets of user expectations with values of 
`.enabled=[true,false,required/force]`. However, considering the 
forwards-compatible requirement of 1.10.2 with 1.10.1/1.10.0, it's impossible 
for a user of "1.10.x" versions to expect that it will always fail if QAT isn't 
available. The user might set '.enabled=force', but then need to downgrade to 
1.10.1 for whatever reason, and the option is completely ignored. So, at best, 
the user can only reasonably expect 1.10 versions to treat this as an 
optimization, and never a requirement, unless they know for sure they are a 
sufficiently recent 1.10 version.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to