Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/106#discussion_r66064286
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java ---
    @@ -78,41 +86,90 @@ public void flush() throws IOException {
       public static final String COMPRESSION_NONE = "none";
     
       /**
    -   * Compression algorithms.
    +   * Compression algorithms. There is a static initializer, below the values defined in the enumeration, that calls the initializer of all defined codecs within
    +   * the Algorithm enum. This promotes a model of the following call graph of initialization by the static initializer, followed by calls to getCodec() and
    +   * createCompressionStream/DecompressionStream. In some cases, the compression and decompression call methods will include a different buffer size for the
    +   * stream. Note that if the compressed buffer size requested in these calls is zero, we will not set the buffer size for that algorithm. Instead, we will use
    +   * the default within the codec.
    +   *
    +   * The buffer size is configured in the Codec by way of a Hadoop Configuration reference. One approach may be to use the same Configuration object, but when
    +   * calls are made to createCompressionStream and DecompressionStream with non-default buffer sizes, the configuration object must be changed. In this case,
    +   * concurrent calls to createCompressionStream and DecompressionStream would mutate the configuration object beneath each other, requiring synchronization to
    +   * avoid undesirable activity via co-modification. To avoid synchronization entirely, we will create Codecs with their own Configuration object and cache them
    +   * for re-use. A default codec will be statically created, as mentioned above, to ensure we always have a codec available at loader initialization.
    +   *
    +   * There is a Guava cache defined within Algorithm that allows us to cache Codecs for re-use. Since they will have their own configuration object and thus do
    +   * not need to be mutable, there is no concern for using them concurrently; however, the Guava cache exists to ensure a maximal size of the cache and
    +   * efficient and concurrent read/write access to the cache itself.
    +   *
    +   * To provide Algorithm-specific details and to describe what is in code:
    +   *
    +   * LZO will always have the default LZO codec because the buffer size is never overridden within it.
    +   *
    +   * GZ will use the default GZ codec for the compression stream, but can potentially use a different codec instance for the decompression stream if the
    +   * requested buffer size does not match the default GZ buffer size of 32k.
    +   *
    +   * Snappy will use the default Snappy codec with the default buffer size of 64k for the compression stream, but will use a cached codec if the buffer size
    +   * differs from the default.
        */
       public static enum Algorithm {
    +
         LZO(COMPRESSION_LZO) {
    -      private transient boolean checked = false;
    +      /**
    +       * Determines if we've checked the codec status. Ensures we don't recreate the default codec.
    +       */
    +      private transient AtomicBoolean checked = new AtomicBoolean(false);
          private static final String defaultClazz = "org.apache.hadoop.io.compress.LzoCodec";
           private transient CompressionCodec codec = null;
     
    +      /**
    +       * Configuration option for LZO buffer size
    +       */
    +      private static final String BUFFER_SIZE_OPT = "io.compression.codec.lzo.buffersize";
    +
    +      /**
    +       * Default buffer size
    +       */
    +      private static final int DEFAULT_BUFFER_SIZE = 64 * 1024;
    +
           @Override
    -      public synchronized boolean isSupported() {
    -        if (!checked) {
    -          checked = true;
    -          String extClazz = (conf.get(CONF_LZO_CLASS) == null ? System.getProperty(CONF_LZO_CLASS) : null);
    -          String clazz = (extClazz != null) ? extClazz : defaultClazz;
    -          try {
    -            LOG.info("Trying to load Lzo codec class: " + clazz);
    -            codec = (CompressionCodec) ReflectionUtils.newInstance(Class.forName(clazz), conf);
    -          } catch (ClassNotFoundException e) {
    -            // that is okay
    -          }
    -        }
    +      public boolean isSupported() {
             return codec != null;
           }
     
    +      public void initializeDefaultCodec() {
    +        if (!checked.get()) {
    +          checked.set(true);
    +          codec = createNewCodec(DEFAULT_BUFFER_SIZE);
    +        }
    +      }
    +
           @Override
    -      CompressionCodec getCodec() throws IOException {
    -        if (!isSupported()) {
    -          throw new IOException("LZO codec class not specified. Did you forget to set property " + CONF_LZO_CLASS + "?");
    +      CompressionCodec createNewCodec(int bufferSize) {
    +        String extClazz = (conf.get(CONF_LZO_CLASS) == null ? System.getProperty(CONF_LZO_CLASS) : null);
    +        String clazz = (extClazz != null) ? extClazz : defaultClazz;
    +        try {
    +          LOG.info("Trying to load Lzo codec class: " + clazz);
    +          Configuration myConf = new Configuration(conf);
    +          // only use the buffersize if > 0, otherwise we'll use
    +          // the default defined within the codec
    +          if (bufferSize > 0)
    +            myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
    +          codec = (CompressionCodec) ReflectionUtils.newInstance(Class.forName(clazz), myConf);
    +          return codec;
    +        } catch (ClassNotFoundException e) {
    +          // that is okay
             }
    +        return null;
    +      }
     
    +      @Override
    +      CompressionCodec getCodec() throws IOException {
    --- End diff ---
    
    Can this {{throws IOException}} be removed?
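    For context: since the new design creates the default codec eagerly (from the static initializer described in the javadoc), the override performs no I/O and the checked exception appears unnecessary. A minimal self-contained sketch of that shape, using a stub type in place of Hadoop's CompressionCodec and hypothetical names rather than the actual Accumulo code:
    
    ```java
    import java.util.concurrent.atomic.AtomicBoolean;
    
    // Sketch only: StubCodec stands in for Hadoop's CompressionCodec, and the
    // eager-initialization pattern mirrors the one in the diff above.
    class CodecHolder {
      interface StubCodec { String name(); }
    
      private final AtomicBoolean checked = new AtomicBoolean(false);
      private volatile StubCodec codec;
    
      // Called once at static-initialization time in the real class. Using
      // compareAndSet (rather than get() followed by set()) keeps the
      // check-then-act step atomic under concurrent callers.
      void initializeDefaultCodec() {
        if (checked.compareAndSet(false, true)) {
          codec = () -> "default";
        }
      }
    
      // No checked exception needed: the codec was already created eagerly,
      // so this method does no I/O and simply returns the cached instance.
      StubCodec getCodec() {
        return codec;
      }
    }
    ```
    
    With that shape, callers of getCodec() no longer need a try/catch, which is presumably the motivation behind the question.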


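The caching model described in the javadoc — immutable codec instances, each built against its own Configuration copy, keyed by requested buffer size — can be sketched without the Guava dependency using ConcurrentHashMap.computeIfAbsent. The names below are illustrative, not the actual Accumulo API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Illustrative stand-in for the Guava cache in the diff: each buffer size maps
// to its own codec instance built by the factory, so concurrent callers never
// mutate a shared Configuration object.
class CodecCache<C> {
  private final Map<Integer, C> cache = new ConcurrentHashMap<>();
  private final IntFunction<C> factory;

  CodecCache(IntFunction<C> factory) {
    this.factory = factory;
  }

  // A buffer size <= 0 means "use the codec's own default", matching the
  // zero-buffer-size rule stated in the javadoc; all such requests share
  // one cached default instance under key 0.
  C get(int bufferSize) {
    int key = bufferSize > 0 ? bufferSize : 0;
    return cache.computeIfAbsent(key, factory::apply);
  }
}
```

Because each cached value is created once and never mutated afterwards, reads need no synchronization; the only coordination is inside the map itself. The real Guava cache adds what this sketch lacks — a maximum size with eviction — which is why the diff prefers it.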