[ https://issues.apache.org/jira/browse/ACCUMULO-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202925#comment-15202925 ]
ASF GitHub Bot commented on ACCUMULO-4164:
------------------------------------------

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/80#discussion_r56753657

    --- Diff: core/src/main/java/org/apache/accumulo/core/file/blockfile/impl/SeekableByteArrayInputStream.java ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.accumulo.core.file.blockfile.impl;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +
    +/**
    + * This class is like byte array input stream with two differences. It supports seeking and avoids synchronization.
    + */
    +public class SeekableByteArrayInputStream extends InputStream {
    +
    +  // make this volatile to ensure data set by one thread can be seen by another
    +  private volatile byte buffer[];
    +  private int cur;
    +  private int max;
    +
    +  @Override
    +  public int read() {
    +    if (cur < max) {
    +      return buffer[cur++] & 0xff;
    +    } else {
    +      return -1;
    +    }
    +  }
    +
    +  @Override
    +  public int read(byte b[], int offset, int length) {
    +    if (b == null) {
    +      throw new NullPointerException();
    +    }
    +
    +    if (length < 0 || offset < 0 || length > b.length - offset) {
    +      throw new IndexOutOfBoundsException();
    +    }
    +
    +    if (length == 0) {
    +      return 0;
    +    }
    +
    +    int avail = max - cur;
    +
    +    if (avail <= 0) {
    +      return -1;
    +    }
    +
    +    if (length > avail) {
    +      length = avail;
    +    }
    +
    +    System.arraycopy(buffer, cur, b, offset, length);
    +    cur += length;
    +    return length;
    +  }
    +
    +  @Override
    +  public long skip(long requestedSkip) {
    +    long actualSkip = max - cur;
    +    if (requestedSkip < actualSkip)
    +      if (requestedSkip < 0)
    +        actualSkip = 0;
    +      else
    +        actualSkip = requestedSkip;
    +
    +    cur += actualSkip;
    +    return actualSkip;
    +  }
    +
    +  @Override
    +  public int available() {
    +    return max - cur;
    +  }
    +
    +  @Override
    +  public boolean markSupported() {
    +    return false;
    +  }
    +
    +  @Override
    +  public void mark(int readAheadLimit) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void reset() {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void close() throws IOException {}
    +
    +  public SeekableByteArrayInputStream(byte[] buf) {
    +    this.buffer = buf;
    +    this.cur = 0;
    +    this.max = buf.length;
    +  }
    +
    +  public SeekableByteArrayInputStream(byte[] buf, int maxOffset) {
    +    this.buffer = buf;
    --- End diff --

    `Objects.requireNonNull(buf)`


> Avoid copy of RFile Index blocks when in cache
> ----------------------------------------------
>
>                 Key: ACCUMULO-4164
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4164
>             Project: Accumulo
>          Issue Type: Improvement
>    Affects Versions: 1.6.5, 1.7.1
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.6.6, 1.7.2, 1.8.0
>
>
> I have been doing performance experiments with RFile. During the course of these experiments I noticed that RFile is not as fast as it should be in the case where index blocks are in cache and the RFile is not already open. The reason is that the RFile code copies and deserializes the index data even though it is already in memory.
> I made the following changes to RFile in a branch:
>  * Avoid copying index data when it is in cache.
>  * Deserialize offsets lazily (instead of upfront) during binary search.
>  * Stop calling lots of synchronized methods during deserialization of index info. The existing code uses ByteArrayInputStream, which results in lots of fine-grained synchronization. Switching to an input stream that offers the same functionality without synchronization showed a measurable performance difference.
> These changes improve performance in the following two situations:
>  * When an RFile's data is in cache, but the file is not open on the tserver.
>  * For RFiles with multilevel indexes whose index data is in cache. Currently an open RFile only keeps the root node in memory; lower-level index nodes are always read from the cache or DFS. The changes I made always avoid the copy and deserialization of lower-level index nodes when they are in cache.
> I have seen significant performance improvements testing the two cases above. My tests are currently based on a new API I am creating for RFile, so I can not easily share them until I get that pushed.
> For the case where a tserver already has all frequently used files open and those files have a single-level index, these changes should not make a significant performance difference.
> These changes should also result in less memory use when the same RFile is opened multiple times for different scans (when the data is in cache), since all of the open RFiles would share the same byte array holding the serialized index data.
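The description above hinges on two ideas: many streams can share one cached byte array without copying it, and reads stay lock-free because nothing is synchronized. The sketch below illustrates that usage pattern with a stripped-down stand-in for the class in the diff. The `seek(int)` method and the class/field names here are assumptions for illustration only; the quoted diff is truncated before the real class's seeking logic, so this is not the actual Accumulo implementation.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Minimal unsynchronized byte-array stream in the spirit of the diff above.
// Unlike java.io.ByteArrayInputStream, no method here is synchronized.
class UnsyncByteArrayInputStream extends InputStream {
  private final byte[] buffer; // shared with the cache; never copied
  private int cur;
  private final int max;

  UnsyncByteArrayInputStream(byte[] buf) {
    this.buffer = buf;
    this.max = buf.length;
  }

  // Hypothetical seek; the real method is outside the quoted portion of the diff.
  void seek(int position) {
    if (position < 0 || position > max)
      throw new IllegalArgumentException("bad position: " + position);
    cur = position;
  }

  @Override
  public int read() {
    return cur < max ? buffer[cur++] & 0xff : -1;
  }

  @Override
  public int read(byte[] b, int offset, int length) {
    int avail = max - cur;
    if (avail <= 0)
      return -1;
    length = Math.min(length, avail);
    System.arraycopy(buffer, cur, b, offset, length);
    cur += length;
    return length;
  }
}

public class IndexReadSketch {
  public static void main(String[] args) throws IOException {
    // Pretend this array is a cached, serialized index block. Each scan can
    // wrap its own stream around it without copying bytes or taking locks.
    byte[] cached = {0, 0, 0, 7, 0, 0, 0, 42};

    UnsyncByteArrayInputStream in = new UnsyncByteArrayInputStream(cached);
    DataInputStream data = new DataInputStream(in);
    System.out.println(data.readInt()); // first entry: 7

    in.seek(4); // jump directly to the second entry, as a lazy binary search would
    System.out.println(data.readInt()); // second entry: 42
  }
}
```

Wrapping the stream in `DataInputStream` mirrors how serialized index entries would be decoded on demand: only the entries a binary search actually touches get deserialized, while the backing array stays shared.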
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)