GitHub user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1057#discussion_r153926390
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java ---
    @@ -353,6 +353,23 @@ public int getRecordCount() {
     
       public boolean hasRecordCount() { return recordCount != -1; }
     
    +  /**
    +   * This works with non-hyper {@link VectorContainer}s which have no selection vectors.
    +   * Appends a row taken from a source {@link VectorContainer} to this {@link VectorContainer}.
    +   * @param srcContainer The {@link VectorContainer} to copy a row from.
    +   * @param srcIndex The index of the row to copy from the source {@link VectorContainer}.
    +   */
    +  public void appendRow(VectorContainer srcContainer, int srcIndex) {
    +    for (int vectorIndex = 0; vectorIndex < wrappers.size(); vectorIndex++) {
    +      ValueVector destVector = wrappers.get(vectorIndex).getValueVector();
    +      ValueVector srcVector = srcContainer.wrappers.get(vectorIndex).getValueVector();
    +
    +      destVector.copyEntry(recordCount, srcVector, srcIndex);
    +    }
    +
    +    recordCount++;
    --- End diff --
    
    This is OK for a row-by-row copy, but you'll get better performance if you optimize for the entire batch. Because there is no SV4, every row comes from the same source batch and goes to the same destination batch, so the vectors can be preloaded into an array once per batch, avoiding a vector-wrapper lookup per column on every row.
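A minimal sketch of the per-batch idea, assuming a hypothetical `IntColumn` as a stand-in for Drill's `ValueVector` (just an int column whose `copyEntry(destIndex, src, srcIndex)` mirrors the call in the diff) and a hypothetical `SimpleContainer` in place of a non-hyper `VectorContainer`. The source and destination vectors are pulled into arrays once per call, so the inner loop does no wrapper lookups:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Drill ValueVector: a growable int column.
final class IntColumn {
  private int[] values = new int[4];

  int get(int index) {
    return values[index];
  }

  void set(int index, int value) {
    if (index >= values.length) {
      // Grow geometrically so repeated appends stay amortized O(1).
      values = Arrays.copyOf(values, Math.max(index + 1, values.length * 2));
    }
    values[index] = value;
  }

  // Mirrors the copyEntry(toIndex, from, fromIndex) call used in the diff.
  void copyEntry(int destIndex, IntColumn src, int srcIndex) {
    set(destIndex, src.get(srcIndex));
  }
}

// Hypothetical stand-in for a non-hyper VectorContainer (no selection vector).
final class SimpleContainer {
  final List<IntColumn> columns = new ArrayList<>();
  int recordCount;

  SimpleContainer(int columnCount) {
    for (int i = 0; i < columnCount; i++) {
      columns.add(new IntColumn());
    }
  }

  // Appends the rows at srcIndexes from src. Vectors are resolved once per batch,
  // not once per row, because every row comes from the same source container.
  void appendRows(SimpleContainer src, int[] srcIndexes) {
    IntColumn[] destVectors = columns.toArray(new IntColumn[0]);
    IntColumn[] srcVectors = src.columns.toArray(new IntColumn[0]);
    for (int srcIndex : srcIndexes) {
      for (int c = 0; c < destVectors.length; c++) {
        destVectors[c].copyEntry(recordCount, srcVectors[c], srcIndex);
      }
      recordCount++;
    }
  }
}
```

`appendRows` here is an illustrative batch-level counterpart of `appendRow`, not an existing Drill method: it still copies row by row, but the per-column lookup is paid once per batch instead of once per row.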
    
    Plus, if the code is written per batch, you can go a step further and vectorize the operation: copy all values for column 1, then all values for column 2, and so on. (In this case, you fetch each vector only once, so sticking with the wrappers is fine.) Vectorizing may deliver the cache-locality benefit that Drill's vectorized operations promise. It's worth a try to see whether you get any speed-up.
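A rough sketch of the column-at-a-time variant. To keep it self-contained, each column is modeled as a plain `int[]` (a hypothetical simplification; real Drill vectors are `ValueVector`s with their own buffers), and `ColumnwiseAppend.appendRows` is an illustrative name, not a Drill API. The outer loop walks columns and the inner loop walks rows, so each column pair is fetched once and the inner loop streams through a single source and destination column:

```java
// Column-at-a-time ("vectorized") append over columns modeled as int arrays.
final class ColumnwiseAppend {
  private ColumnwiseAppend() {}

  /**
   * Copies the rows at srcIndexes from srcColumns into destColumns, starting at destStart.
   * Assumes every destination column already has room for destStart + srcIndexes.length values.
   *
   * @return the new destination record count
   */
  static int appendRows(int[][] destColumns, int destStart, int[][] srcColumns, int[] srcIndexes) {
    for (int c = 0; c < destColumns.length; c++) {
      int[] dest = destColumns[c]; // fetched once per column, not once per row
      int[] src = srcColumns[c];
      int destIndex = destStart;
      for (int srcIndex : srcIndexes) {
        dest[destIndex++] = src[srcIndex];
      }
    }
    return destStart + srcIndexes.length;
  }
}
```

The loop order is the whole difference: swapping the two loops gives back the row-at-a-time copy.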

