Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1057#discussion_r153926390

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java ---
    @@ -353,6 +353,23 @@ public int getRecordCount() {
       public boolean hasRecordCount() { return recordCount != -1; }
    +  /**
    +   * This works with non-hyper {@link VectorContainer}s which have no selection vectors.
    +   * Appends a row taken from a source {@link VectorContainer} to this {@link VectorContainer}.
    +   * @param srcContainer The {@link VectorContainer} to copy a row from.
    +   * @param srcIndex The index of the row to copy from the source {@link VectorContainer}.
    +   */
    +  public void appendRow(VectorContainer srcContainer, int srcIndex) {
    +    for (int vectorIndex = 0; vectorIndex < wrappers.size(); vectorIndex++) {
    +      ValueVector destVector = wrappers.get(vectorIndex).getValueVector();
    +      ValueVector srcVector = srcContainer.wrappers.get(vectorIndex).getValueVector();
    +
    +      destVector.copyEntry(recordCount, srcVector, srcIndex);
    +    }
    +
    +    recordCount++;
    --- End diff --

    This is OK for a row-by-row copy. But, you'll get better performance if you optimize for the entire batch. Because you have no SV4, the source and dest batches are the same so the vectors can be preloaded into an array of vectors to avoid the vector wrapper lookup per column.

    Plus, if the code is written per batch, you can go a step further: vectorize the operation. Copy all values for column 1, then all for column 2, and so on. (In this case, you only get each vector once, so sticking with the wrappers is fine.) By vectorizing, you may get the vectorized cache-locality benefit that Drill promises from its operations. Worth a try to see if you get any speed-up.
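A rough sketch of the column-at-a-time copy suggested above, written against stand-in types rather than Drill's classes. `IntColumn`, `Batch`, and `appendRows` are hypothetical names invented for illustration; only the `copyEntry(toIndex, from, fromIndex)` argument order is taken from the diff. It is meant to show the loop order (columns outside, rows inside), not a drop-in replacement for `appendRow`.

```java
import java.util.Arrays;

public class ColumnwiseAppendSketch {

  /** Hypothetical stand-in for a ValueVector of ints; not a Drill class. */
  static final class IntColumn {
    private int[] values = new int[4];

    /** Same argument order as the copyEntry call in the diff. */
    void copyEntry(int toIndex, IntColumn from, int fromIndex) {
      if (toIndex >= values.length) {
        // Grow the backing array so the destination index is valid.
        values = Arrays.copyOf(values, Math.max(values.length * 2, toIndex + 1));
      }
      values[toIndex] = from.values[fromIndex];
    }

    int get(int index) {
      return values[index];
    }

    static IntColumn of(int... data) {
      IntColumn column = new IntColumn();
      column.values = data.clone();
      return column;
    }
  }

  /** Hypothetical stand-in for a non-hyper container with no selection vector. */
  static final class Batch {
    final IntColumn[] columns;
    int recordCount;

    Batch(IntColumn... columns) {
      this.columns = columns;
    }

    /** Appends count rows from src, starting at srcStart, one column at a time. */
    void appendRows(Batch src, int srcStart, int count) {
      for (int col = 0; col < columns.length; col++) {
        // Each source and destination vector is fetched once per batch.
        IntColumn dest = columns[col];
        IntColumn from = src.columns[col];
        // The inner loop stays within a single column's memory.
        for (int i = 0; i < count; i++) {
          dest.copyEntry(recordCount + i, from, srcStart + i);
        }
      }
      // The row count is updated once, after every column has been copied.
      recordCount += count;
    }
  }

  public static void main(String[] args) {
    Batch src = new Batch(IntColumn.of(1, 2, 3), IntColumn.of(10, 20, 30));
    src.recordCount = 3;

    Batch dest = new Batch(new IntColumn(), new IntColumn());
    dest.appendRows(src, 1, 2);

    System.out.println(dest.recordCount + " rows, last value = " + dest.columns[1].get(1));
  }
}
```

Whether this ordering actually beats the row-by-row version in Drill would need to be measured, as the comment says.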
---