[
https://issues.apache.org/jira/browse/DRILL-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001706#comment-16001706
]
Paul Rogers edited comment on DRILL-5488 at 5/8/17 11:04 PM:
-------------------------------------------------------------
Consider the {{setValueCount()}} method again:
{code}
public void setValueCount(int valueCount) {
final int currentValueCapacity = getValueCapacity();
...
while(valueCount > getValueCapacity()) {
reAlloc();
}
if (valueCount > 0 && currentValueCapacity > valueCount * 2) {
...
{code}
The {{if}} statement is very probably wrong. We get the current capacity. If
too small, we realloc. Then, we check the value, after the possible
reallocation, without updating the capacity. It is not clear if we are trying
to do check the before or after.
Then:
{code}
if (valueCount > 0 && currentValueCapacity > valueCount * 2) {
incrementAllocationMonitor(); // is ++allocationMonitor
} else if (allocationMonitor > 0) {
allocationMonitor = 0;
}
{code}
It seems we are trying to guess if we did a realloc rather than simply
incrementing the monitor when we actually do the reAlloc. It is also not clear
that the if-statement mimics the if that does the reAlloc.
In fact, a close inspection of the {{setValueCount()}} method, shows that it
does not even work correctly. It is allocating enough space to hold the value
count, which will implicitly set the unused slots to 0. But, this is an offset
vector; we should be copying forward the last set offset. Consider an example:
\[0, 3, 9] as set values.
Set row count as 5. Then, the offset count is (row count + 1) = 6. What we
create is:
\[0, 3, 9, 0, 0, 0]
What we should have set is:
\[0, 3, 9, 9, 9, 9]
It seems to be the responsibility of the caller to set the last value, so what
we actually set today, in this case, is:
\[0, 3, 9, 0, 0, 9]
The caveat is that the above works if the caller diligently fills in the
offsets row by row. But, if it die, there would be no reason to extend the
vector in {{setValueCount()}}. It has to be one way or the other, not the
ambiguous middle that we have in the code now.
was (Author: paul-rogers):
Consider the {{setValueCount()}} method again:
{code}
public void setValueCount(int valueCount) {
final int currentValueCapacity = getValueCapacity();
...
while(valueCount > getValueCapacity()) {
reAlloc();
}
if (valueCount > 0 && currentValueCapacity > valueCount * 2) {
...
{code}
The {{if}} statement is very probably wrong. We get the current capacity. If
too small, we realloc. Then, we check the value, after the possible
reallocation, without updating the capacity. It is not clear if we are trying
to do check the before or after.
Then:
{code}
if (valueCount > 0 && currentValueCapacity > valueCount * 2) {
incrementAllocationMonitor(); // is ++allocationMonitor
} else if (allocationMonitor > 0) {
allocationMonitor = 0;
}
{code}
It seems we are trying to guess if we did a realloc rather than simply
incrementing the monitor when we actually do the reAlloc. It is also not clear
that the if-statement mimics the if that does the reAlloc.
In fact, a close inspection of the {{setValueCount()}} method, shows that it
does not even work correctly. It is allocating enough space to hold the value
count, which will implicitly set the unused slots to 0. But, this is an offset
vector; we should be copying forward the last set offset. Consider an example:
\[0, 3, 9] as set values.
Set row count as 5. What we create is:
\[0, 3, 9, 0, 0, 0]
What we should have set is:
\[0, 3, 9, 9, 9, 9]
It seems to be the responsibility of the caller to set the last value, so what
we actually set today, in this case, is:
\[0, 3, 9, 0, 0, 9]
The caveat is that the above works if the caller diligently fills in the
offsets row by row. But, if it die, there would be no reason to extend the
vector in {{setValueCount()}}. It has to be one way or the other, not the
ambiguous middle that we have in the code now.
> Useless code in VectorTrimmer, fixed-width vector setValueCount()
> -----------------------------------------------------------------
>
> Key: DRILL-5488
> URL: https://issues.apache.org/jira/browse/DRILL-5488
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Priority: Trivial
>
> Consider this code in a generated fixed-width vector, such as UInt4Vector:
> {code}
> @Override
> public void setValueCount(int valueCount) {
> ...
> final int idx = (VALUE_WIDTH * valueCount);
> ...
> VectorTrimmer.trim(data, idx);
> data.writerIndex(valueCount * VALUE_WIDTH);
> }
> {code}
> Consider the {{trim()}} method:
> {code}
> public class VectorTrimmer {
> ...
> public static void trim(ByteBuf data, int idx) {
> data.writerIndex(idx);
> if (data instanceof DrillBuf) {
> // data.capacity(idx);
> data.writerIndex(idx);
> }
> }
> }
> {code}
> This method is called {{trim}}, but it actually sets the writer index in the
> buffer (though we never use that index.) Since all buffers we use are
> {{DrillBuf}}, the if-statement is a no-op: we simply set the writer index
> twice.
> But, notice that the {{setValueCount()}} method itself calls the same
> {{writerIndex()}} method, so it is actually being called three times.
> It seems this code can simply be discarded: it is called from only two
> places; neither of which end up using the writer index.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)