nandorKollar commented on PR #13880:
URL: https://github.com/apache/iceberg/pull/13880#issuecomment-3228534662

   > > Why did the type of the vector change from IntVectors to 
BaseVarWidthVectors?
   > 
   > The vector changes because Dictionary encoded pages are a sequence of 
ints, {1, 2, 3, 4} that refer to entries in the Dictionary which maps the int 
to the actual column value. {1: "foo", 2: "bar", ....}. Other pages have 
literal representations of the values stored as binary {foo, bar, bazz }. So 
you have to switch vector types when you alternate.
   > 
   > > If we clear out "this.vec" if it is set, wouldn't this type change in 
the vector cause problems? Shouldn't we explicitly close the `this.vec` if it 
is not null, before setting it to a new vector?
   > 
   > No. To be clear, the code has _always_ cleared out this.vec and we don't 
have correctness issues because essentially what is happening is:
   > 
   > 1. Reader looks to see if it can read the page
   > 2. If it can't re-use the container do an allocate for the correct 
container
   > 
   > What is missing here is step 2.a: if I previously had a container but it 
cannot be re-used, clear it
   
   Thanks for clarifying why the type change happens, that makes sense. So the 
vector can't be reused only when there's a switch from/to dictionary-encoded 
pages, right? And when you say that it is always cleared, do you mean that the 
value count is set to 0 in this block:
   ```
       if (reuse == null
           || (!dictEncoded && readType == ReadType.DICTIONARY)
           || (dictEncoded && readType != ReadType.DICTIONARY)) {
         allocateFieldVector(dictEncoded);
         nullabilityHolder = new NullabilityHolder(batchSize);
       } else {
         vec.setValueCount(0);
         nullabilityHolder.reset();
       }
   ```
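
To check my understanding of the missing step 2.a, here is a minimal, self-contained sketch of the reuse decision. It does not use the Arrow or Iceberg classes: `FakeVector`, `Reader`, `prepare`, and `release()` are hypothetical stand-ins, and `vec == null` replaces the `reuse == null` check from the real code. The only point is that a container that cannot be reused gets released before a new one is allocated.

```java
final class VectorReuseSketch {
  enum ReadType { DICTIONARY, PLAIN }

  /** Hypothetical stand-in for an Arrow vector; tracks only the state relevant here. */
  static final class FakeVector {
    final boolean holdsDictionaryIds;
    int valueCount;
    boolean released;

    FakeVector(boolean holdsDictionaryIds) {
      this.holdsDictionaryIds = holdsDictionaryIds;
    }

    void setValueCount(int count) {
      this.valueCount = count;
    }

    /** Stands in for whatever call actually frees the vector's buffers. */
    void release() {
      this.released = true;
    }
  }

  /** Hypothetical reader that keeps one container across batches. */
  static final class Reader {
    FakeVector vec;
    ReadType readType;
    int allocations;

    void prepare(boolean dictEncoded) {
      // The container type must change when the encoding flips in either direction.
      boolean typeSwitch =
          (!dictEncoded && readType == ReadType.DICTIONARY)
              || (dictEncoded && readType != ReadType.DICTIONARY);

      if (vec == null || typeSwitch) {
        // Step 2.a: a previous container exists but cannot be reused, so release it.
        if (vec != null) {
          vec.release();
        }
        // Step 2: allocate the correct container for this page type.
        vec = new FakeVector(dictEncoded);
        readType = dictEncoded ? ReadType.DICTIONARY : ReadType.PLAIN;
        allocations++;
      } else {
        // Same container type: reset it and keep the allocation.
        vec.setValueCount(0);
      }
    }
  }

  public static void main(String[] args) {
    Reader reader = new Reader();
    reader.prepare(true);   // first batch: allocate a dictionary-id container
    FakeVector first = reader.vec;
    reader.prepare(true);   // same encoding: reuse
    reader.prepare(false);  // encoding flips: release the old one, allocate a new one
    System.out.println(
        "allocations=" + reader.allocations + ", firstReleased=" + first.released);
  }
}
```

In this sketch the first container is released exactly once, at the moment the encoding flips, and the two same-encoding batches share one allocation.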


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

