wotbrew opened a new issue, #38242:
URL: https://github.com/apache/arrow/issues/38242

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   `DenseUnionVector.getBufferSizeFor` takes a count parameter. I would expect 
that count to be the number of union elements you wish to account for.
   
   However, that count is passed directly to `internalStruct.getBufferSizeFor`, 
which I suspect is a bug.
   
   Struct fields normally share the same `valueCount`, but that is unlikely to 
hold for the legs of a union. If your legs have different lengths:
   
   - for fixed-width vectors, the size of the leg contents is calculated 
incorrectly
   - for variable-width vectors such as VarBinary, finding the data size may 
require reading buffer contents, potentially causing an out-of-bounds dereference
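
To make the mismatch concrete without depending on Arrow, here is a minimal sketch that models a dense union with plain arrays. Every name in it (`DenseUnionSizing`, `legCounts`, `varWidthDataBytes`) is hypothetical and not part of the Arrow API. It assumes a variable-width leg finds its data size by reading the offset stored at index `count`, which is the behaviour suspected above rather than something confirmed here.

```java
import java.util.Arrays;

// Arrow-free model of a dense union; every name here is hypothetical.
public class DenseUnionSizing {

    // Values each leg must hold to cover the first `count` union slots:
    // the highest offset referenced for that leg, plus one.
    public static int[] legCounts(byte[] typeIds, int[] offsets, int count, int legs) {
        int[] counts = new int[legs];
        for (int i = 0; i < count; i++) {
            int leg = typeIds[i];
            counts[leg] = Math.max(counts[leg], offsets[i] + 1);
        }
        return counts;
    }

    // Data bytes of a variable-width leg holding `n` values, read from its
    // offsets array (n + 1 entries). Assumed to mirror a VarBinary-style layout.
    public static int varWidthDataBytes(int[] legOffsets, int n) {
        return legOffsets[n]; // throws if n exceeds the leg's real length
    }

    public static void main(String[] args) {
        // Five ints in leg 0 followed by one empty binary value in leg 1.
        byte[] typeIds = {0, 0, 0, 0, 0, 1};
        int[] offsets = {0, 1, 2, 3, 4, 0};
        int[] binaryLegOffsets = {0, 0}; // one empty value

        int unionCount = typeIds.length;
        int[] counts = legCounts(typeIds, offsets, unionCount, 2);
        System.out.println(Arrays.toString(counts)); // prints [5, 1]

        // Sizing the binary leg by its own count stays in bounds.
        System.out.println(varWidthDataBytes(binaryLegOffsets, counts[1])); // prints 0

        // Sizing it by the union's count reads past the end of its offsets.
        try {
            varWidthDataBytes(binaryLegOffsets, unionCount);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds for count " + unionCount);
        }
    }
}
```

In this model the union holds six slots but the binary leg holds only one value, so handing the union's count to the leg indexes past its two-entry offsets array. Deriving a per-leg count first avoids that.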
   
   Observed on `13.0.0` and `12.0.1`.
   
   Repro:
   
   ```java
   package xtdb.util;
   
   import org.apache.arrow.memory.BoundsChecking;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.BaseValueVector;
   import org.apache.arrow.vector.complex.DenseUnionVector;
   import org.apache.arrow.vector.types.UnionMode;
   import org.apache.arrow.vector.types.pojo.ArrowType;
   import org.apache.arrow.vector.types.pojo.Field;
   import org.apache.arrow.vector.types.pojo.FieldType;
   import org.junit.jupiter.api.Test;
   
   import java.util.Arrays;
   
   import static org.junit.jupiter.api.Assertions.*;
   
   public class DUVBufferSizeTest {
       @Test
       public void testBufferSize() {
           try (var allocator = new RootAllocator();
                 var duv = new DenseUnionVector("duv", allocator,
                         FieldType.nullable(new ArrowType.Union(UnionMode.Dense, null)),
                         null)) {
   
               var fields = Arrays.asList(
                       new Field("a",
                               FieldType.notNullable(new ArrowType.Int(32, true)), null),
                       new Field("b",
                               FieldType.notNullable(new ArrowType.Binary()), null)
               );
   
               duv.initializeChildrenFromFields(fields);
   
               byte atid = 0;
               byte btid = 1;
   
               var a = duv.getIntVector(atid);
               var b = duv.getVarBinaryVector(btid);
   
               int ac = BaseValueVector.INITIAL_VALUE_ALLOCATION+1;
               for (int i = 0; i < ac; i++) {
                   a.setSafe(i, 1);
                   duv.setTypeId(i, atid);
                   duv.setOffset(i, i);
               }
   
               int bc = 1;
               for (int i = 0; i < bc; i++) {
                   b.setSafe(i, new byte[0]);
                   duv.setTypeId(i+ac, btid);
                   duv.setOffset(i+ac, i);
               }
   
               duv.setValueCount(ac+bc);
   
               // An error will not necessarily be seen unless bounds checking
               // is on.
               assertTrue(BoundsChecking.BOUNDS_CHECKING_ENABLED);
               assertDoesNotThrow(duv::getBufferSize);
           }
       }
   }
   ```
   
   ### Component(s)
   
   Java

