Benjamin Wilhelm created ARROW-10174:
----------------------------------------

             Summary: [Java] Reading of Dictionary encoded struct vector fails 
                 Key: ARROW-10174
                 URL: https://issues.apache.org/jira/browse/ARROW-10174
             Project: Apache Arrow
          Issue Type: Bug
          Components: Java
    Affects Versions: 1.0.1
            Reporter: Benjamin Wilhelm


Write an index vector together with a dictionary whose dictionary vector is of type
{{Struct}} using an {{ArrowStreamWriter}}. Reading the stream back with an
{{ArrowStreamReader}} fails with an exception.

Code to reproduce:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.util.Collections;
import java.util.Map;
import java.util.Random;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.ValueVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.complex.StructVector;
import org.apache.arrow.vector.complex.impl.NullableStructWriter;
import org.apache.arrow.vector.complex.writer.IntWriter;
import org.apache.arrow.vector.dictionary.Dictionary;
import org.apache.arrow.vector.dictionary.DictionaryEncoder;
import org.apache.arrow.vector.dictionary.DictionaryProvider.MapDictionaryProvider;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.pojo.DictionaryEncoding;

final RootAllocator allocator = new RootAllocator();

// Create the dictionary
final StructVector dict = StructVector.empty("Dict", allocator);
final NullableStructWriter dictWriter = dict.getWriter();
final IntWriter dictA = dictWriter.integer("a");
final IntWriter dictB = dictWriter.integer("b");
for (int i = 0; i < 3; i++) {
        dictWriter.start();
        dictA.writeInt(i);
        dictB.writeInt(i);
        dictWriter.end();
}
dict.setValueCount(3);
final Dictionary dictionary = new Dictionary(dict, new DictionaryEncoding(1, false, null));

// Create the vector
final Random random = new Random();
final StructVector vector = StructVector.empty("Dict", allocator);
final NullableStructWriter vectorWriter = vector.getWriter();
final IntWriter vectorA = vectorWriter.integer("a");
final IntWriter vectorB = vectorWriter.integer("b");
for (int i = 0; i < 10; i++) {
        int v = random.nextInt(3);
        vectorWriter.start();
        vectorA.writeInt(v);
        vectorB.writeInt(v);
        vectorWriter.end();
}
vector.setValueCount(10);

// Encode the vector using the dictionary
final IntVector indexVector = (IntVector) DictionaryEncoder.encode(vector, dictionary);

// Write the vector to out
final ByteArrayOutputStream out = new ByteArrayOutputStream();
final VectorSchemaRoot root = new VectorSchemaRoot(Collections.singletonList(indexVector.getField()),
                Collections.singletonList(indexVector));
final ArrowStreamWriter writer = new ArrowStreamWriter(root, new MapDictionaryProvider(dictionary),
                Channels.newChannel(out));
writer.start();
writer.writeBatch();
writer.end();

// Read the vector from out
try (final ArrowStreamReader reader = new ArrowStreamReader(new ByteArrayInputStream(out.toByteArray()),
                allocator)) {
        reader.loadNextBatch();
        final VectorSchemaRoot readRoot = reader.getVectorSchemaRoot();
        final FieldVector readIndexVector = readRoot.getVector(0);

        // Get the dictionary and decode
        final Map<Long, Dictionary> readDictionaryMap = reader.getDictionaryVectors();
        final Dictionary readDictionary = readDictionaryMap.get(readIndexVector.getField().getDictionary().getId());
        final ValueVector readVector = DictionaryEncoder.decode(readIndexVector, readDictionary);
}
{code}

Exception:
{code}
java.lang.IllegalArgumentException: not all nodes and buffers were consumed. nodes: [ArrowFieldNode [length=3, nullCount=0], ArrowFieldNode [length=3, nullCount=0]] buffers: [ArrowBuf[21], address:140118352739688, length:1, ArrowBuf[22], address:140118352739696, length:12, ArrowBuf[23], address:140118352739712, length:1, ArrowBuf[24], address:140118352739720, length:12]
        at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:63)
        at org.apache.arrow.vector.ipc.ArrowReader.load(ArrowReader.java:241)
        at org.apache.arrow.vector.ipc.ArrowReader.loadDictionary(ArrowReader.java:232)
        at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:129)
        at com.knime.AppTest.testDictionaryStruct(AppTest.java:83)
{code}
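The leftover nodes and buffers line up with the two {{Int}} children ({{a}} and {{b}}) of the dictionary struct. Each child contributes one field node of length 3, a 1-byte validity buffer (3 bits, rounded up to a whole byte) and a 12-byte data buffer (3 values of 4 bytes each). This is consistent with the reader expecting a {{Struct}} dictionary without any children: the struct's own node and validity buffer are consumed, while those of its children are left over. The sketch below reproduces that arithmetic. It is a simplified model, not Arrow's {{VectorLoader}}; {{SimpleField}} is hypothetical, and the per-type buffer counts (a nullable struct carries only a validity buffer, a 32-bit int carries validity plus data) are stated assumptions.

{code:java}
import java.util.List;

// Simplified model (hypothetical; not Arrow's VectorLoader) of how the field
// nodes and buffers of a batch are matched against a schema: one node per
// field, visited depth-first, plus a fixed number of buffers per field type.
public class LeftoverBuffers {

    // Hypothetical minimal field description: a type name and its children.
    static final class SimpleField {
        final String type;
        final List<SimpleField> children;

        SimpleField(String type, List<SimpleField> children) {
            this.type = type;
            this.children = children;
        }

        static SimpleField leaf(String type) {
            return new SimpleField(type, List.of());
        }
    }

    // Field nodes contributed by a field: the field itself plus all descendants.
    static int countNodes(SimpleField field) {
        int n = 1;
        for (SimpleField child : field.children) {
            n += countNodes(child);
        }
        return n;
    }

    // Assumed buffer layout: a nullable struct has only a validity bitmap,
    // a 32-bit int has a validity bitmap and a data buffer.
    static int countBuffers(SimpleField field) {
        int n = field.type.equals("struct") ? 1 : 2;
        for (SimpleField child : field.children) {
            n += countBuffers(child);
        }
        return n;
    }

    // Validity bitmap size in bytes: one bit per value, rounded up.
    static int validityBytes(int valueCount) {
        return (valueCount + 7) / 8;
    }

    // Data buffer size in bytes for 32-bit integers.
    static int int32DataBytes(int valueCount) {
        return valueCount * 4;
    }

    public static void main(String[] args) {
        // What the dictionary batch contains: Struct<a: Int, b: Int>.
        SimpleField written = new SimpleField("struct",
                List.of(SimpleField.leaf("int"), SimpleField.leaf("int")));
        // What the reader expects if the schema lost the struct's children.
        SimpleField expected = new SimpleField("struct", List.of());

        System.out.println("leftover nodes: " + (countNodes(written) - countNodes(expected)));       // 2
        System.out.println("leftover buffers: " + (countBuffers(written) - countBuffers(expected))); // 4
        System.out.println("per-child buffer sizes: " + validityBytes(3) + ", " + int32DataBytes(3)); // 1, 12
    }
}
{code}

The two leftover nodes and the buffer lengths 1, 12, 1, 12 match the exception message above.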

If I see it correctly, the problem originates in
{{DictionaryUtility#toMessageFormat}}. When a dictionary-encoded field is
encountered, the children of the memory-format field are still used; there are
none, because that field is the {{Int}} index. Instead, the children of the
dictionary vector's field should be converted to the message format and set as
the children of the resulting field.
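The proposed change can be illustrated with a simplified, self-contained model. It does not use Arrow's classes: {{SimpleField}}, its {{dictionaryId}}, {{convertToMessageFormat}} and the {{Map}} standing in for a dictionary provider are all hypothetical. It only shows the recursion described above: for a dictionary-encoded field, both the type and the children are taken from the dictionary vector's field, and each child is converted recursively.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of converting a field from memory format
// (type = dictionary index type) to message format (type = dictionary value
// type). It does not use Arrow's classes.
public class MessageFormatSketch {

    // Hypothetical field: name, type name, optional dictionary id, children.
    static final class SimpleField {
        final String name;
        final String type;
        final Long dictionaryId; // null if the field is not dictionary encoded
        final List<SimpleField> children;

        SimpleField(String name, String type, Long dictionaryId, List<SimpleField> children) {
            this.name = name;
            this.type = type;
            this.dictionaryId = dictionaryId;
            this.children = children;
        }
    }

    // dictionaries maps a dictionary id to the field of its dictionary vector
    // (a stand-in for looking the dictionary up in a provider).
    static SimpleField convertToMessageFormat(SimpleField field, Map<Long, SimpleField> dictionaries) {
        String type;
        List<SimpleField> sourceChildren;
        if (field.dictionaryId == null) {
            type = field.type;
            sourceChildren = field.children;
        } else {
            SimpleField dictionaryField = dictionaries.get(field.dictionaryId);
            if (dictionaryField == null) {
                throw new IllegalArgumentException("Could not find dictionary with ID " + field.dictionaryId);
            }
            type = dictionaryField.type;
            // The proposed change: take the children from the dictionary
            // vector's field instead of the index field (which has none).
            sourceChildren = dictionaryField.children;
        }
        List<SimpleField> converted = new ArrayList<>();
        for (SimpleField child : sourceChildren) {
            converted.add(convertToMessageFormat(child, dictionaries));
        }
        return new SimpleField(field.name, type, field.dictionaryId, converted);
    }

    public static void main(String[] args) {
        // Dictionary vector field from the reproduction: Struct<a: Int, b: Int>.
        SimpleField dictionaryField = new SimpleField("Dict", "struct", null, List.of(
                new SimpleField("a", "int", null, List.of()),
                new SimpleField("b", "int", null, List.of())));
        // Index field: Int, encoded with dictionary id 1, no children.
        SimpleField indexField = new SimpleField("Dict", "int", 1L, List.of());

        SimpleField message = convertToMessageFormat(indexField, Map.of(1L, dictionaryField));
        System.out.println(message.type + " with " + message.children.size() + " children"); // struct with 2 children
    }
}
{code}

If the children were taken from the index field instead, the converted field would be a {{Struct}} with no children, which is what the reader then fails to match against the dictionary batch.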

I can create a fix and open a pull request.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
