[ https://issues.apache.org/jira/browse/ARROW-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Szucs updated ARROW-10174:
------------------------------------
    Fix Version/s:     (was: 2.0.0)
                   3.0.0

> [Java] Reading of Dictionary encoded struct vector fails 
> ---------------------------------------------------------
>
>                 Key: ARROW-10174
>                 URL: https://issues.apache.org/jira/browse/ARROW-10174
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 1.0.1
>            Reporter: Benjamin Wilhelm
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0
>
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> Write an index vector, together with a dictionary whose vector is of type
> {{Struct}}, using an {{ArrowStreamWriter}}. Reading the stream back fails
> with an exception.
> Code to reproduce:
> {code:java}
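> import java.io.ByteArrayInputStream;
> import java.io.ByteArrayOutputStream;
> import java.nio.channels.Channels;
> import java.util.Collections;
> import java.util.Map;
> import java.util.Random;
>
> import org.apache.arrow.memory.RootAllocator;
> import org.apache.arrow.vector.FieldVector;
> import org.apache.arrow.vector.IntVector;
> import org.apache.arrow.vector.ValueVector;
> import org.apache.arrow.vector.VectorSchemaRoot;
> import org.apache.arrow.vector.complex.StructVector;
> import org.apache.arrow.vector.complex.impl.NullableStructWriter;
> import org.apache.arrow.vector.complex.writer.IntWriter;
> import org.apache.arrow.vector.dictionary.Dictionary;
> import org.apache.arrow.vector.dictionary.DictionaryEncoder;
> import org.apache.arrow.vector.dictionary.DictionaryProvider.MapDictionaryProvider;
> import org.apache.arrow.vector.ipc.ArrowStreamReader;
> import org.apache.arrow.vector.ipc.ArrowStreamWriter;
> import org.apache.arrow.vector.types.pojo.DictionaryEncoding;
>
> // The statements below run inside a test method that declares "throws IOException"
> final RootAllocator allocator = new RootAllocator();
> // Create the dictionary: a struct vector with two int children, "a" and "b"
> final StructVector dict = StructVector.empty("Dict", allocator);
> final NullableStructWriter dictWriter = dict.getWriter();
> final IntWriter dictA = dictWriter.integer("a");
> final IntWriter dictB = dictWriter.integer("b");
> for (int i = 0; i < 3; i++) {
>     dictWriter.start();
>     dictA.writeInt(i);
>     dictB.writeInt(i);
>     dictWriter.end();
> }
> dict.setValueCount(3);
> final Dictionary dictionary = new Dictionary(dict, new DictionaryEncoding(1, false, null));
> // Create the vector to encode, using only values present in the dictionary
> final Random random = new Random();
> final StructVector vector = StructVector.empty("Dict", allocator);
> final NullableStructWriter vectorWriter = vector.getWriter();
> final IntWriter vectorA = vectorWriter.integer("a");
> final IntWriter vectorB = vectorWriter.integer("b");
> for (int i = 0; i < 10; i++) {
>     int v = random.nextInt(3);
>     vectorWriter.start();
>     vectorA.writeInt(v);
>     vectorB.writeInt(v);
>     vectorWriter.end();
> }
> vector.setValueCount(10);
> // Encode the vector using the dictionary
> final IntVector indexVector = (IntVector) DictionaryEncoder.encode(vector, dictionary);
> // Write the index vector (and the dictionary) to out
> final ByteArrayOutputStream out = new ByteArrayOutputStream();
> final VectorSchemaRoot root = new VectorSchemaRoot(
>     Collections.singletonList(indexVector.getField()),
>     Collections.singletonList(indexVector));
> final ArrowStreamWriter writer = new ArrowStreamWriter(root,
>     new MapDictionaryProvider(dictionary), Channels.newChannel(out));
> writer.start();
> writer.writeBatch();
> writer.end();
> // Read the vector back from out
> try (final ArrowStreamReader reader = new ArrowStreamReader(
>         new ByteArrayInputStream(out.toByteArray()), allocator)) {
>     reader.loadNextBatch(); // fails here, see the exception below
>     final VectorSchemaRoot readRoot = reader.getVectorSchemaRoot();
>     final FieldVector readIndexVector = readRoot.getVector(0);
>     // Get the dictionary and decode
>     final Map<Long, Dictionary> readDictionaryMap = reader.getDictionaryVectors();
>     final Dictionary readDictionary =
>         readDictionaryMap.get(readIndexVector.getField().getDictionary().getId());
>     final ValueVector readVector = DictionaryEncoder.decode(readIndexVector, readDictionary);
> }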
> {code}
> Exception:
> {code}
> java.lang.IllegalArgumentException: not all nodes and buffers were consumed. nodes: [ArrowFieldNode [length=3, nullCount=0], ArrowFieldNode [length=3, nullCount=0]] buffers: [ArrowBuf[21], address:140118352739688, length:1, ArrowBuf[22], address:140118352739696, length:12, ArrowBuf[23], address:140118352739712, length:1, ArrowBuf[24], address:140118352739720, length:12]
>       at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:63)
>       at org.apache.arrow.vector.ipc.ArrowReader.load(ArrowReader.java:241)
>       at org.apache.arrow.vector.ipc.ArrowReader.loadDictionary(ArrowReader.java:232)
>       at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:129)
>       at com.knime.AppTest.testDictionaryStruct(AppTest.java:83)
> {code}
> If I see it correctly, the error happens in
> {{DictionaryUtilities#toMessageFormat}}. When a dictionary-encoded field is
> encountered, the children of the memory-format field are still used (there
> are none, because the index type is Int). Instead, the children of the
> dictionary vector's field should be converted to the message format and set
> as the children.
> I can create a fix and open a pull request.
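
The root-cause analysis above can be illustrated with a simplified model. This is a hypothetical sketch: SimpleField, toMessageFormatBuggy, toMessageFormatFixed and exampleIndexField are illustrative names, not Arrow classes or methods, and the real conversion presumably also carries nullability, metadata and the dictionary encoding itself. The model only tracks which children end up on the message-format field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the schema conversion. SimpleField is NOT
// Arrow's Field class, and none of these methods are Arrow APIs.
public class ToMessageFormatSketch {

    static final class SimpleField {
        final String name;
        final String type;                 // type name, e.g. "Int" or "Struct"
        final SimpleField dictionaryField; // field of the dictionary vector, or null if not encoded
        final List<SimpleField> children;

        SimpleField(String name, String type, SimpleField dictionaryField, List<SimpleField> children) {
            this.name = name;
            this.type = type;
            this.dictionaryField = dictionaryField;
            this.children = children;
        }
    }

    // Behaviour described in the report: the type is taken from the dictionary,
    // but the children still come from the in-memory index field (Int, so none).
    // For simplicity, the output does not keep the dictionary reference.
    static SimpleField toMessageFormatBuggy(SimpleField field) {
        String type = field.dictionaryField != null ? field.dictionaryField.type : field.type;
        List<SimpleField> children = new ArrayList<>();
        for (SimpleField child : field.children) {
            children.add(toMessageFormatBuggy(child));
        }
        return new SimpleField(field.name, type, null, children);
    }

    // Proposed behaviour: for a dictionary-encoded field, convert the children
    // of the dictionary vector's field and use them as the children.
    static SimpleField toMessageFormatFixed(SimpleField field) {
        SimpleField source = field.dictionaryField != null ? field.dictionaryField : field;
        List<SimpleField> children = new ArrayList<>();
        for (SimpleField child : source.children) {
            children.add(toMessageFormatFixed(child));
        }
        return new SimpleField(field.name, source.type, null, children);
    }

    // Mirrors the repro: an Int index field encoded with a Struct<a: Int, b: Int> dictionary.
    static SimpleField exampleIndexField() {
        SimpleField a = new SimpleField("a", "Int", null, Collections.<SimpleField>emptyList());
        SimpleField b = new SimpleField("b", "Int", null, Collections.<SimpleField>emptyList());
        SimpleField dict = new SimpleField("Dict", "Struct", null, Arrays.asList(a, b));
        return new SimpleField("Dict", "Int", dict, Collections.<SimpleField>emptyList());
    }

    public static void main(String[] args) {
        SimpleField index = exampleIndexField();
        System.out.println("buggy children: " + toMessageFormatBuggy(index).children.size()); // buggy children: 0
        System.out.println("fixed children: " + toMessageFormatFixed(index).children.size()); // fixed children: 2
    }
}
```

In the buggy variant the dictionary field is described as a Struct with no children, so the reader presumably builds a childless dictionary vector and cannot consume the child nodes and buffers of the dictionary batch. This is consistent with the exception above: two leftover nodes of length 3 (one per child, a and b) and four leftover buffers, which look like a 1-byte validity buffer and a 12-byte data buffer (3 values x 4-byte ints) per child.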



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
