[GitHub] [arrow] freakyzoidberg opened a new issue, #37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader

via GitHub Sat, 23 Sep 2023 07:23:07 -0700


freakyzoidberg opened a new issue, #37841:
URL: https://github.com/apache/arrow/issues/37841


   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I am trying to decode, in Java, records generated in Go (simple types + dictionaries) using ZSTD compression.
   
   This works fine for the simple types, but I get the following error when decoding dictionaries:
   
   ```
   java.lang.IllegalArgumentException: Please add arrow-compression module to use CommonsCompressionFactory for ZSTD
        at org.apache.arrow.vector.compression.NoCompressionCodec$Factory.createCodec(NoCompressionCodec.java:69)
        at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:82)
        at org.apache.arrow.vector.ipc.ArrowReader.load(ArrowReader.java:256)
        at org.apache.arrow.vector.ipc.ArrowReader.loadDictionary(ArrowReader.java:247)
        at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:167)
   ```
   
   
   The Go part is essentially
   
   ```go
   dtyp := &arrow.DictionaryType{
        IndexType: arrow.PrimitiveTypes.Int8,
        ValueType: arrow.BinaryTypes.LargeString,
   }
   bldrDictString := arrowarray.NewDictionaryBuilder(memory.DefaultAllocator, dtyp)
   defer bldrDictString.Release()
   
   bldrDictString.(*arrowarray.BinaryDictionaryBuilder).AppendString("foo")
   
   columnTypes := make([]arrow.Field, 0, 1)
   columnArrays := make([]arrow.Array, 0, 1)
   
   columnArrays = append(columnArrays, bldrDictString.NewArray())
   columnTypes = append(columnTypes, arrow.Field{Name: k.key, Type: dtyp, Nullable: nulls.Any()})
   
   schema := arrow.NewSchema(columnTypes, nil)
   rec := arrowarray.NewRecord(schema, columnArrays, int64(size))
   
   var buf bytes.Buffer
   writer := ipc.NewWriter(&buf, ipc.WithSchema(schema), ipc.WithZstd())
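   // Check each error separately so a failed Write is not masked by Close.
   if err := writer.Write(rec); err != nil {
        panic(err) // handle as appropriate for the caller
   }
   if err := writer.Close(); err != nil {
        panic(err) // handle as appropriate for the caller
   }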
   ```
   
   
   And the Java side
   
   ```java
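   import java.io.ByteArrayInputStream;
   import java.io.IOException;
   import org.apache.arrow.compression.CommonsCompressionFactory;
   import org.apache.arrow.vector.ipc.ArrowStreamReader;
   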
   try (ArrowStreamReader reader =
            new ArrowStreamReader(
                new ByteArrayInputStream(format.getArrow().toByteArray()),
                bufferAllocator,
                CommonsCompressionFactory.INSTANCE)) {
     reader.loadNextBatch();
     ...
   } catch (IOException e) {
     throw new RuntimeException(e);
   }
   ```
   
   
   I can get it to stop throwing by making the VectorLoader used when loading the dictionary use the compression factory defined in the reader (it currently defaults to the no-compression factory, `NoCompressionCodec.Factory`).
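
To make the suspected mechanism concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Arrow API: `CodecFactory`, `Loader`, and `Reader` are hypothetical stand-ins for the compression factory, `VectorLoader`, and `ArrowReader`, and the pretend "ZSTD" codec is just string reversal. It only illustrates how a loader built without the reader's factory would fail on compressed dictionaries while record batches decode fine.

```java
/** Simplified model of the issue; none of these types are real Arrow classes. */
public class DictionaryCodecSketch {

    /** Hypothetical stand-in for a compression codec factory. */
    interface CodecFactory {
        String decode(String codecName, String body);
    }

    /** Default factory: only accepts uncompressed bodies. */
    static final CodecFactory NO_COMPRESSION = (codec, body) -> {
        if (!codec.equals("NONE")) {
            throw new IllegalArgumentException("codec not available: " + codec);
        }
        return body;
    };

    /** Factory that also handles a pretend "ZSTD" codec (here: string reversal). */
    static final CodecFactory WITH_ZSTD = (codec, body) ->
        codec.equals("ZSTD")
            ? new StringBuilder(body).reverse().toString()
            : NO_COMPRESSION.decode(codec, body);

    /** Hypothetical stand-in for a loader that decodes one batch body. */
    static final class Loader {
        private final CodecFactory factory;

        Loader() {
            this(NO_COMPRESSION); // default used when no factory is passed
        }

        Loader(CodecFactory factory) {
            this.factory = factory;
        }

        String load(String codec, String body) {
            return factory.decode(codec, body);
        }
    }

    /** Hypothetical stand-in for the reader, configured with a factory. */
    static final class Reader {
        private final CodecFactory factory;

        Reader(CodecFactory factory) {
            this.factory = factory;
        }

        /** Record batches use the reader's factory. */
        String loadRecordBatch(String codec, String body) {
            return new Loader(factory).load(codec, body);
        }

        /** Behaviour described in the report: the dictionary path ignores the reader's factory. */
        String loadDictionaryBuggy(String codec, String body) {
            return new Loader().load(codec, body);
        }

        /** Proposed behaviour: the dictionary path reuses the reader's factory. */
        String loadDictionaryFixed(String codec, String body) {
            return new Loader(factory).load(codec, body);
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader(WITH_ZSTD);
        System.out.println(reader.loadRecordBatch("ZSTD", "oof"));
        System.out.println(reader.loadDictionaryFixed("ZSTD", "oof"));
        try {
            reader.loadDictionaryBuggy("ZSTD", "oof");
        } catch (IllegalArgumentException e) {
            System.out.println("buggy path failed: " + e.getMessage());
        }
    }
}
```

In this model the only difference between the failing and working dictionary paths is whether the loader is constructed with the reader's factory, which mirrors the change described above.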
   
   See this [change](https://github.com/freakyzoidberg/arrow/commit/f945d2ddee9c332661d3d97084c2aedb56f7fcf5). Note that I was not able to make it fail using the Java Arrow tests.
   I am probably doing something wrong; I am also wondering whether dictionaries are compressed the same way by the Go and Java writers, which could explain why the Java test is not failing.
   
   Anyhow, unless I am doing something wrong, this looks like a bug.
   
   Thanks!
   
   
   
   ### Component(s)
   
   Java


