metalmatze opened a new issue, #35711: URL: https://github.com/apache/arrow/issues/35711
### Describe the enhancement requested

We are currently rewriting some of our components to use Arrow. When appending data to a record, we want to deduplicate/aggregate the data before adding new rows. For that, we either need to keep track of the previously appended data outside of Arrow or (and this is what we would like to enhance) read the data back from the underlying builders.

I found [several occurrences](https://github.com/search?q=repo%3Aapache%2Farrow+path%3A%2F%5Ego%5C%2Farrow%5C%2Farray%5C%2F%2F+Builder%29+Value%28i+int%29&type=code) of builders that already support reading values back. In addition to potentially adding `Value(i int)` methods to [some more types](https://gist.github.com/metalmatze/c2ed937e7508d4b940bbf26bef2dded2), we are mostly interested in reading values back from a `*BinaryDictionaryBuilder`. I've managed to read the indices back from the builders, but got stuck trying to read the values from the underlying `BinaryMemoTable`.

It would be very helpful to be able to read those strings back from the dictionary's `BinaryMemoTable`. Is this something we could add? Is it within the scope of this library?
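The sketch below illustrates the requested read-back. It is a minimal, self-contained example that does not use Arrow at all.

- `memoTable` and `dictBuilder` are hypothetical stand-ins for `hashing.BinaryMemoTable` and `*BinaryDictionaryBuilder`.
- `GetValueIndex`, `Value`, and `ValueStr` reuse the method names from the draft diff.
- It assumes a simplified model: the builder stores one dictionary index per appended row, and the memo table stores each distinct value once, in insertion order. Under that assumption, reading row `i` back is simply `values[indices[i]]`.

```go
package main

import "fmt"

// memoTable is a hypothetical stand-in for Arrow's hashing.BinaryMemoTable:
// it assigns each distinct value a dense index in insertion order.
type memoTable struct {
	index  map[string]int // value -> dictionary index
	values [][]byte       // dictionary index -> value
}

func newMemoTable() *memoTable {
	return &memoTable{index: make(map[string]int)}
}

// getOrInsert returns the dictionary index of v, inserting v if it is new.
func (m *memoTable) getOrInsert(v []byte) int {
	if idx, ok := m.index[string(v)]; ok {
		return idx
	}
	idx := len(m.values)
	m.index[string(v)] = idx
	// Copy v so the caller may reuse its buffer afterwards.
	m.values = append(m.values, append([]byte(nil), v...))
	return idx
}

// dictBuilder is a hypothetical, simplified binary dictionary builder:
// one index per appended row plus a memo table of distinct values.
type dictBuilder struct {
	memo    *memoTable
	indices []int
}

func newDictBuilder() *dictBuilder {
	return &dictBuilder{memo: newMemoTable()}
}

// Append adds a row, storing only its dictionary index.
func (b *dictBuilder) Append(v []byte) {
	b.indices = append(b.indices, b.memo.getOrInsert(v))
}

func (b *dictBuilder) AppendString(s string) { b.Append([]byte(s)) }

// Len returns the number of appended rows.
func (b *dictBuilder) Len() int { return len(b.indices) }

// GetValueIndex returns the dictionary index stored for row i.
func (b *dictBuilder) GetValueIndex(i int) int { return b.indices[i] }

// Value reads row i back by resolving its index in the memo table.
// The returned slice aliases internal storage and must not be modified.
func (b *dictBuilder) Value(i int) []byte {
	return b.memo.values[b.GetValueIndex(i)]
}

// ValueStr is Value converted to a string.
func (b *dictBuilder) ValueStr(i int) string { return string(b.Value(i)) }

func main() {
	b := newDictBuilder()
	for _, s := range []string{"foo", "bar", "foo"} {
		b.AppendString(s)
	}
	// Print row number, dictionary index, and the value read back.
	for i := 0; i < b.Len(); i++ {
		fmt.Println(i, b.GetValueIndex(i), b.ValueStr(i))
	}
}
```

Running it prints `0 0 foo`, `1 1 bar`, and `2 0 foo`. Rows 0 and 2 share index 0 because `"foo"` is stored only once in the memo table. In the real builder, the index builder's concrete type depends on the dictionary's index type, which is why the draft diff switches over the builder types in `GetValueIndex`.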
```diff
diff --git a/go/arrow/array/dictionary.go b/go/arrow/array/dictionary.go
index 2409e296c..622dece04 100644
--- a/go/arrow/array/dictionary.go
+++ b/go/arrow/array/dictionary.go
@@ -1208,6 +1209,54 @@ func (b *BinaryDictionaryBuilder) InsertStringDictValues(arr *String) (err error
 	return
 }
 
+func (b *BinaryDictionaryBuilder) GetValueIndex(i int) int {
+	switch b := b.idxBuilder.Builder.(type) {
+	case *Int64Builder:
+		return int(b.Value(i))
+	case *Uint64Builder:
+		return int(b.Value(i))
+	case *Float64Builder:
+		return int(b.Value(i))
+	case *Int32Builder:
+		return int(b.Value(i))
+	case *Uint32Builder:
+		return int(b.Value(i))
+	case *Float32Builder:
+		return int(b.Value(i))
+	case *Int16Builder:
+		return int(b.Value(i))
+	case *Uint16Builder:
+		return int(b.Value(i))
+	case *Int8Builder:
+		return int(b.Value(i))
+	case *Uint8Builder:
+		return int(b.Value(i))
+	case *TimestampBuilder:
+		return int(b.Value(i))
+	case *Time32Builder:
+		return int(b.Value(i))
+	case *Time64Builder:
+		return int(b.Value(i))
+	case *Date32Builder:
+		return int(b.Value(i))
+	case *Date64Builder:
+		return int(b.Value(i))
+	case *DurationBuilder:
+		return int(b.Value(i))
+	default:
+		return -1
+	}
+}
+
+func (b *BinaryDictionaryBuilder) Value(i int) []byte {
+	return []byte{}
+}
+
+func (b *BinaryDictionaryBuilder) ValueStr(i int) string {
+	//b.memoTable.(*hashing.BinaryMemoTable).
+	return ""
+}
+
 type FixedSizeBinaryDictionaryBuilder struct {
 	dictionaryBuilder
 	byteWidth int
```

### Component(s)

Go
