metalmatze opened a new issue, #35711:
URL: https://github.com/apache/arrow/issues/35711

   ### Describe the enhancement requested
   
   We are currently rewriting some of our components to use Arrow. When
appending data to a record, we want to deduplicate or aggregate the data
before adding new rows. To do that, we either need to keep track of the
previously appended data outside of Arrow, or (and this is what we would
like to enhance) read the data back from the underlying builders.
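   For the first option, a minimal plain-Go sketch might look like the following. It does not use Arrow at all: `rowAggregator`, its `Append` method, and the `keys`/`counts` columns are hypothetical, and in practice the surviving rows would be appended to Arrow builders when the record is flushed. It also shows the cost of this approach: the `rowByKey` map keeps a second copy of every key alongside the data the builders will eventually hold.

```go
package main

import "fmt"

// rowAggregator is a hypothetical stand-in for a record being built. It keeps
// column data in plain slices and remembers, outside of any Arrow builder,
// which row each key was written to.
type rowAggregator struct {
	rowByKey map[string]int // key -> row index of its first occurrence
	keys     []string       // "key" column
	counts   []int64        // aggregated "count" column
}

func newRowAggregator() *rowAggregator {
	return &rowAggregator{rowByKey: make(map[string]int)}
}

// Append adds a new row for key or, if key was already appended,
// folds value into the existing row instead of adding a duplicate.
func (a *rowAggregator) Append(key string, value int64) {
	if row, ok := a.rowByKey[key]; ok {
		a.counts[row] += value
		return
	}
	a.rowByKey[key] = len(a.keys)
	a.keys = append(a.keys, key)
	a.counts = append(a.counts, value)
}

func main() {
	agg := newRowAggregator()
	agg.Append("a", 1)
	agg.Append("b", 2)
	agg.Append("a", 3)
	for i := range agg.keys {
		fmt.Println(agg.keys[i], agg.counts[i]) // "a 4", then "b 2"
	}
}
```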
   
   I was able to find [several 
occurrences](https://github.com/search?q=repo%3Aapache%2Farrow+path%3A%2F%5Ego%5C%2Farrow%5C%2Farray%5C%2F%2F+Builder%29+Value%28i+int%29&type=code)
 of reading values from builders. 
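   The second option, reading back from the builders, could rely on such `Value(i int)` methods. The sketch below keeps the builder abstract: `stringValueReader` and `toyStringBuilder` are hypothetical, and whether a particular Arrow builder satisfies an interface like this should be checked against the Arrow release in use. A linear scan is the simplest lookup and costs O(n) per appended row.

```go
package main

import "fmt"

// stringValueReader is a hypothetical interface modelled on the
// Len()/Value(i int) pair that several Arrow Go builders expose.
type stringValueReader interface {
	Len() int
	Value(i int) string
}

// toyStringBuilder is a minimal in-memory builder used only for this example.
type toyStringBuilder struct{ vals []string }

func (b *toyStringBuilder) Append(v string) { b.vals = append(b.vals, v) }
func (b *toyStringBuilder) Len() int { return len(b.vals) }
func (b *toyStringBuilder) Value(i int) string { return b.vals[i] }

// findRow scans the values already appended to b and returns the row that
// holds v, or -1 if v has not been appended yet.
func findRow(b stringValueReader, v string) int {
	for i := 0; i < b.Len(); i++ {
		if b.Value(i) == v {
			return i
		}
	}
	return -1
}

func main() {
	b := &toyStringBuilder{}
	for _, s := range []string{"x", "y", "x"} {
		// Only append keys that are not already in the builder.
		if findRow(b, s) == -1 {
			b.Append(s)
		}
	}
	fmt.Println(b.Len()) // 2
}
```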
   
   Besides potentially adding `Value(i int)` methods to the builders of [some
types](https://gist.github.com/metalmatze/c2ed937e7508d4b940bbf26bef2dded2),
we are mostly interested in reading back values from a
`*BinaryDictionaryBuilder`.
   I managed to read the indices back from the builder, but got stuck trying
to read the corresponding values from the underlying `BinaryMemoTable`.
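   To make the two-step lookup concrete (row → dictionary index → value), here is a self-contained sketch that models a dictionary builder and a binary memo table in plain Go. `memoTable`, `dictBuilder`, and their methods are hypothetical and do not reflect Arrow's actual `hashing.BinaryMemoTable` API. The sketch only assumes that each distinct value is stored once, contiguously, with an offsets array delimiting the values, similar to Arrow's binary layout. Nulls are ignored.

```go
package main

import "fmt"

// memoTable is a hypothetical, simplified stand-in for a binary memo table:
// each distinct value is stored once in data, and offsets[j]..offsets[j+1]
// delimit the j-th distinct value.
type memoTable struct {
	index   map[string]int
	data    []byte
	offsets []int
}

func newMemoTable() *memoTable {
	return &memoTable{index: make(map[string]int), offsets: []int{0}}
}

// GetOrInsert returns the memo index of v, inserting v if it is new.
func (m *memoTable) GetOrInsert(v []byte) int {
	if j, ok := m.index[string(v)]; ok {
		return j
	}
	j := len(m.offsets) - 1
	m.index[string(v)] = j
	m.data = append(m.data, v...)
	m.offsets = append(m.offsets, len(m.data))
	return j
}

// Value returns the bytes of the j-th distinct value (shares memory with data).
func (m *memoTable) Value(j int) []byte {
	return m.data[m.offsets[j]:m.offsets[j+1]]
}

// dictBuilder models a dictionary builder: one index per appended row,
// each pointing into the memo table of distinct values.
type dictBuilder struct {
	indices []int
	memo    *memoTable
}

func (b *dictBuilder) AppendString(s string) {
	b.indices = append(b.indices, b.memo.GetOrInsert([]byte(s)))
}

// ValueStr resolves row i in two steps: read the row's dictionary index,
// then look that index up in the memo table.
func (b *dictBuilder) ValueStr(i int) string {
	return string(b.memo.Value(b.indices[i]))
}

func main() {
	b := &dictBuilder{memo: newMemoTable()}
	for _, s := range []string{"foo", "bar", "foo"} {
		b.AppendString(s)
	}
	fmt.Println(b.indices, b.ValueStr(2)) // [0 1 0] foo
}
```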
   
   It would be super helpful to read back those strings from the Dictionary's 
`BinaryMemoTable`. Is this something we can somehow add? Is this something 
within the scope of this library? 
   
   
   ```diff
   diff --git a/go/arrow/array/dictionary.go b/go/arrow/array/dictionary.go
   index 2409e296c..622dece04 100644
   --- a/go/arrow/array/dictionary.go
   +++ b/go/arrow/array/dictionary.go
   @@ -1208,6 +1209,54 @@ func (b *BinaryDictionaryBuilder) InsertStringDictValues(arr *String) (err error
        return
    }
    
   +func (b *BinaryDictionaryBuilder) GetValueIndex(i int) int {
   +    switch b := b.idxBuilder.Builder.(type) {
   +    case *Int64Builder:
   +            return int(b.Value(i))
   +    case *Uint64Builder:
   +            return int(b.Value(i))
   +    case *Float64Builder:
   +            return int(b.Value(i))
   +    case *Int32Builder:
   +            return int(b.Value(i))
   +    case *Uint32Builder:
   +            return int(b.Value(i))
   +    case *Float32Builder:
   +            return int(b.Value(i))
   +    case *Int16Builder:
   +            return int(b.Value(i))
   +    case *Uint16Builder:
   +            return int(b.Value(i))
   +    case *Int8Builder:
   +            return int(b.Value(i))
   +    case *Uint8Builder:
   +            return int(b.Value(i))
   +    case *TimestampBuilder:
   +            return int(b.Value(i))
   +    case *Time32Builder:
   +            return int(b.Value(i))
   +    case *Time64Builder:
   +            return int(b.Value(i))
   +    case *Date32Builder:
   +            return int(b.Value(i))
   +    case *Date64Builder:
   +            return int(b.Value(i))
   +    case *DurationBuilder:
   +            return int(b.Value(i))
   +    default:
   +            return -1
   +    }
   +}
   +
   +func (b *BinaryDictionaryBuilder) Value(i int) []byte {
   +    return []byte{}
   +}
   +
   +func (b *BinaryDictionaryBuilder) ValueStr(i int) string {
   +    //b.memoTable.(*hashing.BinaryMemoTable).
   +    return ""
   +}
   +
    type FixedSizeBinaryDictionaryBuilder struct {
        dictionaryBuilder
        byteWidth int
   ```
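   On the `GetValueIndex` idea in the diff: the Arrow format restricts dictionary index types to integers, so a helper like it should only need the signed and unsigned integer cases; the float and temporal builder cases should never match an index builder. The sketch below shows that narrowing with plain Go slices standing in for the index builders. `indexAt` is a hypothetical helper, not part of Arrow, and it reports unsupported types with `ok == false` rather than a `-1` sentinel alone.

```go
package main

import "fmt"

// indexAt returns element i of a slice of dictionary indices whose integer
// width is only known at runtime. Plain slices stand in for Arrow's index
// builders. Non-integer inputs are rejected with ok == false.
// Unsigned values that exceed the platform's int range would wrap when
// converted; this sketch does not check for that.
func indexAt(indices any, i int) (idx int, ok bool) {
	switch v := indices.(type) {
	case []int8:
		return int(v[i]), true
	case []uint8:
		return int(v[i]), true
	case []int16:
		return int(v[i]), true
	case []uint16:
		return int(v[i]), true
	case []int32:
		return int(v[i]), true
	case []uint32:
		return int(v[i]), true
	case []int64:
		return int(v[i]), true
	case []uint64:
		return int(v[i]), true
	default:
		return -1, false
	}
}

func main() {
	fmt.Println(indexAt([]uint16{3, 7}, 1)) // 7 true
	fmt.Println(indexAt([]float64{1.5}, 0)) // -1 false
}
```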
   
   ### Component(s)
   
   Go


-- 
This is an automated message from the Apache Git Service.