zeroshade commented on code in PR #850:
URL: https://github.com/apache/arrow-go/pull/850#discussion_r3390422692
##########
arrow/avro/reader_types.go:
##########
@@ -594,6 +594,10 @@ func appendBinaryData(b *array.BinaryBuilder, data
interface{}) {
switch dt := data.(type) {
case nil:
b.AppendNull()
+ case []byte:
+ b.Append(dt)
Review Comment:
These explicit `[]byte`/`string` cases are the right fix. Forward-looking
nit: the `default:` branch a few lines below in this function —
`b.Append(fmt.Append([]byte{}, data))` — is the exact silent fallback that
produced the corrupted text this PR fixes. Now that
`nil`/`[]byte`/`string`/`map[string]any` are all handled it's unreachable for
hamba's real outputs, so this is a good opportunity to turn it (and the
identical `b.Append(fmt.Sprint(data))` default in `appendStringData`) into a
hard error/panic on the unexpected `%T`. That way the next decoder/type
mismatch fails loudly instead of silently corrupting a column. Non-blocking.
##########
arrow/avro/reader_types.go:
##########
@@ -594,6 +594,10 @@ func appendBinaryData(b *array.BinaryBuilder, data
interface{}) {
switch dt := data.(type) {
case nil:
b.AppendNull()
+ case []byte:
+ b.Append(dt)
+ case string:
+ b.Append([]byte(dt))
case map[string]any:
switch ct := dt["bytes"].(type) {
Review Comment:
Pre-existing (not introduced by this PR) but right next to the change: the
`default` of this `dt["bytes"]` switch does `b.Append(ct.([]byte))`, an
unchecked type assertion that will panic if the union's `bytes` value isn't a
`[]byte`. `appendFixedSizeBinaryData` handles the same shape more defensively
with a typed `case []byte: b.Append(v)` and lets anything else fall through.
Might be worth mirroring that here while you're in the file. Optional.
##########
arrow/avro/reader_types.go:
##########
@@ -594,6 +594,10 @@ func appendBinaryData(b *array.BinaryBuilder, data
interface{}) {
switch dt := data.(type) {
case nil:
b.AppendNull()
+ case []byte:
+ b.Append(dt)
+ case string:
+ b.Append([]byte(dt))
Review Comment:
Optional: per hamba's decoder a `bytes` field (and a nullable
`["null","bytes"]` union) decodes into `any` as a bare `[]byte`, never a Go
`string`, so this `case string` is effectively dead on the normal decode path.
It's harmless as defensive code, but note the `appendFixedSizeBinaryData`
you're mirroring has no `string` case — for consistency you could drop it. Your
call.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]