zeroshade commented on code in PR #34631:
URL: https://github.com/apache/arrow/pull/34631#discussion_r1157654714


##########
go/parquet/pqarrow/column_readers.go:
##########
@@ -459,8 +459,13 @@ func chunksToSingle(chunked *arrow.Chunked) (arrow.ArrayData, error) {
 
 // create a chunked arrow array from the raw record data
 func transferColumnData(rdr file.RecordReader, valueType arrow.DataType, descr *schema.Column, mem memory.Allocator) (*arrow.Chunked, error) {
+       valueID := valueType.ID()
+       if valueID == arrow.EXTENSION {
+               valueID = valueType.(arrow.ExtensionType).StorageType().ID()
+       }

Review Comment:
   You probably want to do something like:
   
   ```go
   dt := valueType
   if valueType.ID() == arrow.EXTENSION {
       dt = valueType.(arrow.ExtensionType).StorageType()
   }
   
   // ...
   switch dt.ID() {
   ...
   }
   ```
   
   Then use `dt` everywhere `valueType` is currently used, except where
   arrays and similar objects are created. For example, `transferDictionary`
   assumes that the data type it receives is a `DictionaryType`, which would
   be incorrect if `valueType` were an extension type whose underlying storage
   type is a dictionary. You'd want to pass `dt` to `transferDictionary` and
   then update the returned `Chunked` array's data type to be the extension
   type (or something along those lines).
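   
   A rough, self-contained sketch of that flow, in case it helps. Everything
   here is a simplified, hypothetical stand-in rather than the real `arrow` or
   `pqarrow` API: `Type`, `DataType`, `ExtensionType`, `DictionaryType` and
   `Chunked` only mimic the shape of the Arrow types, `exampleExt` is a made-up
   extension type, and the function signatures are reduced to the data type
   argument alone. It only illustrates dispatching on the storage type and
   then restoring the extension type on the result:
   
   ```go
   package main

   import (
   	"errors"
   	"fmt"
   )

   // Type, DataType, ExtensionType and Chunked are simplified, hypothetical
   // stand-ins for the Arrow types of the same names (not the real API).
   type Type int

   const (
   	INT32 Type = iota
   	DICTIONARY
   	EXTENSION
   )

   type DataType interface {
   	ID() Type
   	Name() string
   }

   type ExtensionType interface {
   	DataType
   	StorageType() DataType
   }

   type Chunked struct{ dtype DataType }

   func (c *Chunked) DataType() DataType { return c.dtype }

   // DictionaryType stands in for arrow.DictionaryType.
   type DictionaryType struct{}

   func (*DictionaryType) ID() Type     { return DICTIONARY }
   func (*DictionaryType) Name() string { return "dictionary" }

   // exampleExt is a made-up extension type wrapping some storage type.
   type exampleExt struct{ storage DataType }

   func (exampleExt) ID() Type                { return EXTENSION }
   func (exampleExt) Name() string            { return "example-ext" }
   func (e exampleExt) StorageType() DataType { return e.storage }

   // transferDictionary mimics the assumption described above: it only
   // accepts a *DictionaryType and fails for anything else.
   func transferDictionary(dt DataType) (*Chunked, error) {
   	if _, ok := dt.(*DictionaryType); !ok {
   		return nil, errors.New("transferDictionary: not a dictionary type")
   	}
   	return &Chunked{dtype: dt}, nil
   }

   // transferColumnData dispatches on the storage type, then puts the
   // caller's original (possibly extension) type back on the result.
   func transferColumnData(valueType DataType) (*Chunked, error) {
   	dt := valueType
   	if valueType.ID() == EXTENSION {
   		dt = valueType.(ExtensionType).StorageType()
   	}

   	switch dt.ID() {
   	case DICTIONARY:
   		out, err := transferDictionary(dt) // pass dt, not valueType
   		if err != nil {
   			return nil, err
   		}
   		out.dtype = valueType // re-apply the extension type
   		return out, nil
   	default:
   		return &Chunked{dtype: valueType}, nil
   	}
   }

   func main() {
   	ext := exampleExt{storage: &DictionaryType{}}
   	out, err := transferColumnData(ext)
   	if err != nil {
   		panic(err)
   	}
   	fmt.Println(out.DataType().Name())
   }
   ```
   
   In this sketch, passing the extension type straight to `transferDictionary`
   returns an error, which is the failure mode the storage-type unwrapping is
   meant to avoid.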
       


