zeroshade commented on issue #34330: URL: https://github.com/apache/arrow/issues/34330#issuecomment-1443883987
@yevgenypats Some easy directions:

* For the most part, nearly all direct arrow<-->parquet handling is kept solely to the `pqarrow` package (*nearly*), so most of the work will be there, with the exception of dealing with reading via `file/record_reader.go`.
* You'll need to properly handle the case where the arrow schema gets stored in `pqarrow/schema.go`, and make sure that the underlying storage type of the extension type matches the physical/logical type of the parquet file.
* The easiest way to handle extension types in *most* cases is to add a check at the top of the type switch for the extension ID; if it's an extension type, set the local type to the underlying storage type. For example, look at `arrow/compute/internal/exec/span.go` at line 170, or look at how the ipc handling deals with extension types.
* Pretty much all arrow writing is handled through the big type switch in `writeDenseArrow` in `pqarrow/encode_arrow.go`, while reading is managed in `column_readers.go`.
* Don't forget to also handle extension types in `path_builder.go`, which is how it determines the leaf arrays for writing.

Honestly, you can probably just search for `arrow.EXTENSION` in any file of `pqarrow/*` and you'll find most, if not all, of the spots that need handling. I tried to explicitly return a not-implemented error for extension types in every case, rather than letting them fall through to the `default`, precisely to make those spots easier to find later.

Don't be afraid to put up a WIP PR if you need some assistance on it, and I'll be happy to help out when I can. Thanks again!! I'm looking forward to the PR.
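The "unwrap to the storage type at the top of the type switch" pattern from the third bullet can be sketched roughly as below. Note this is a self-contained illustration using stand-in types, not the actual arrow-go API; the real code would type-assert against arrow's extension-type interface and call its `StorageType()` method, and `writeColumn` here is a hypothetical name, not a real pqarrow function:

```go
package main

import "fmt"

// DataType is a stand-in for arrow.DataType; the real pqarrow code
// switches on the actual arrow type, not these illustrative structs.
type DataType interface{ Name() string }

type Int32Type struct{}

func (Int32Type) Name() string { return "int32" }

// UUIDExtensionType is a stand-in for a user-defined extension type
// whose underlying storage is a plain physical type.
type UUIDExtensionType struct{ storage DataType }

func (UUIDExtensionType) Name() string            { return "uuid" }
func (t UUIDExtensionType) StorageType() DataType { return t.storage }

// ExtensionType is the stand-in for arrow's extension-type interface.
type ExtensionType interface {
	DataType
	StorageType() DataType
}

// writeColumn mimics the shape of a big pqarrow type switch: check for
// an extension type first, then switch on the (possibly unwrapped) type.
func writeColumn(dt DataType) string {
	// Unwrap extension types to their storage type before switching,
	// so every existing case keeps working unchanged.
	if ext, ok := dt.(ExtensionType); ok {
		dt = ext.StorageType()
	}
	switch dt.(type) {
	case Int32Type:
		return "wrote int32 column"
	default:
		return "not implemented"
	}
}

func main() {
	fmt.Println(writeColumn(Int32Type{}))
	// The extension type takes the same path as its storage type.
	fmt.Println(writeColumn(UUIDExtensionType{storage: Int32Type{}}))
}
```

The point of unwrapping before the switch (rather than adding an extension case to every branch) is that all the existing physical/logical-type cases handle extension-typed columns for free.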
