bionicles commented on issue #6522: URL: https://github.com/apache/arrow-rs/issues/6522#issuecomment-2635541751
Well, you know how it is: a day or two of struggle writing new code saves 15-30 minutes of reading the instructions.

I'm a major noob with Arrow internals and didn't know all that stuff existed. Polars doesn't expose the JSON type @mbrobbel mentioned (which is perfect for my use case), and some dude instantly closed my issue about it with a one-liner comment telling me to use struct, so I wrote functions to convert vectors of serde_json values into Polars columns.

As far as I know, the Arrow JSON reader is for cases where you have a JSON for each row of a table, but I have a column of JSONs and wanted to keep them all in a single column.

The biggest pain by an exponential margin was concatenating the `Box<dyn Array>`, because I didn't know how they worked. Then I assumed Offsets were cumulative, then didn't realize there was no need to push an initial zero, then didn't realize I couldn't skip the zeroth array, then realized I had to try_push subarray lengths onto offsets instead of pushing offsets onto offsets, and then hit various recursion bugs.

How do we concatenate arrays? My spaghetti works, but I can't say I'm proud of it.

Oh, of course, now I see https://docs.rs/arrow/latest/arrow/compute/fn.concat.html

Anyway, Arrow is a cool library and I learned a lot. I've mostly avoided `Box<dyn T>` for performance, but `dyn` seems like a more extensible approach to type heterogeneity than enums, and Arrow seems pretty performant.

Happy to post a gist. It's about 800 lines, I didn't include the imports, and it uses serde_json and polars. Hope it helps. It needs translation from polars to arrow, and you can absolutely make a better version. Totally agree that json -> struct support is a good add.

https://gist.github.com/bionicles/f7dd0eac5d3ed44c919a3b7a5c44d285
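
For anyone hitting the same offsets confusion, here is a minimal std-only sketch of the bookkeeping described above. `ListColumn`, `push_row`, and this `concat` are hypothetical names made up for illustration; they are not Arrow's or Polars' types. The sketch assumes the usual variable-length list layout: an offsets buffer with one more entry than there are rows, starting at zero, where each new offset is the previous one plus the length of the row being appended. For real Arrow arrays, the `arrow::compute::concat` function linked above is the thing to reach for.

```rust
/// Toy variable-length list column (hypothetical, not an Arrow type).
/// Row `i` is `values[offsets[i]..offsets[i + 1]]`.
#[derive(Debug, Clone, PartialEq)]
pub struct ListColumn {
    /// Always has `rows + 1` entries; the first entry is 0.
    pub offsets: Vec<i64>,
    /// All rows' elements stored back to back.
    pub values: Vec<i64>,
}

impl ListColumn {
    /// An empty column already contains the initial zero offset,
    /// so callers never push it themselves.
    pub fn new() -> Self {
        ListColumn { offsets: vec![0], values: Vec::new() }
    }

    /// Append one row: copy its elements, then push
    /// `last offset + row length` (a length goes in, an offset comes out).
    pub fn push_row(&mut self, row: &[i64]) {
        self.values.extend_from_slice(row);
        let last = *self.offsets.last().expect("offsets is never empty");
        self.offsets.push(last + row.len() as i64);
    }

    /// Number of rows.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow the elements of row `i`.
    pub fn row(&self, i: usize) -> &[i64] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}

/// Concatenate chunks row by row. Every chunk is visited, including the
/// first one, and each chunk's offsets are turned back into row lengths
/// rather than being copied onto the output offsets directly.
pub fn concat(chunks: &[ListColumn]) -> ListColumn {
    let mut out = ListColumn::new();
    for chunk in chunks {
        // Each adjacent pair of offsets is one row's (start, end).
        for pair in chunk.offsets.windows(2) {
            let (start, end) = (pair[0] as usize, pair[1] as usize);
            out.push_row(&chunk.values[start..end]);
        }
    }
    out
}
```

Copying a later chunk's offsets verbatim would restart the numbering at zero partway through the output, which is why the sketch re-derives lengths and lets `push_row` rebuild the running total.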
