bionicles commented on issue #6522:
URL: https://github.com/apache/arrow-rs/issues/6522#issuecomment-2635541751

   well, you know how it is, a day or two of struggle writing new code saves 
15-30 minutes reading the instructions
   
   i'm a major noob with arrow internals and didn't know all that stuff 
existed. polars doesn't expose the JSON type @mbrobbel mentioned (which is 
perfect for my use case), and some dude instantly closed my issue about it with 
a one-line comment telling me to use struct, so I wrote functions to convert 
vectors of serde_json values into polars columns
   
   As far as I know, the arrow JSON reader is for cases where you have one JSON 
object per row of a table, but I have a column of JSON values and wanted to 
keep them all in a single column
   
   The biggest pain by far was concatenating the `Box<dyn Array>`s: first I 
didn't know how they worked, then I assumed Offsets were cumulative, then I 
didn't realize there's no need to push the initial zero, then I didn't realize 
I couldn't skip the zeroth array, then I realized I had to try_push subarray 
lengths onto offsets instead of pushing offsets onto offsets, and then there 
were various recursion bugs
   
   
![Image](https://github.com/user-attachments/assets/fd8cfa79-2e01-43ef-b221-b83f1f55c4fc)
   where
   
   
![Image](https://github.com/user-attachments/assets/d966cb76-3d7a-47c0-b8b6-e1213a7168bc)
   
   How do we concatenate arrays? My spaghetti works but I can't say I'm proud 
of it
   
   
![Image](https://github.com/user-attachments/assets/efe962db-4a9f-467b-be54-87a104f1e582)
   
   Oh, of course now I see 
https://docs.rs/arrow/latest/arrow/compute/fn.concat.html
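
   For anyone who trips over the same offsets confusion, here's a minimal 
std-only sketch of the bookkeeping described above. `ToyList`, `push_row` and 
`concat_toy` are made-up names for illustration; this is not the arrow or 
polars API and not how `arrow::compute::concat` is implemented, just the idea 
that the offsets start with a zero and you push row lengths, not offsets:

```rust
/// A toy list column (hypothetical type, not an arrow/polars struct).
/// Row `i` lives in `values[offsets[i]..offsets[i + 1]]`.
#[derive(Debug, Clone, PartialEq)]
struct ToyList {
    offsets: Vec<usize>, // starts with 0, so it always has rows + 1 entries
    values: Vec<i64>,
}

impl ToyList {
    fn new() -> Self {
        // The initial zero is already in place; callers never push it.
        ToyList { offsets: vec![0], values: Vec::new() }
    }

    /// Record a row of `len` items: the caller passes a *length*,
    /// and the cumulative offset is derived from the last offset.
    fn push_len(&mut self, len: usize) {
        let last = *self.offsets.last().expect("offsets always holds the initial zero");
        self.offsets.push(last + len);
    }

    /// Append one row of values.
    fn push_row(&mut self, row: &[i64]) {
        self.push_len(row.len());
        self.values.extend_from_slice(row);
    }

    /// Borrow row `i` (panics if `i` is out of range).
    fn row(&self, i: usize) -> &[i64] {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }

    /// Number of rows.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}

/// Concatenate chunks by re-pushing every row length, starting with the
/// first chunk (none is skipped). Assumes each chunk's offsets start at 0
/// and cover all of its values, which holds for anything built via `push_row`.
fn concat_toy(chunks: &[ToyList]) -> ToyList {
    let mut out = ToyList::new();
    for chunk in chunks {
        for pair in chunk.offsets.windows(2) {
            // Push the row length, not the chunk-local offset.
            out.push_len(pair[1] - pair[0]);
        }
        out.values.extend_from_slice(&chunk.values);
    }
    out
}
```

   Pushing the chunk-local offsets directly would restart the numbering at 
each chunk boundary, which is the "offsets onto offsets" mistake.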
   
   Anyway, arrow is a cool library, I learned a lot, I've mostly avoided 
`Box<dyn T>` for performance, but `dyn` seems like a more extensible approach 
to type-heterogeneity than enums, and Arrow seems pretty performant
   
   Happy to post a gist. It's about 800 lines, I didn't include the imports, 
and it uses serde_json and polars. Hope it helps. It needs translation from 
polars to arrow and you can absolutely make a better version. Totally agree 
json -> struct support is a good add
   
   https://gist.github.com/bionicles/f7dd0eac5d3ed44c919a3b7a5c44d285


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
