alamb commented on issue #8083: URL: https://github.com/apache/arrow-rs/issues/8083#issuecomment-3165270664
Here are some good comments from @scovich related to this:

https://github.com/apache/arrow-rs/pull/8021#discussion_r2257559038

> Ah, I think I see -- for a strongly-typed builder, the caller could do all the pathing and just pass in the correct leaf column here. But this is variant, so we'd actually need to extract the variant leaf value (using a path) on a per-row basis. We'd either have to do all the pathing here (recursively), or the caller would have to extract the struct/array once and recurse on its fields. And the latter would require a row-oriented builder where this builder seems to be column-oriented.

https://github.com/apache/arrow-rs/pull/8021#discussion_r2257671520

> Currently, the output builder seems to be fully column-oriented -- it assumes that all values for each leaf column are extracted in a tight loop. This can work for primitive builders, but nested builders will quickly run into pathing and efficiency problems.
>
> I think we'll need to do something similar to the JSON builder, with a row-oriented approach where each level of a nested builder receives an already-constructed Variant for the current row and does a field extract for each child builder; child builders can then cast the result directly or recurse further as needed (based on their own type). And then the top-level builder call would construct a Variant for each row to kick-start the process.
>
> But see the other comment -- to the extent that the shredding aligns nicely, we can hoist a subset of this per-row pathing of the append method up to columnar pathing of the builder's constructor and finalizer.
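
To make the row-oriented idea from the second comment concrete, here is a rough sketch in Rust. It is only an illustration under stated assumptions: the `Variant` enum is a toy stand-in rather than the real variant type, and `RowBuilder`, `Int64Builder`, `StructBuilder`, `Column`, and `build_column` are hypothetical names, not arrow-rs APIs. Each nested builder receives the already-constructed value for the current row, does a field extract per child, and the child either casts the result or recurses.

```rust
use std::collections::BTreeMap;

// Toy stand-in for a variant value (hypothetical; not the real variant type).
#[derive(Debug, Clone, PartialEq)]
enum Variant {
    Null,
    Int(i64),
    Object(BTreeMap<String, Variant>),
}

impl Variant {
    // Field extract for one row: None if this is not an object or the field is absent.
    fn get_field(&self, name: &str) -> Option<&Variant> {
        match self {
            Variant::Object(fields) => fields.get(name),
            _ => None,
        }
    }
}

// Finished output; every vector holds one entry per appended row.
#[derive(Debug, PartialEq)]
enum Column {
    Int64(Vec<Option<i64>>),
    Struct { validity: Vec<bool>, children: Vec<(String, Column)> },
}

// Hypothetical row-oriented builder: `append` is called once per row with the
// value found at this builder's position in the row (None if it was missing).
trait RowBuilder {
    fn append(&mut self, value: Option<&Variant>);
    fn finish(self: Box<Self>) -> Column;
}

// Leaf builder: casts the per-row value directly, writing null on a mismatch.
struct Int64Builder {
    values: Vec<Option<i64>>,
}

impl RowBuilder for Int64Builder {
    fn append(&mut self, value: Option<&Variant>) {
        self.values.push(match value {
            Some(Variant::Int(v)) => Some(*v),
            _ => None,
        });
    }

    fn finish(self: Box<Self>) -> Column {
        Column::Int64(self.values)
    }
}

// Nested builder: extracts each child's field from the current row and hands
// it to the child, which may itself be another StructBuilder (recursion).
struct StructBuilder {
    validity: Vec<bool>,
    children: Vec<(String, Box<dyn RowBuilder>)>,
}

impl RowBuilder for StructBuilder {
    fn append(&mut self, value: Option<&Variant>) {
        self.validity.push(matches!(value, Some(Variant::Object(_))));
        for (name, child) in self.children.iter_mut() {
            // Every child is appended to on every row so lengths stay aligned.
            child.append(value.and_then(|v| v.get_field(name.as_str())));
        }
    }

    fn finish(self: Box<Self>) -> Column {
        let this = *self;
        let children: Vec<(String, Column)> = this
            .children
            .into_iter()
            .map(|(name, child)| (name, child.finish()))
            .collect();
        Column::Struct { validity: this.validity, children }
    }
}

// Top-level driver: passes one variant per row to kick-start the recursion.
fn build_column(mut builder: Box<dyn RowBuilder>, rows: &[Variant]) -> Column {
    for row in rows {
        builder.append(Some(row));
    }
    builder.finish()
}
```

The sketch does the field lookup by name on every row, which is the per-row pathing cost mentioned above. Hoisting part of that work into the builder's constructor and finalizer when the shredding lines up is deliberately left out.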