jayzhan211 commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2211754053
> > btw, there is probably a downside for using MapBuilder, since we need to create different builder for different types (therefore many macros) which easily causes code bloat. We have similar issue while building array function before #7988. I'm not sure what is the best approach yet (probably current draft function is the best), worth to explore it. > > Besides the code bloat, I'm also concerned about the performance. Because `MapBuilder` doesn't provide batch append (I think it's not its main purpose), we can only append data entry by entry. I might expect it to have O(n) complexity. > > I did some benchmarks to compare the `MapBuilder` version and the `MapArray::from()` version. https://github.com/goldmedal/datafusion/blob/95198c17eddd0a708e61e66076a383d5671646b7/datafusion/functions/src/core/map.rs#L132 > > I haven't tried any optimizations for them. The results are as follows: > > ```rust > // The first run is for warm-up > Generating 1 random key-value pair > Time elapsed in make_map() is: 449.737µs > Time elapsed in make_map_batch() is: 129.096µs > > Generating 10 random key-value pairs > Time elapsed in make_map() is: 24.932µs > Time elapsed in make_map_batch() is: 18.007µs > > Generating 50 random key-value pairs > Time elapsed in make_map() is: 34.587µs > Time elapsed in make_map_batch() is: 17.037µs > > Generating 100 random key-value pairs > Time elapsed in make_map() is: 66.02µs > Time elapsed in make_map_batch() is: 19.027µs > > Generating 500 random key-value pairs > Time elapsed in make_map() is: 586.476µs > Time elapsed in make_map_batch() is: 63.832µs > > Generating 1000 random key-value pairs > Time elapsed in make_map() is: 722.771µs > Time elapsed in make_map_batch() is: 47.455µs > ``` > > The scenario for this function may not be for large maps, but I think the `MapArray::from()` version always has better performance and won't cause code bloat. I prefer this version. > > What do you think? @alamb @jayzhan211 `make_map_batch` version is way faster, we should go with that one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
