alamb opened a new issue #157:
URL: https://github.com/apache/arrow-rs/issues/157


   *Note*: migrated from original JIRA: 
https://issues.apache.org/jira/browse/ARROW-11002
   
   The code that reads nested lists in rust/arrow/src/json/reader.rs makes an 
extra copy (via `Vec::clone`) that caused a 20% slowdown in a benchmark compared 
to not cloning.
   
   The goal of this ticket is to improve the performance of reading JSON 
in this case, likely by avoiding the clone.
   
   More details can be found here: 
   
   https://github.com/apache/arrow/pull/8938#pullrequestreview-556273641
   
   As [~nevi_me] says:
   {quote}
    I suspect the main perf loss is from having to peek into JSON values in 
order to make the nesting work.
   By this, I mean that if we have {"a": [_, _, _]}, we extract the values of a 
into a Vec<Value>, i.e. [_, _, _].
   By extracting values, we are able to then use the reader to read &[Value] 
without caring about its key (a).
   The downside of this approach is that we have to clone values to get 
Vec<Value>, as I couldn't find an alternative.
   {quote}
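
   To make the trade-off described in the quote concrete, here is a minimal 
sketch contrasting the cloning extraction with a borrowing one. It is not the 
code from reader.rs: the `Value` enum is a simplified stand-in for 
`serde_json::Value`, and `row`, `extract_cloned` and `extract_borrowed` are 
hypothetical helper names introduced only for illustration.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for `serde_json::Value` (assumption: only the
/// variants needed for this illustration are modelled).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(f64),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical helper: builds one row of the form {"<key>": [n0, n1, ...]}.
fn row(key: &str, nums: &[f64]) -> Value {
    let mut map = BTreeMap::new();
    let items = nums.iter().map(|n| Value::Number(*n)).collect();
    map.insert(key.to_string(), Value::Array(items));
    Value::Object(map)
}

/// Cloning approach: deep-copies every list element found under `key`,
/// producing an owned Vec<Value>.
fn extract_cloned(rows: &[Value], key: &str) -> Vec<Value> {
    let mut out = Vec::new();
    for row in rows {
        if let Value::Object(map) = row {
            if let Some(Value::Array(items)) = map.get(key) {
                out.extend(items.clone()); // the extra copy
            }
        }
    }
    out
}

/// Borrowing approach: collects references to the same elements,
/// so no element is copied.
fn extract_borrowed<'a>(rows: &'a [Value], key: &str) -> Vec<&'a Value> {
    rows.iter()
        .filter_map(|row| match row {
            Value::Object(map) => match map.get(key) {
                Some(Value::Array(items)) => Some(items.iter()),
                _ => None,
            },
            _ => None,
        })
        .flatten()
        .collect()
}
```

   Note that the borrowing variant yields a `Vec<&Value>` rather than a 
`Vec<Value>`, so it only helps if the code that consumes the extracted values 
can work with references. Whether the reader can be adapted that way is not 
established in this ticket.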



