[
https://issues.apache.org/jira/browse/ARROW-11002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Lamb updated ARROW-11002:
--------------------------------
Description:
The code that reads nested lists in rust/arrow/src/json/reader.rs performs an
extra copy (via `Vec::clone`) that caused a 20% slowdown in a benchmark
compared to not cloning.
The goal of this ticket is to improve the performance of reading JSON in this
case, likely by avoiding the clone.
More details can be found here:
https://github.com/apache/arrow/pull/8938#pullrequestreview-556273641
As [~nevi_me] says:
{quote}
I suspect the main perf loss is from having to peek into JSON values in order
to make the nesting work.
By this, I mean that if we have {"a": [_, _, _]}, we extract the values of a
into a Vec<Value>, i.e. [_, _, _].
By extracting values, we are able to then use the reader to read &[Value]
without caring about its key (a).
The downside of this approach is that we have to clone values to get
Vec<Value>, as I couldn't find an alternative
{quote}
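
To make the trade-off concrete, here is a minimal sketch of the two approaches.
It is not the code in rust/arrow/src/json/reader.rs: the `Value` enum is a
hypothetical stand-in for a JSON value type such as `serde_json::Value`, and
`extract_cloned` / `extract_borrowed` are illustrative names only. List offsets
and error handling are left out.

```rust
// Hypothetical stand-in for a JSON value type such as serde_json::Value.
// It is NOT the type used by the Arrow JSON reader.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Number(f64),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

impl Value {
    /// Look up `key` in an object; returns None for non-objects or missing keys.
    fn get(&self, key: &str) -> Option<&Value> {
        match self {
            Value::Object(fields) => fields
                .iter()
                .find(|(k, _)| k.as_str() == key)
                .map(|(_, v)| v),
            _ => None,
        }
    }
}

/// Cloning approach: every element of each row's list is deep-copied
/// into a new Vec<Value>, which can then be passed on as &[Value].
pub fn extract_cloned(rows: &[Value], key: &str) -> Vec<Value> {
    rows.iter()
        .flat_map(|row| match row.get(key) {
            Some(Value::Array(items)) => items.clone(),
            _ => Vec::new(),
        })
        .collect()
}

/// Borrowing approach: only references are collected, so no element is copied.
/// The caller must then accept &[&Value] instead of &[Value].
pub fn extract_borrowed<'a>(rows: &'a [Value], key: &str) -> Vec<&'a Value> {
    rows.iter()
        .filter_map(|row| match row.get(key) {
            Some(Value::Array(items)) => Some(items.iter()),
            _ => None,
        })
        .flatten()
        .collect()
}
```

The borrowed variant avoids the copy but changes the element type seen
downstream, which may be why dropping the clone is not a one-line change in a
reader that expects &[Value].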
was:
The code that reads in nested lists in rust/arrow/src/json/reader.rs does an
extra copy (via `Vec::clone`) that caused 20% slowdown in a benchmark compared
to not cloning.
The goal of this ticket would be to improve the performance of reading JSON in
this case, likely by avoiding the clone
More details can be found here:
https://github.com/apache/arrow/pull/8938#pullrequestreview-556273641
As [~nevi_me] says:
> I suspect the main perf loss is from having to peek into JSON values in order
> to make the nesting work.
> By this, I mean that if we have {"a": [_, _, _]}, we extract a values into a
> Vec<Value>, i.e. [_, _, _].
> By extracting values, we are able to then use the reader to read &[Value]
> without caring about its key (a).
> The downside of this approach is that we have to clone values to get
> Vec<Value>, as I couldn't find an alternative
> [Rust] Improve speed of JSON nested list reader
> -----------------------------------------------
>
> Key: ARROW-11002
> URL: https://issues.apache.org/jira/browse/ARROW-11002
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Andrew Lamb
> Priority: Minor