rockyzhengwu opened a new issue, #3150:
URL: https://github.com/apache/arrow-rs/issues/3150
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
The current JSON-to-Arrow decoder converts `batch_size` JSON strings into
`serde_json::Value`s before building the arrays. Holding that many `Value`s
requires a lot of memory, so a large `batch_size` can cause an OOM, even
though a large `batch_size` usually gives a good compression rate.
https://github.com/apache/arrow-rs/blob/e1b5657eb1206ce67eb079f6e72615982a70480a/arrow-json/src/reader.rs#L685
**Describe the solution you'd like**
The current implementation, in pseudocode:
``` rust
for batch in value_iter {
    // Decode batch_size JSON strings into serde_json Values, all held in memory at once.
    let rows: Vec<Value> = batch.collect::<Result<Vec<Value>, _>>()?;
    let arrays = convert_function(rows)?;
}
```
Converting only one JSON string to a `serde_json::Value` at a time saves
roughly 3x-5x memory or more (I didn't measure carefully). I have implemented
a version like this in our online product, because we use a large
`batch_size`. The pseudocode is:
``` rust
let mut field_builders: Vec<Box<dyn ArrayBuilder>> =
    create_array_builders(batch_size);
for (i, row) in value_iter.enumerate() {
    // Only one serde_json Value is alive at a time.
    let value: Value = serde_json::from_str(row)?;
    for (index, field) in schema.fields().iter().enumerate() {
        let col_name = field.name();
        field_builders[index].append(value.get(col_name));
    }
    if (i + 1) % batch_size == 0 {
        let array_refs: Vec<ArrayRef> = field_builders
            .iter_mut()
            .map(|builder| builder.finish())
            .collect();
        // ... build a RecordBatch from array_refs
    }
}
```
This implementation did not affect performance, but it does not yet support
deeply nested lists and maps. I'm not sure whether this is an elegant
approach, or whether it's possible to support deeply nested lists and maps
this way. If this sounds like a good idea, I can try to make a PR for it.
**Describe alternatives you've considered**
<!--
A clear and concise description of any alternative solutions or features
you've considered.
-->
**Additional context**
<!--
Add any other context or screenshots about the feature request here.
-->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]