rockyzhengwu opened a new issue, #3150:
URL: https://github.com/apache/arrow-rs/issues/3150

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   The JSON decoder currently converts a whole batch (`batch_size` JSON 
strings) into `serde_json` `Value`s before building the Arrow arrays. 
   This requires a lot of memory for the `Value`s, so a large `batch_size` can 
cause an OOM, even though a large `batch_size` usually gives a better 
compression rate.
   
   
    
https://github.com/apache/arrow-rs/blob/e1b5657eb1206ce67eb079f6e72615982a70480a/arrow-json/src/reader.rs#L685
 
   
   **Describe the solution you'd like**
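   Current implementation in pseudocode: 
   ``` rust
   for batch in value_iter {
       // All `batch_size` parsed rows are held in memory at the same time.
       let mut rows: Vec<Value> = Vec::with_capacity(batch_size);
       // ... push every parsed row of this batch into `rows` ...
       let arrays = convert_function(rows);
   }
   ```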
   Converting only one JSON string to a `serde_json` `Value` at a time saves 
roughly 3x-5x memory or more; I didn't record the numbers carefully.  
   I have implemented a version this way in our online product, because we use 
a large `batch_size`. The pseudocode is: 
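   ``` rust
   // Pseudocode: one builder per schema field, reused across rows.
   let mut field_builder: Vec<Box<dyn ArrayBuilder>> = 
create_array_builder(batch_size);
   for (i, row) in value_iter.enumerate() {
       // Only this single row is held as a `serde_json::Value`.
       let value: Value = serde_json::from_str(row)?;
       for (index, field) in schema.fields().iter().enumerate() {
           let col_name = field.name();
           // Index by column position, not by row number.
           field_builder[index].append(value.get(col_name));
       }
       // Emit a batch once `batch_size` rows have been appended.
       if (i + 1) % batch_size == 0 {
           let array_refs = field_builder.iter_mut().map(|builder| 
builder.finish()).collect();
           .....
       }
   }
   
   ```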
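   To make the idea concrete, here is a minimal std-only sketch of the 
row-at-a-time approach. It is not the arrow-json API: `Scalar`, `Row`, 
`ColumnType`, `Column` and `decode_batches` are hypothetical stand-ins for 
`serde_json::Value`, a parsed JSON object, the schema field types and the 
Arrow array builders. It only handles flat `i64` and string fields.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a flat `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Null,
    Int(i64),
    Str(String),
}

/// Hypothetical stand-in for one parsed JSON object (one input line).
pub type Row = HashMap<String, Scalar>;

/// Hypothetical stand-in for the data type of a schema field.
#[derive(Debug, Clone, Copy)]
pub enum ColumnType {
    Int,
    Str,
}

/// Hypothetical stand-in for a typed Arrow array builder.
#[derive(Debug, PartialEq)]
pub enum Column {
    Int(Vec<Option<i64>>),
    Str(Vec<Option<String>>),
}

impl Column {
    fn new(ty: ColumnType, capacity: usize) -> Self {
        match ty {
            ColumnType::Int => Column::Int(Vec::with_capacity(capacity)),
            ColumnType::Str => Column::Str(Vec::with_capacity(capacity)),
        }
    }

    /// Appends one value; a missing key, an explicit null, or a type
    /// mismatch is stored as null.
    fn append(&mut self, value: Option<&Scalar>) {
        match (self, value) {
            (Column::Int(col), Some(Scalar::Int(v))) => col.push(Some(*v)),
            (Column::Str(col), Some(Scalar::Str(v))) => col.push(Some(v.clone())),
            (Column::Int(col), _) => col.push(None),
            (Column::Str(col), _) => col.push(None),
        }
    }
}

/// Consumes rows one at a time and emits one `Vec<Column>` per batch.
/// Each row is dropped right after its values are appended, so with a
/// lazy iterator only one parsed row is alive at any moment.
pub fn decode_batches<I>(
    rows: I,
    schema: &[(String, ColumnType)],
    batch_size: usize,
) -> Vec<Vec<Column>>
where
    I: IntoIterator<Item = Row>,
{
    assert!(batch_size > 0, "batch_size must be positive");
    let new_builders = || -> Vec<Column> {
        schema.iter().map(|(_, ty)| Column::new(*ty, batch_size)).collect()
    };

    let mut builders = new_builders();
    let mut rows_in_batch = 0;
    let mut batches = Vec::new();

    for row in rows {
        // One builder per schema field, matched by column position.
        for (builder, (name, _)) in builders.iter_mut().zip(schema) {
            builder.append(row.get(name));
        }
        rows_in_batch += 1;
        if rows_in_batch == batch_size {
            // Hand off the full batch and start fresh builders.
            batches.push(std::mem::replace(&mut builders, new_builders()));
            rows_in_batch = 0;
        }
    }
    // Flush the final, partially filled batch.
    if rows_in_batch > 0 {
        batches.push(builders);
    }
    batches
}
```

   In a real reader, `rows` would be a lazy iterator that parses one line per 
`next()` call, so peak memory is one parsed row plus the column buffers, 
instead of `batch_size` parsed rows. Nested lists and maps are deliberately 
left out of this sketch.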
   This implementation did not affect performance, 
    but it does not support deeply nested lists and maps. 
    I'm not sure whether this is an elegant approach, or whether it is 
possible to support deeply nested lists and maps. 
   If this is a good idea, I can try to make a PR for it. 
   
   **Describe alternatives you've considered**
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
