jorgecarleitao commented on pull request #8118:
URL: https://github.com/apache/arrow/pull/8118#issuecomment-687891730


   > Maybe I am misunderstanding, but I think there may be a flaw with this 
approach and we're not comparing apples with apples when looking at the 
benchmarks.
   > 
   > The original code is dynamically building a struct using the builder. The 
new code starts with a `vec!` where everything is known at compile time. In 
theory, the builders should be more efficient than building a `Vec` and then 
converting it.
   
   I thought that `criterion::black_box()` would prevent the compiler from 
optimizing away the code wrapped in it, so that the benchmark would not be 
tainted by compiler optimizations. I use it in both the Builder and `From` 
benchmarks.
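
To illustrate the idea, here is a minimal sketch that uses `std::hint::black_box` from the standard library rather than `criterion::black_box`, so that it needs no external crate. The helper `time_it` is hypothetical and is not part of this PR or of criterion. `black_box` is only a best-effort hint, so this is a rough sketch and not a replacement for a real benchmark harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `f` `iterations` times on a fresh clone of
/// `input` and returns the total elapsed time.
///
/// The input and the output both pass through `black_box`, which asks the
/// compiler (best effort) to treat them as opaque, so it is less likely to
/// constant-fold the input or discard an unused result.
/// Note that the clone is part of the measured time.
pub fn time_it<I: Clone, O>(
    input: &I,
    iterations: u32,
    mut f: impl FnMut(I) -> O,
) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        let out = f(black_box(input.clone()));
        black_box(out);
    }
    start.elapsed()
}
```

Both code paths under comparison would go through the same helper, so each pays the same cloning cost and receives an input the compiler cannot assume to be constant.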
   
   Regardless, I used this approach because I looked through the code where 
we use Builders, and I found two main kinds of input:
   
   * a vector:
       * constructed from reading batches of rows (e.g. `StringRecord` in CSV, 
`&[Value]` in json)
       * constructed in memory from some external source (e.g. `MemoryScan`)
   * an Arrow Array, in most in-memory calculations (e.g. `RecordBatch` and 
`ArrayRef`, in `compute` and DataFusion)
   
   In all cases, we use the builders to append values row by row:
   * see 
[here](https://github.com/apache/arrow/blob/master/rust/arrow/src/csv/reader.rs#L432)
 for CSV
   * see 
[here](https://github.com/apache/arrow/blob/master/rust/arrow/src/json/reader.rs#L491)
 for JSON
   * in parquet [we do not use Array 
builders](https://github.com/apache/arrow/blob/master/rust/parquet/src/arrow/array_reader.rs#L27)
   * see 
[here](https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/cast.rs#L207)
 for an example in compute
   
   Based on this analysis, I thought that:
   * this benchmark was a good representation of our use-cases
   * we can use `[Try]From` to build our results instead of a builder. The 
`from` is essentially `builder.append_data().finish()`, with a significantly 
simpler API
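
As a rough sketch of the second point, the toy types below show how a `From<Vec<Option<i32>>>` implementation can wrap the append-then-finish loop. `ToyInt32Array` and `ToyInt32Builder` are hypothetical stand-ins written for this example. They are not the arrow crate's array or builder types, and their method names are assumptions.

```rust
/// Toy nullable Int32 array (hypothetical): values plus a validity mask.
#[derive(Debug, PartialEq)]
pub struct ToyInt32Array {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl ToyInt32Array {
    /// Number of null slots.
    pub fn null_count(&self) -> usize {
        self.validity.iter().filter(|valid| !**valid).count()
    }

    /// Value at slot `i`, or `None` if the slot is null or out of bounds.
    pub fn get(&self, i: usize) -> Option<i32> {
        match self.validity.get(i) {
            Some(true) => Some(self.values[i]),
            _ => None,
        }
    }
}

/// Toy builder (hypothetical) that appends one slot at a time.
pub struct ToyInt32Builder {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl ToyInt32Builder {
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            values: Vec::with_capacity(capacity),
            validity: Vec::with_capacity(capacity),
        }
    }

    /// Append a value, or a null slot when `v` is `None`.
    pub fn append_option(&mut self, v: Option<i32>) {
        self.values.push(v.unwrap_or_default());
        self.validity.push(v.is_some());
    }

    pub fn finish(self) -> ToyInt32Array {
        ToyInt32Array {
            values: self.values,
            validity: self.validity,
        }
    }
}

/// `From` hides the append loop behind a single call.
impl From<Vec<Option<i32>>> for ToyInt32Array {
    fn from(data: Vec<Option<i32>>) -> Self {
        let mut builder = ToyInt32Builder::with_capacity(data.len());
        for v in data {
            builder.append_option(v);
        }
        builder.finish()
    }
}
```

With these toy types, `ToyInt32Array::from(vec![Some(1), None, Some(3)])` produces the same array as appending the three slots through the builder and calling `finish()`.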
   



