jorgecarleitao edited a comment on pull request #8118:
URL: https://github.com/apache/arrow/pull/8118#issuecomment-687891730
> Maybe I am misunderstanding, but I think there may be a flaw with this
> approach and we're not comparing apples with apples when looking at the
> benchmarks.
>
> The original code is dynamically building a struct using the builder. The
> new code starts with a `vec!` where everything is known at compile time. In
> theory, the builders should be more efficient than building a `Vec` and then
> converting it.

I thought that `criterion::black_box()` would prevent the compiler from
optimizing away the code it wraps, so that the benchmark would not be tainted
by compile-time optimizations. I use it in both the Builder and `From`
benchmarks.
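To make the intent concrete, here is a minimal sketch of the pattern, using `std::hint::black_box` as a stand-in for `criterion::black_box` (assuming the two play the same role here). The `build` and `time_build` functions are hypothetical and are not the code from this PR's benchmark. Note that the standard library documents `black_box` as best-effort, so this reduces the risk of constant folding rather than ruling it out.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload standing in for "build an array from a Vec".
/// Nulls are replaced by 0 purely for illustration.
fn build(values: Vec<Option<i32>>) -> Vec<i32> {
    values.into_iter().map(|v| v.unwrap_or(0)).collect()
}

/// Times `iters` runs of `build`, passing both the input and the result
/// through `black_box` so the optimizer is discouraged from treating the
/// `vec!` literal as a compile-time constant or discarding the result.
fn time_build(iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        let input = black_box(vec![Some(1), None, Some(3)]);
        black_box(build(input));
    }
    start.elapsed()
}
```

Wrapping the input is the part that matters for the objection quoted above: without it, a `vec!` of literals is visible to the optimizer in a way that data read from a file would not be.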
Regardless, the reason I used this approach is that I looked through the
code for the places where we use Builders, and I found two main kinds of input:
* a vector:
  * constructed from reading batches of rows (e.g. `StringRecord` in CSV,
    `&[Value]` in json)
  * constructed in memory from some external source (e.g. `MemoryScan`)
* an Arrow Array, in most in-memory calculations (e.g. `RecordBatch` and
  `ArrayRef`, in `compute` and DataFusion)
In all cases, we use the builders to append rows row-by-row:
* see [here](https://github.com/apache/arrow/blob/master/rust/arrow/src/csv/reader.rs#L432) for CSV
* see [here](https://github.com/apache/arrow/blob/master/rust/arrow/src/json/reader.rs#L491) for JSON
* in parquet [we do not use Array builders](https://github.com/apache/arrow/blob/master/rust/parquet/src/arrow/array_reader.rs#L27)
* see [here](https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/cast.rs#L207) for an example in compute
Based on this analysis, I thought that:
* this benchmark is a good representation of our use cases
* we can use `[Try]From` to build our results instead of a builder. The
  `from` is essentially `builder.append_many().finish()`, with a significantly
  simpler API
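As a rough illustration of the second point, the sketch below shows a `From` implementation that is just "append everything, then finish" on top of a builder. `ToyInt32Array`, `ToyInt32Builder`, and their methods are hypothetical toy types written for this example; they are not the Arrow crate's arrays or builders, and they store validity as a plain `Vec<bool>` for simplicity.

```rust
/// Hypothetical toy array: values plus a validity mask.
/// This is NOT the Arrow crate's array type.
#[derive(Debug, PartialEq)]
struct ToyInt32Array {
    values: Vec<i32>,
    validity: Vec<bool>,
}

/// Hypothetical row-by-row builder for `ToyInt32Array`.
struct ToyInt32Builder {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl ToyInt32Builder {
    fn with_capacity(capacity: usize) -> Self {
        ToyInt32Builder {
            values: Vec::with_capacity(capacity),
            validity: Vec::with_capacity(capacity),
        }
    }

    /// Appends one slot; a null is stored as a default value with a
    /// `false` validity flag.
    fn append_option(&mut self, value: Option<i32>) {
        self.values.push(value.unwrap_or_default());
        self.validity.push(value.is_some());
    }

    fn finish(self) -> ToyInt32Array {
        ToyInt32Array {
            values: self.values,
            validity: self.validity,
        }
    }
}

/// `From` as a thin wrapper: append every item, then finish.
impl From<Vec<Option<i32>>> for ToyInt32Array {
    fn from(items: Vec<Option<i32>>) -> Self {
        let mut builder = ToyInt32Builder::with_capacity(items.len());
        for item in items {
            builder.append_option(item);
        }
        builder.finish()
    }
}
```

With these toy types, `ToyInt32Array::from(vec![Some(1), None, Some(3)])` yields the same array as three `append_option` calls followed by `finish()`, which is the equivalence the bullet above relies on. The caller only needs one expression instead of managing a mutable builder.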