emkornfield commented on a change in pull request #7175:
URL: https://github.com/apache/arrow/pull/7175#discussion_r426996139
##########
File path: cpp/src/parquet/arrow/reader_writer_benchmark.cc
##########
@@ -95,15 +97,37 @@ void SetBytesProcessed(::benchmark::State& state) {
state.SetBytesProcessed(bytes_processed);
}
+constexpr int64_t kAlternatingOrNa = -1;
+
+template <typename T>
+std::vector<T> RandomVector(int64_t true_percentage, int64_t vector_size,
+ const std::array<T, 2>& sample_values) {
+ std::vector<T> values(BENCHMARK_SIZE, {});
+ if (true_percentage == kAlternatingOrNa) {
+ int n = {0};
+ std::generate(values.begin(), values.end(), [&n] { return n++ % 2; });
+ } else {
+ std::default_random_engine rng(500);
+ double true_probability = static_cast<double>(true_percentage) / 100.0;
+ std::bernoulli_distribution dist(true_probability);
+ std::generate(values.begin(), values.end(), [&] { return sample_values[dist(rng)]; });
+ }
+ return values;
+}
+
template <typename ParquetType>
std::shared_ptr<::arrow::Table> TableFromVector(
- const std::vector<typename ParquetType::c_type>& vec, bool nullable) {
+ const std::vector<typename ParquetType::c_type>& vec, bool nullable,
+ int64_t null_percentage = kAlternatingOrNa) {
+ if (!nullable) {
+ DCHECK(null_percentage == kAlternatingOrNa);
+ }
std::shared_ptr<::arrow::DataType> type =
std::make_shared<ArrowType<ParquetType>>();
NumericBuilder<ArrowType<ParquetType>> builder;
if (nullable) {
- std::vector<uint8_t> valid_bytes(BENCHMARK_SIZE, 0);
- int n = {0};
- std::generate(valid_bytes.begin(), valid_bytes.end(), [&n] { return n++ % 2; });
+ // Note true values select index 1 of sample_values
+ auto valid_bytes = RandomVector<uint8_t>(/*true_percentage=*/null_percentage,
+ BENCHMARK_SIZE, /*sample_values=*/{1, 0});
Review comment:
I do not think that is
[true](https://arrow.apache.org/docs/cpp/api/builder.html#_CPPv4N5arrow14NumericBuilder12AppendValuesEPK10value_type7int64_tPK7uint8_t).
It is a confusing contract (maybe taking `bool*` would be better?), but I read
this as converting 1 and 0 to the corresponding bits. Under the covers, if I
traced it correctly, this calls
[`ArrayBuilder::UnsafeAppendToBitmap`](https://github.com/apache/arrow/blob/2849b643883793347ab87e0770aba4ccbda34c90/cpp/src/arrow/array/builder_primitive.cc#L96),
which ultimately calls
[`GenerateBitsUnrolled`](https://github.com/apache/arrow/blob/b4bd0d869dfa398dcaa65a1f4f6f13015d7fe8c4/cpp/src/arrow/buffer_builder.h#L305),
which [converts bytes to
bits](https://github.com/apache/arrow/blob/bf722a01eebb42e5d49450dd6695469bac99ffcd/cpp/src/arrow/util/bit_util.h#L687).
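
To make that reading concrete, here is a minimal standalone sketch of the byte-to-bit conversion as I understand it. It is not Arrow's code: `PackValidBytes` and `CountNulls` are hypothetical helpers written only for illustration, and the sketch assumes that any nonzero byte marks a valid slot and that the bitmap is filled least-significant bit first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): packs one validity byte per value
// into a bitmap, least-significant bit first. Any nonzero byte sets the bit,
// meaning "valid"; a zero byte leaves the bit clear, meaning "null".
std::vector<uint8_t> PackValidBytes(const std::vector<uint8_t>& valid_bytes) {
  std::vector<uint8_t> bitmap((valid_bytes.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < valid_bytes.size(); ++i) {
    if (valid_bytes[i] != 0) {
      bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return bitmap;
}

// Hypothetical helper: counts the slots in [0, length) whose bit is clear.
int64_t CountNulls(const std::vector<uint8_t>& bitmap, std::size_t length) {
  assert(length <= bitmap.size() * 8);
  int64_t nulls = 0;
  for (std::size_t i = 0; i < length; ++i) {
    if (((bitmap[i / 8] >> (i % 8)) & 1u) == 0) {
      ++nulls;
    }
  }
  return nulls;
}
```

Under these assumptions, a byte equal to 1 ends up as a set (valid) bit and a byte equal to 0 as a null, so calling `CountNulls(PackValidBytes(valid_bytes), valid_bytes.size())` on the generated vector is a quick way to check which of the two sample values actually produces nulls.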