bu2 opened a new pull request #9022:
URL: https://github.com/apache/arrow/pull/9022
Propose a "replace" compute kernel that could fulfil ARROW-10641 - [C++] A
"replace" or "map" kernel to replace values in array based on mapping
(@jorisvandenbossche). The implementation is inspired by the "fill_null" kernel,
except that it takes an additional BooleanArray parameter that is used as a mask
to select which values are replaced.
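To make the mask semantics concrete, here is a plain-C++ sketch on `std::vector` data rather than Arrow arrays. `ReplaceWithMask` is a hypothetical name, not the PR's kernel, and `std::nullopt` stands in for a null slot. Masked slots take the replacement value; unmasked slots, including null ones, are copied through unchanged. That null-preserving behavior is what the warning below asks about; the current implementation does not yet provide it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper illustrating mask-based replacement on plain vectors
// (not Arrow's API). std::nullopt represents a null slot.
// Slots where mask[i] is true receive `replacement`; all other slots,
// including null ones, are copied through unchanged.
template <typename T>
std::vector<std::optional<T>> ReplaceWithMask(
    const std::vector<std::optional<T>>& values,
    const std::vector<bool>& mask, const T& replacement) {
  assert(values.size() == mask.size());  // inputs must have equal length
  std::vector<std::optional<T>> out;
  out.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    out.push_back(mask[i] ? std::optional<T>(replacement) : values[i]);
  }
  return out;
}
```

For example, replacing with `0` under the mask `{false, true, true, false}` turns `{1, null, 3, null}` into `{1, 0, 0, null}`. The unmasked null in the last slot survives.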
**WARNING:** the current implementation expects every null value to be
replaced in the output (i.e. the corresponding bit in the input mask must be
set to 1), because it does not carry nulls into the output. Feel free to share
your thoughts on the current implementation, and any hints on the easiest way
to handle nulls that should be preserved in the output.
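One possible way to carry nulls into the output can be sketched at the level of validity bitmaps. A slot is valid in the output if it was valid in the input or was replaced, so the output validity bitmap is the bitwise OR of the input validity bitmap and the mask bits. This sketch assumes that the mask itself contains no nulls and that the replacement value is non-null. The helper names are hypothetical, and the bitmaps are plain byte vectors using Arrow's least-significant-bit ordering, not Arrow buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: output validity for a mask-based replace.
// Assumptions (not from the PR): bitmaps are byte-packed, bit (i % 8) of
// byte (i / 8) describes slot i, a set validity bit means "valid", the mask
// has no nulls of its own, and the replacement value is non-null.
// A slot is valid in the output if it was valid in the input OR was replaced.
std::vector<std::uint8_t> OutputValidity(
    const std::vector<std::uint8_t>& validity,
    const std::vector<std::uint8_t>& mask_bits) {
  assert(validity.size() == mask_bits.size());  // same number of bytes
  std::vector<std::uint8_t> out(validity.size());
  for (std::size_t i = 0; i < validity.size(); ++i) {
    out[i] = static_cast<std::uint8_t>(validity[i] | mask_bits[i]);
  }
  return out;
}

// Returns whether slot `i` is set in a byte-packed, LSB-ordered bitmap.
bool GetBit(const std::vector<std::uint8_t>& bitmap, std::size_t i) {
  return (bitmap[i / 8] >> (i % 8)) & 1;
}
```

Unmasked null slots keep a cleared validity bit, so their (undefined) values never need to be touched. Only masked slots need new values written.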
Add an "is_nan" kernel that checks FloatArray and DoubleArray values for NaN
(based on std::isnan()). The kernel signature is based on the "is_null"
kernel, so I put the code in arrow/compute/kernels/scalar_validity.cc, but the
implementation takes some inspiration from the "compare" kernel.
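A minimal sketch of the element-wise semantics, written against `std::vector` rather than Arrow's kernel framework. `IsNanSketch` is a hypothetical name. Restricting it to floating-point types mirrors the Float/Double restriction above. Returning null for a null input is an assumption for illustration, since the description does not say how nulls are handled.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <type_traits>
#include <vector>

// Hypothetical sketch of "is_nan" semantics using std::isnan.
// std::nullopt represents a null slot; by assumption, a null input
// produces a null output rather than true/false.
template <typename T>
std::vector<std::optional<bool>> IsNanSketch(
    const std::vector<std::optional<T>>& values) {
  static_assert(std::is_floating_point<T>::value,
                "is_nan is only defined for floating-point types");
  std::vector<std::optional<bool>> out;
  out.reserve(values.size());
  for (const auto& v : values) {
    out.push_back(v ? std::optional<bool>(std::isnan(*v)) : std::nullopt);
  }
  return out;
}
```

std::isnan is needed because NaN never compares equal to itself, so an equality-based "compare" kernel cannot detect it.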
Both kernels are used to mimic pandas.DataFrame.fillna(value=X) in C++, as in
the following example:
```cpp
// `DataFrame` and `INDEX_COLUMN` come from the author's own code, not Arrow;
// `table_` holds a std::shared_ptr<arrow::Table>.
template <typename value_type>
std::shared_ptr<DataFrame> DataFrame::fillna(value_type value) {
  auto outdf = std::make_shared<DataFrame>();
  // Start from an empty table with the same number of rows as the input.
  if (outdf->table_->num_rows() == 0)
    outdf->table_ = arrow::Table::Make(
        std::make_shared<arrow::Schema>(std::vector<std::shared_ptr<arrow::Field>>()),
        std::vector<std::shared_ptr<arrow::ChunkedArray>>(),
        this->table_->num_rows());
  for (int i = 0; i < this->table_->num_columns(); ++i) {
    auto field = this->table_->schema()->field(i);
    auto chunked_array = this->table_->column(i);
    if (this->table_->ColumnNames()[i] == INDEX_COLUMN) {
      // The index column is copied through unchanged.
      outdf->table_ = outdf->table_->AddColumn(i, field, chunked_array).ValueOrDie();
    } else {
      // Cast the fill value to the column's type.
      auto value_datum =
          arrow::compute::Cast(arrow::Datum(value), chunked_array->type()).ValueOrDie();
      // Slots to fill are those that are null OR NaN.
      auto nulls = arrow::compute::IsNull(chunked_array).ValueOrDie();
      auto nans = arrow::compute::IsNan(chunked_array).ValueOrDie().chunked_array();
      auto to_replace = arrow::compute::Or(nulls, nans).ValueOrDie();
      // Replace every masked slot (nulls and NaNs) with the fill value.
      auto clean_chunked_array =
          arrow::compute::Replace(chunked_array, to_replace, value_datum)
              .ValueOrDie()
              .chunked_array();
      outdf->table_ =
          outdf->table_->AddColumn(i, field, clean_chunked_array).ValueOrDie();
    }
  }
  return outdf;
}
```
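The `DataFrame::fillna` example above reduces to two steps per column. First, build the mask `is_null(x) OR is_nan(x)`; then replace the masked slots with the fill value. The sketch below shows the same logic on a plain `std::vector<std::optional<double>>`. `FillNaSketch` is a hypothetical name, and none of this is Arrow's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Plain-C++ sketch (not Arrow's API) of fillna for one floating-point column:
//   to_replace = is_null(x) OR is_nan(x)
//   result     = replace(x, to_replace, value)
// std::nullopt represents a null slot.
std::vector<std::optional<double>> FillNaSketch(
    const std::vector<std::optional<double>>& column, double value) {
  // Step 1: mask every slot that is null or holds NaN.
  std::vector<bool> to_replace(column.size());
  for (std::size_t i = 0; i < column.size(); ++i) {
    to_replace[i] = !column[i].has_value() || std::isnan(*column[i]);
  }
  // Step 2: copy the column and overwrite the masked slots with `value`.
  std::vector<std::optional<double>> out(column);
  for (std::size_t i = 0; i < out.size(); ++i) {
    if (to_replace[i]) out[i] = value;
  }
  return out;
}
```

Using the combined mask, rather than the NaN mask alone, also means every null slot is masked. That satisfies the current "replace" implementation's requirement that all nulls be replaced.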
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]