chufucun edited a comment on pull request #185:
URL: https://github.com/apache/datasketches-cpp/pull/185#issuecomment-768986515
> It seems that this change breaks something in Python wrapper.
> I never liked this mixed union functionality. In my view it is just
unnecessary complexity. That is why I did not implement it in C++.
> There is a complication: we have a new implementation of Theta sketch,
which was developed as a part of Tuple sketch effort. The goal was to have a
common base for both Theta and Tuple sketches. Currently this new Theta sketch
implementation is sitting under Tuple sketch as experimental. I think it is
time to make the move: replace the existing Theta sketch with this new
implementation. I think it would be better to make sure that your integration
works with this new version.
> Perhaps we could add these update methods to the Theta and Tuple unions,
but I still think that they are not necessary.
thx reply.
Use udaf function to integrate theta into impala. This function mainly
affects the function logic of the merge phase.
I found a solution, but the code doesn't look very elegant, I will test the
performance and reply again。
**Merge function:**
The `dst` parameter is used to store intermediate results and is initialized
in the init phase
1. use update_theta_sketch and theta_union
```c++
void AggregateFunctions::DsThetaMerge(
FunctionContext* ctx, const StringVal& src, StringVal* dst) {
DCHECK(!src.is_null);
DCHECK(!dst->is_null);
DCHECK(dst->len == sizeof(datasketches::update_theta_sketch)
or dst->len == sizeof(datasketches::compact_theta_sketch));
auto src_sketch = datasketches::theta_sketch::deserialize((void*)src.ptr,
src.len);
if (src_sketch->is_empty()) return;
auto dst_sketch_ptr =
reinterpret_cast<datasketches::theta_sketch*>(dst->ptr);
datasketches::theta_union union_sketch =
datasketches::theta_union::builder().build();
union_sketch.update(*src_sketch);
union_sketch.update(*dst_sketch_ptr);
// Note, union_result may not match the type of dst and need to reallocate
space
const datasketches::compact_theta_sketch union_result =
union_sketch.get_result();
ctx->Reallocate(dst->ptr, sizeof(datasketches::compact_theta_sketch));
if (LIKELY(dst->ptr != NULL)) {
memset(dst->ptr, 0, sizeof(datasketches::compact_theta_sketch));
datasketches::compact_theta_sketch* sketch_ptr =
reinterpret_cast<datasketches::compact_theta_sketch*>(dst->ptr);
std::uninitialized_fill_n(sketch_ptr, 1, union_result);
} else {
DCHECK(!ctx->impl()->state()->GetQueryStatus().ok());
}
}
```
2. use theta_union(need add new update functionality)
```c++
void AggregateFunctions::DsThetaMerge(
FunctionContext* ctx, const StringVal& src, StringVal* dst) {
DCHECK(!src.is_null);
DCHECK(!dst->is_null);
DCHECK_EQ(dst->len, sizeof(datasketches::theta_union));
// Note, 'src' is a serialized compact_theta_sketch and not a serialized
theta_union.
auto src_sketch =
datasketches::compact_theta_sketch::deserialize((void*)src.ptr,
src.len);
datasketches::theta_union* dst_union_ptr =
reinterpret_cast<datasketches::theta_union*>(dst->ptr);
dst_union_ptr->update(src_sketch);
}
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]