chufucun commented on pull request #185:
URL: https://github.com/apache/datasketches-cpp/pull/185#issuecomment-768986515


   > It seems that this change breaks something in Python wrapper.
   > I never liked this mixed union functionality. In my view it is just 
unnecessary complexity. That is why I did not implement it in C++.
   > There is a complication: we have a new implementation of Theta sketch, 
which was developed as a part of Tuple sketch effort. The goal was to have a 
common base for both Theta and Tuple sketches. Currently this new Theta sketch 
implementation is sitting under Tuple sketch as experimental. I think it is 
time to make the move: replace the existing Theta sketch with this new 
implementation. I think it would be better to make sure that your integration 
works with this new version.
   > Perhaps we could add these update methods to the Theta and Tuple unions, 
but I still think that they are not necessary.
   
   thx reply.
   Use udaf function to integrate theta into impala. This function mainly 
affects the function logic of the merge phase.
   I found a solution, but the code doesn't look very elegant, I will test the 
performance and reply again。
   
   **Merge function:**
   The `dst` parameter is used to store intermediate results and is initialized 
in the init phase
   
   1. use update_theta_sketch and theta_union
   ```c++
   void AggregateFunctions::DsThetaMerge(
       FunctionContext* ctx, const StringVal& src, StringVal* dst) {
     DCHECK(!src.is_null);
     DCHECK(!dst->is_null);
     DCHECK(dst->len == sizeof(datasketches::update_theta_sketch)
         or dst->len == sizeof(datasketches::compact_theta_sketch));
     auto src_sketch = datasketches::theta_sketch::deserialize((void*)src.ptr, 
src.len);
     if (src_sketch->is_empty()) return;
   
     auto dst_sketch_ptr = 
reinterpret_cast<datasketches::theta_sketch*>(dst->ptr);
   
     datasketches::theta_union union_sketch = 
datasketches::theta_union::builder().build();
     union_sketch.update(*src_sketch);
     union_sketch.update(*dst_sketch_ptr);
   
     // Note, Union_result may be inconsistent with dst and needs to reallocate 
space
     const datasketches::compact_theta_sketch union_result = 
union_sketch.get_result();
     ctx->Reallocate(dst->ptr, sizeof(datasketches::compact_theta_sketch));
     if (LIKELY(dst->ptr != NULL)) {
       memset(dst->ptr, 0, sizeof(datasketches::compact_theta_sketch));
       datasketches::compact_theta_sketch* sketch_ptr =
           reinterpret_cast<datasketches::compact_theta_sketch*>(dst->ptr);
       std::uninitialized_fill_n(sketch_ptr, 1, union_result);
     } else {
       DCHECK(!ctx->impl()->state()->GetQueryStatus().ok());
     }
   }
   ```
   2. use theta_union(need add new update functionality)
   ```c++
   void AggregateFunctions::DsThetaMerge(
       FunctionContext* ctx, const StringVal& src, StringVal* dst) {
     DCHECK(!src.is_null);
     DCHECK(!dst->is_null);
     DCHECK_EQ(dst->len, sizeof(datasketches::theta_union));
   
     // Note, 'src' is a serialized compact_theta_sketch and not a serialized 
theta_union.
     auto src_sketch =
         datasketches::compact_theta_sketch::deserialize((void*)src.ptr, 
src.len);
   
     datasketches::theta_union* dst_union_ptr =
         reinterpret_cast<datasketches::theta_union*>(dst->ptr);
   
     dst_union_ptr->update(src_sketch);
   }
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to