andishgar commented on PR #46229:
URL: https://github.com/apache/arrow/pull/46229#issuecomment-3038783169

   
   @pitrou I’ve implemented an initial prototype, and I believe a few points 
are worth mentioning.
   
   First, here is [my initial 
solution](https://github.com/user-attachments/files/21074918/1493a4.zip).
   Please note that the code is currently quite messy ("spaghetti code") and 
lacks any testing, so it likely contains logical errors. My goal with this 
prototype was not correctness but rather to explore what might be required to 
implement this feature.
   
   1. There is a function 
[here](https://github.com/apache/arrow/blob/66ee781dc2c7964950c7066d139fd592ac5700e5/cpp/src/arrow/compute/kernels/util_internal.h#L158-L175)
 for retrieving a bitmap buffer without copying, when possible. I suggest we 
open a separate pull request to move this function to 
[arrow/util/bitmap\_ops.h](https://github.com/apache/arrow/blob/66ee781dc2c7964950c7066d139fd592ac5700e5/cpp/src/arrow/util/bitmap_ops.h#L61),
 and then continue work on this pull request. What is your opinion?
   2. To track which parts of a data buffer are used, I plan to use 
[boost::icl::interval_set](https://www.boost.org/doc/libs/1_88_0/libs/icl/doc/html/index.html#boost_icl.introduction.interval_combining_styles).
 While I’ve written an initial version (corresponding to the `Add` function in 
the uploaded code), I’m not entirely confident in its correctness.  I want to 
first open an issue and a pull request to port the `boost::icl::interval_set` 
data structure to arrow/util, along with its implementation and tests.(Note 
that this is my first time using Boost, so it may take some time.) I think this 
data structure might be useful in other parts of the codebase as well—possibly 
in compute
   What do you think? Do you suggest a better algorithm or data structure for 
this use case?
   
   By the way, I’m aiming to use the boost::icl::interval_set data structure in 
a way that enables an API like this:
   ```c++
   #include <iostream>
   #include <boost/icl/interval_set.hpp>
   
   void Run() {
       boost::icl::interval_set<int>::type set;
   
       boost::icl::interval<int>::type interval_1(10, 15);
       boost::icl::interval<int>::type interval_2(2, 3);
       boost::icl::interval<int>::type interval_3(1, 5);
       set.add(interval_2);
       set.add(interval_1);
       set.add(interval_3);
       set.add(interval_1);
   
       auto it =set.find(3);
       std::cout << it->lower() << std::endl;
   
       for (auto it : set) {
           std::cout << it << std::endl;
       }
   
   }
   
   int main() {
       Run();
       return 0;
   }
   
   ``` 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to