andishgar commented on PR #46229: URL: https://github.com/apache/arrow/pull/46229#issuecomment-3038783169
@pitrou I’ve implemented an initial prototype, and I believe a few points are worth mentioning. First, here is [my initial solution](https://github.com/user-attachments/files/21074918/1493a4.zip). Please note that the code is currently quite messy ("spaghetti code") and lacks any testing, so it likely contains logical errors. My goal with this prototype was not correctness but rather to explore what might be required to implement this feature. 1. There is a function [here](https://github.com/apache/arrow/blob/66ee781dc2c7964950c7066d139fd592ac5700e5/cpp/src/arrow/compute/kernels/util_internal.h#L158-L175) for retrieving a bitmap buffer without copying, when possible. I suggest we open a separate pull request to move this function to [arrow/util/bitmap\_ops.h](https://github.com/apache/arrow/blob/66ee781dc2c7964950c7066d139fd592ac5700e5/cpp/src/arrow/util/bitmap_ops.h#L61), and then continue work on this pull request. What is your opinion? 2. To track which parts of a data buffer are used, I plan to use [boost::icl::interval_set](https://www.boost.org/doc/libs/1_88_0/libs/icl/doc/html/index.html#boost_icl.introduction.interval_combining_styles). While I’ve written an initial version (corresponding to the `Add` function in the uploaded code), I’m not entirely confident in its correctness. I want to first open an issue and a pull request to port the `boost::icl::interval_set` data structure to arrow/util, along with its implementation and tests.(Note that this is my first time using Boost, so it may take some time.) I think this data structure might be useful in other parts of the codebase as well—possibly in compute What do you think? Do you suggest a better algorithm or data structure for this use case? By the way, I’m aiming to use the boost::icl::interval_set data structure in a way that enables an API like this: ```c++ #include <iostream> #include <boost/icl/interval_set.hpp> void Run() { boost::icl::interval_set<int>::type set; boost::icl::interval<int>::type interval_1(10, 15); boost::icl::interval<int>::type interval_2(2, 3); boost::icl::interval<int>::type interval_3(1, 5); set.add(interval_2); set.add(interval_1); set.add(interval_3); set.add(interval_1); auto it =set.find(3); std::cout << it->lower() << std::endl; for (auto it : set) { std::cout << it << std::endl; } } int main() { Run(); return 0; } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
