Re: [PR] GH-41560: [C++] ChunkResolver: Implement ResolveMany and add unit tests [arrow]

via GitHub Tue, 14 May 2024 08:35:50 -0700


pitrou commented on code in PR #41561:
URL: https://github.com/apache/arrow/pull/41561#discussion_r1600255530



##########
cpp/src/arrow/chunk_resolver.cc:
##########
@@ -54,6 +54,52 @@ inline std::vector<int64_t> MakeChunksOffsets(const 
std::vector<T>& chunks) {
   offsets[chunks.size()] = offset;
   return offsets;
 }
+
+/// \pre all the pre-conditions of ChunkResolver::ResolveMany()
+/// \pre num_offsets - 1 <= std::numeric_limits<IndexType>::max()
+template <typename IndexType>
+void ResolveManyInline(size_t num_offsets, const int64_t* ARROW_RESTRICT 
offsets,
+                       int64_t n, const IndexType* ARROW_RESTRICT 
logical_index_vec,
+                       IndexType* ARROW_RESTRICT out_chunk_index_vec,
+                       IndexType chunk_hint,
+                       IndexType* ARROW_RESTRICT out_index_in_chunk_vec) {
+  const auto num_chunks = static_cast<IndexType>(num_offsets - 1);
+  // chunk_hint in [0, num_offsets) per the precondition.
+  for (int64_t i = 0; i < n; i++) {
+    const auto index = static_cast<uint64_t>(logical_index_vec[i]);
+    if (index >= static_cast<uint64_t>(offsets[chunk_hint]) &&
+        (chunk_hint == num_chunks ||

Review Comment:
   Well, this is making many assumptions about the kind of algorithm that we'll 
choose to implement, for example the idea that we'd pass NULL indices to 
`ResolveMany` instead of skipping them. I see two issues with that:
   1. bisecting has a cost and perhaps you don't want to pay that cost over the 
NULL entries of the indices array, only to ignore the results afterwards.
   2. updating the chunk hint based on the lookup of NULL indices leads to a 
lower quality chunk hint.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] GH-41560: [C++] ChunkResolver: Implement ResolveMany and add unit tests [arrow]

Reply via email to