save-buffer commented on code in PR #14227: URL: https://github.com/apache/arrow/pull/14227#discussion_r1029908916
##########
cpp/src/arrow/compute/exec/query_context.h:
##########
@@ -0,0 +1,165 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/exec.h"
+#include "arrow/compute/exec/task_util.h"
+#include "arrow/compute/exec/util.h"
+#include "arrow/io/interfaces.h"
+#include "arrow/util/async_util.h"
+
+#pragma once
+
+namespace arrow {
+namespace internal {
+class CpuInfo;
+}
+
+using io::IOContext;
+namespace compute {
+struct ARROW_EXPORT QueryOptions {
+  QueryOptions();
+  // 0 means unlimited
+  size_t max_memory_bytes;
+
+  /// \brief Should the plan use a legacy batching strategy
+  ///
+  /// This is currently in place only to support the Scanner::ToTable
+  /// method. This method relies on batch indices from the scanner
+  /// remaining consistent. This is impractical in the ExecPlan which
+  /// might slice batches as needed (e.g. for a join)
+  ///
+  /// However, it still works for simple plans and this is the only way
+  /// we have at the moment for maintaining implicit order.
+  bool use_legacy_batching;
+};
+
+class ARROW_EXPORT QueryContext {
+ public:
+  QueryContext(QueryOptions opts = {},
+               ExecContext exec_context = *default_exec_context());
+
+  Status Init(size_t max_num_threads);
+
+  const ::arrow::internal::CpuInfo* cpu_info() const;
+  int64_t hardware_flags() const { return cpu_info()->hardware_flags(); }
+  const QueryOptions& options() const { return options_; }
+  MemoryPool* memory_pool() const { return exec_context_.memory_pool(); }
+  ::arrow::internal::Executor* executor() const { return exec_context_.executor(); }
+  ExecContext* exec_context() { return &exec_context_; }
+  IOContext* io_context() { return &io_context_; }
+  TaskScheduler* scheduler() { return task_scheduler_.get(); }
+  util::AsyncTaskScheduler* async_scheduler() { return async_scheduler_.get(); }
+
+  size_t GetThreadIndex();
+  size_t max_concurrency() const;
+  Result<util::TempVectorStack*> GetTempStack(size_t thread_index);

Review Comment:
   Yep, `GetThreadIndex` involves grabbing a mutex and some bookkeeping, so it's really not ideal on a hot path. Tasks will usually have their `thread_index` floating around anyway, so calling it should be a rare thing.
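   For context, a minimal sketch of the pattern the comment describes, assuming only the `QueryContext` API quoted in the hunk above (`ProcessBatch` is a hypothetical task body, not part of this PR): a task calls the mutex-protected `GetThreadIndex()` once up front, then hands the cached index to cheap index-based lookups such as `GetTempStack()`.

       // Sketch under the assumptions stated above; `ProcessBatch` is invented
       // for illustration and is not part of the PR.
       #include "arrow/compute/exec/query_context.h"
       #include "arrow/result.h"

       namespace arrow {
       namespace compute {

       Status ProcessBatch(QueryContext* ctx, const ExecBatch& batch) {
         // One-time, mutex-protected lookup; cache it for the task's lifetime
         // instead of calling GetThreadIndex() repeatedly on the hot path.
         size_t thread_index = ctx->GetThreadIndex();
         // Cheap index-based lookup of per-thread state; no extra locking here.
         ARROW_ASSIGN_OR_RAISE(util::TempVectorStack* stack,
                               ctx->GetTempStack(thread_index));
         // ... use `stack` for per-thread scratch allocations while
         // processing `batch` ...
         return Status::OK();
       }

       }  // namespace compute
       }  // namespace arrow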
