westonpace commented on code in PR #15288:
URL: https://github.com/apache/arrow/pull/15288#discussion_r1071244661
##########
cpp/src/arrow/compute/exec/plan_test.cc:
##########
@@ -322,26 +306,16 @@ void TestSourceSink(
std::string source_factory_name,
std::function<Result<std::vector<ElementType>>(const BatchesWithSchema&)>
to_elements) {
- ASSERT_OK_AND_ASSIGN(auto executor, arrow::internal::ThreadPool::Make(1));
- ExecContext exec_context(default_memory_pool(), executor.get());
- ASSERT_OK_AND_ASSIGN(auto plan, ExecPlan::Make(exec_context));
- AsyncGenerator<std::optional<ExecBatch>> sink_gen;
-
auto exp_batches = MakeBasicBatches();
ASSERT_OK_AND_ASSIGN(auto elements, to_elements(exp_batches));
auto element_it_maker = [&elements]() {
return MakeVectorIterator<ElementType>(elements);
};
-
- ASSERT_OK(Declaration::Sequence({
- {source_factory_name,
- OptionsType{exp_batches.schema,
element_it_maker}},
- {"sink", SinkNodeOptions{&sink_gen}},
- })
- .AddToPlan(plan.get()));
-
- ASSERT_THAT(StartAndCollect(plan.get(), sink_gen),
-
Finishes(ResultWith(UnorderedElementsAreArray(exp_batches.batches))));
+ Declaration plan(source_factory_name,
+ OptionsType{exp_batches.schema, element_it_maker});
+ ASSERT_OK_AND_ASSIGN(auto result,
+ DeclarationToExecBatches(std::move(plan),
/*use_threads=*/false));
Review Comment:
Ah, I have a fix for that test here:
https://github.com/apache/arrow/pull/15253/files#diff-b4e59a9cedddaaa33c03f220c03ac812f9ebdfb06692c9f91c5f8f5988363b57R853
However, I don't know if that PR will get finished in time for the release.
I'll open an independent PR with the fix. The test does this:
* Generate a bunch of random batches
* Start the plan
* Wait for one batch to arrive
* Cancel/abort the plan
* Verify the one batch that arrived was the first of the random batches
However, there is no guarantee, when run in parallel, that the first batch
to complete will be the first batch generated.
Since the test doesn't fully consume the plan we can't use DeclarationToXyz
here. This means when run serially it is using "single-thread mode" which is
still slightly parallel enough to cause random ordering.
In my fix I changed it so parallel=false would run synchronously (and have
dependent order) and parallel=true would not require it to be the first batch
generated.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]