Copilot commented on code in PR #49462:
URL: https://github.com/apache/arrow/pull/49462#discussion_r3195989358


##########
cpp/src/arrow/json/reader_test.cc:
##########
@@ -290,6 +290,39 @@ TEST(ReaderTest, MultipleChunksParallel) {
   AssertTablesEqual(*serial, *threaded);
 }
 
+// Regression test for intermittent threading crashes on MinGW.
+// Run this test multiple times manually to stress-test:
+//   while build/debug/arrow-json-test

Review Comment:
   The suggested shell snippet is split across two lines, which makes the 
`while` loop invalid in most shells (the `--gtest_filter=...` line will be 
parsed as a separate command). Put the command on one line or add a line 
continuation so the stress loop is copy/pasteable.
   



##########
cpp/src/arrow/json/reader_test.cc:
##########
@@ -290,6 +290,39 @@ TEST(ReaderTest, MultipleChunksParallel) {
   AssertTablesEqual(*serial, *threaded);
 }
 
+// Regression test for intermittent threading crashes on MinGW.
+// Run this test multiple times manually to stress-test:
+//   while build/debug/arrow-json-test
+//       --gtest_filter=ReaderTest.MultipleChunksParallelRegression; do :; done
+// See https://github.com/apache/arrow/issues/49272
+
+// Helper used by multiple-chunk parallel tests to build the JSON input,
+// configure read options and read the resulting table.
+static Result<std::shared_ptr<Table>> ReadMultipleChunksParallelTable(
+    int64_t count, bool use_threads, const ParseOptions& parse_options) {
+  ReadOptions read_options;
+  read_options.block_size = static_cast<int>(count / 2);
+  read_options.use_threads = use_threads;
+
+  std::string json;
+  json.reserve(static_cast<size_t>(count) * 16);  // rough reserve to avoid 
reallocations
+  for (int64_t i = 0; i < count; ++i) {
+    json += "{\"a\":" + std::to_string(i) + "}\n";
+  }
+
+  return ReadToTable(std::move(json), read_options, parse_options);
+}
+
+TEST(ReaderTest, MultipleChunksParallelRegression) {
+  int64_t count = 1 << 10;
+  ParseOptions parse_options;
+  parse_options.unexpected_field_behavior = UnexpectedFieldBehavior::InferType;
+
+  ASSERT_OK_AND_ASSIGN(auto table, ReadMultipleChunksParallelTable(
+                                       count, /*use_threads=*/true, 
parse_options));
+  ASSERT_EQ(table->num_rows(), count);
+}

Review Comment:
   PR description mentions adding a stress test that runs the parallel path 
repeatedly (e.g., 20 iterations), but this test currently runs the workload 
only once and is named `MultipleChunksParallelRegression`. Either update the 
description or make the test actually repeat/align the naming so CI gets the 
intended stress coverage.



##########
cpp/src/arrow/json/reader_test.cc:
##########
@@ -290,6 +290,39 @@ TEST(ReaderTest, MultipleChunksParallel) {
   AssertTablesEqual(*serial, *threaded);
 }
 
+// Regression test for intermittent threading crashes on MinGW.
+// Run this test multiple times manually to stress-test:
+//   while build/debug/arrow-json-test
+//       --gtest_filter=ReaderTest.MultipleChunksParallelRegression; do :; done
+// See https://github.com/apache/arrow/issues/49272
+
+// Helper used by multiple-chunk parallel tests to build the JSON input,
+// configure read options and read the resulting table.
+static Result<std::shared_ptr<Table>> ReadMultipleChunksParallelTable(
+    int64_t count, bool use_threads, const ParseOptions& parse_options) {
+  ReadOptions read_options;
+  read_options.block_size = static_cast<int>(count / 2);
+  read_options.use_threads = use_threads;
+
+  std::string json;
+  json.reserve(static_cast<size_t>(count) * 16);  // rough reserve to avoid 
reallocations
+  for (int64_t i = 0; i < count; ++i) {
+    json += "{\"a\":" + std::to_string(i) + "}\n";
+  }
+
+  return ReadToTable(std::move(json), read_options, parse_options);
+}

Review Comment:
   This helper duplicates JSON construction / ReadOptions setup already present 
in `ReaderTest.MultipleChunksParallel`, but that existing test isn’t using the 
helper. Consider refactoring `MultipleChunksParallel` to call this helper (or 
adjust the comment) to avoid two separate implementations drifting over time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to