waynexia commented on code in PR #4743:
URL: https://github.com/apache/arrow-datafusion/pull/4743#discussion_r1057716291
##########
datafusion/core/tests/parquet/filter_pushdown.rs:
##########
@@ -42,60 +42,64 @@ use tempfile::TempDir;
use test_utils::AccessLogGenerator;
/// how many rows of generated data to write to our parquet file (arbitrary)
-const NUM_ROWS: usize = 53819;
-const ROW_LIMIT: usize = 4096;
-
-#[cfg(test)]
-#[ctor::ctor]
-fn init() {
- // enable logging so RUST_LOG works
- let _ = env_logger::try_init();
-}
-
-#[cfg(not(target_family = "windows"))]
-// Use multi-threaded executor as this test consumes CPU
-#[tokio::test(flavor = "multi_thread")]
-async fn single_file() {
- // Only create the parquet file once as it is fairly large
-
- let tempdir = TempDir::new().unwrap();
+const NUM_ROWS: usize = 4096;
+fn generate_file(tempdir: &TempDir, props: WriterProperties) -> TestParquetFile {
+ // Tune down the generator for smaller files
let generator = AccessLogGenerator::new()
.with_row_limit(NUM_ROWS)
- .with_max_batch_size(ROW_LIMIT);
+ .with_pods_per_host(1..4)
+ .with_containers_per_pod(1..2)
+ .with_entries_per_container(128..256);
- // default properties
- let props = WriterProperties::builder().build();
let file = tempdir.path().join("data.parquet");
let start = Instant::now();
println!("Writing test data to {:?}", file);
- let test_parquet_file =
- Arc::new(TestParquetFile::try_new(file, props, generator).unwrap());
+    let test_parquet_file = TestParquetFile::try_new(file, props, generator).unwrap();
println!(
"Completed generating test data in {:?}",
Instant::now() - start
);
+ test_parquet_file
+}
+
+#[cfg(test)]
+#[ctor::ctor]
+fn init() {
+ // enable logging so RUST_LOG works
+ let _ = env_logger::try_init();
+}
+
+#[cfg(not(target_family = "windows"))]
+#[tokio::test]
+async fn single_file() {
+ // Only create the parquet file once as it is fairly large
- let mut set = tokio::task::JoinSet::new();
Review Comment:
Looks good to me. It also took me some time to figure out which sub-case
fails. If the run time is greatly shortened, I agree we don't need to pay
for the parallel execution.
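For context, here is a rough sketch (not the PR's actual code) of the trade-off above: spawning each sub-case on a `tokio::task::JoinSet` runs them in parallel but makes it harder to tell which one panicked, while awaiting them sequentially makes failures easy to attribute. `run_case` and the case names are hypothetical placeholders.

```rust
use tokio::task::JoinSet;

async fn run_case(name: &'static str) {
    // Stand-in for one filter-pushdown sub-case against the shared parquet file.
    println!("running sub-case {name}");
}

// Before: sub-cases spawned concurrently, which needs the multi-threaded runtime.
async fn run_parallel() {
    let mut set = JoinSet::new();
    set.spawn(run_case("basic_conjunction"));
    set.spawn(run_case("everything_matches"));
    while let Some(result) = set.join_next().await {
        // A panic in any sub-case surfaces here, detached from the case name.
        result.unwrap();
    }
}

// After: sub-cases awaited one by one; a panic points at the failing call site.
async fn run_sequential() {
    run_case("basic_conjunction").await;
    run_case("everything_matches").await;
}

#[tokio::main]
async fn main() {
    run_parallel().await;
    run_sequential().await;
}
```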