martin-g commented on code in PR #19890:
URL: https://github.com/apache/datafusion/pull/19890#discussion_r2708346961
##########
datafusion/core/tests/parquet/page_pruning.rs:
##########
@@ -961,3 +964,56 @@ fn cast_count_metric(metric: MetricValue) -> Option<usize>
{
_ => None,
}
}
+
+#[tokio::test]
+async fn test_parquet_opener_without_page_index() {
+ // Defines a simple schema and batch
+ let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int32,
true)]));
+ let batch = RecordBatch::try_new(
+ schema.clone(),
+ vec![Arc::new(Int32Array::from(vec![1, 2, 3]))],
+ )
+ .unwrap();
+
+ // Create a temp file
+ let file = tempfile::Builder::new()
+ .suffix(".parquet")
+ .tempfile()
+ .unwrap();
+ let path = file.path().to_str().unwrap().to_string();
+
+ // Write parquet WITHOUT page index
+ let props = WriterProperties::builder().build();
+
+ let file_fs = std::fs::File::create(&path).unwrap();
Review Comment:
This tries to reopen the already opened `file` (line 979). Maybe you can
just reuse it ?!
##########
datafusion/core/tests/parquet/page_pruning.rs:
##########
@@ -961,3 +964,56 @@ fn cast_count_metric(metric: MetricValue) -> Option<usize>
{
_ => None,
}
}
+
+#[tokio::test]
+async fn test_parquet_opener_without_page_index() {
+ // Defines a simple schema and batch
+ let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int32,
true)]));
+ let batch = RecordBatch::try_new(
+ schema.clone(),
+ vec![Arc::new(Int32Array::from(vec![1, 2, 3]))],
+ )
+ .unwrap();
+
+ // Create a temp file
+ let file = tempfile::Builder::new()
+ .suffix(".parquet")
+ .tempfile()
+ .unwrap();
+ let path = file.path().to_str().unwrap().to_string();
+
+ // Write parquet WITHOUT page index
+ let props = WriterProperties::builder().build();
Review Comment:
Which property exactly controls whether the page index is enabled or not ?
Maybe you can disable it explicitly here with
`WriterProperties::builder().with_xyz(false).build();`, so it still work as
desired if it ever becomes enabled by default.
##########
parquet-testing:
##########
Review Comment:
Is the update of the parquet-testing submodule needed for the new test ?
##########
datafusion/datasource-parquet/src/opener.rs:
##########
@@ -436,7 +436,7 @@ impl FileOpener for ParquetOpener {
reader_metadata,
&mut async_file_reader,
// Since we're manually loading the page index the option
here should not matter but we pass it in for consistency
Review Comment:
This comment probably needs to be updated.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]