Re: [PR] Store example data directly inside the datafusion-examples (#19141) [datafusion]

via GitHub Mon, 29 Dec 2025 05:00:12 -0800


cj-zhukov commented on code in PR #19319:
URL: https://github.com/apache/datafusion/pull/19319#discussion_r2650921636



##########
datafusion-examples/examples/data_io/parquet_exec_visitor.rs:
##########
@@ -29,23 +31,47 @@ use datafusion::physical_plan::metrics::MetricValue;
 use datafusion::physical_plan::{
     ExecutionPlan, ExecutionPlanVisitor, execute_stream, visit_execution_plan,
 };
+use datafusion::prelude::CsvReadOptions;
 use futures::StreamExt;
+use tempfile::TempDir;
+use tokio::fs::create_dir_all;
 
 /// Example of collecting metrics after execution by visiting the `ExecutionPlan`
 pub async fn parquet_exec_visitor() -> datafusion::common::Result<()> {
     let ctx = SessionContext::new();
 
-    let test_data = datafusion::test_util::parquet_test_data();
+    // Load CSV into an in-memory DataFrame, then materialize it to Parquet.

Review Comment:
   Thanks a lot for the feedback - this is a very good point!
   
   I agree that the CSV → Parquet setup code adds quite a bit of noise to the examples and can distract from the main idea being demonstrated, especially for first-time DataFusion users.
   
   I'll refactor this by extracting the repeated CSV-to-Parquet logic into a small helper function (e.g. `write_csv_to_parquet`) so the examples stay self-contained while keeping the core logic much clearer.
   
   And no worries at all about timing - I really appreciate you taking the time to review this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
