cj-zhukov commented on code in PR #19319:
URL: https://github.com/apache/datafusion/pull/19319#discussion_r2650921636
##########
datafusion-examples/examples/data_io/parquet_exec_visitor.rs:
##########
@@ -29,23 +31,47 @@ use datafusion::physical_plan::metrics::MetricValue;
use datafusion::physical_plan::{
ExecutionPlan, ExecutionPlanVisitor, execute_stream, visit_execution_plan,
};
+use datafusion::prelude::CsvReadOptions;
use futures::StreamExt;
+use tempfile::TempDir;
+use tokio::fs::create_dir_all;
/// Example of collecting metrics after execution by visiting the `ExecutionPlan`
pub async fn parquet_exec_visitor() -> datafusion::common::Result<()> {
let ctx = SessionContext::new();
- let test_data = datafusion::test_util::parquet_test_data();
+ // Load CSV into an in-memory DataFrame, then materialize it to Parquet.
Review Comment:
Thanks a lot for the feedback - this is a very good point!
I agree that the CSV → Parquet setup code adds quite a bit of noise to the
examples and can distract from the main idea being demonstrated, especially for
first-time DataFusion users.
I’ll refactor this by extracting the repeated CSV-to-Parquet logic into a
small helper function (e.g. `write_csv_to_parquet`), so the examples stay
self-contained while the core logic of each one is much easier to see.
And no worries at all about timing - I really appreciate you taking the time
to review this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]