returnString opened a new pull request #9749:
URL: https://github.com/apache/arrow/pull/9749
As discussed [here](https://github.com/apache/arrow/pull/9710#discussion_r596404956), we were looking into how we might add code examples to the DataFusion readme whilst keeping them in sync with reality as we go through API revisions etc. This PR pulls in a new dev dependency, `doc-comment`, which detects all the `rust`-tagged code blocks in a Markdown file and treats them as doctests, and wires this up for `README.md`.

My only concerns are:

- because the end result is a full-blown doctest, you do need to make sure imports etc. are present, which makes the samples more verbose than some people would perhaps like
- again on the verbosity front: we have lots of async code, which requires a `#[tokio::main] async fn main() { ... }` wrapper

Neither of these is inherently bad IMO, but both are worth noting upfront.

As an example of a readme sample that passes as a doctest (borrowed from @alamb's latest documentation PR, #9710):

```rust
use datafusion::prelude::*;
use arrow::util::pretty::print_batches;
use arrow::record_batch::RecordBatch;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let mut ctx = ExecutionContext::new();

    // create the dataframe
    let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;

    // keep rows where a <= b, take min(b) grouped by a, and cap at 100 rows
    let df = df.filter(col("a").lt_eq(col("b")))?
        .aggregate(&[col("a")], &[min(col("b"))])?
        .limit(100)?;

    // execute the plan and print the resulting batches
    let results: Vec<RecordBatch> = df.collect().await?;
    print_batches(&results)?;

    Ok(())
}
```
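For illustration, wiring `doc-comment` up for the readme might look roughly like the following in the crate root. This is a sketch, not necessarily the exact change in this PR: it assumes the crate root is `src/lib.rs` with `README.md` one directory above it, that `doc-comment` is declared under `[dev-dependencies]` in `Cargo.toml`, and the test name `readme_example_test` is a hypothetical placeholder.

```rust
// Crate root (assumed to be `src/lib.rs`).
// Assumes `doc-comment` is a dev-dependency and README.md lives at the
// repository root, i.e. "../README.md" relative to this file.

// `cfg(doctest)` is only set while rustdoc collects doctests
// (e.g. `cargo test --doc`). In ordinary builds this invocation is
// stripped before name resolution, so neither the dev-dependency nor
// the README file is needed there.
#[cfg(doctest)]
doc_comment::doctest!("../README.md", readme_example_test); // hypothetical test name
```

With this in place, `cargo test --doc` should compile and run the readme's Rust code blocks alongside the crate's other doctests, so a README sample that drifts from the current API fails CI instead of going stale.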