alamb commented on code in PR #18946:
URL: https://github.com/apache/datafusion/pull/18946#discussion_r2589513456


##########
datafusion-examples/examples/builtin_functions/regexp.rs:
##########
@@ -32,12 +35,30 @@ use datafusion::prelude::*;
 /// https://docs.rs/regex/latest/regex/#grouping-and-flags
 pub async fn regexp() -> Result<()> {
     let ctx = SessionContext::new();
-    ctx.register_csv(
-        "examples",
-        "datafusion/physical-expr/tests/data/regex.csv",
-        CsvReadOptions::new(),
-    )
-    .await?;
+    // content from file 'datafusion/physical-expr/tests/data/regex.csv'
+    let csv_data = r#"values,patterns,replacement,flags
+abc,^(a),bb\1bb,i

Review Comment:
   Why inline this content? It is fine; I am just curious.



##########
ci/scripts/rust_example.sh:
##########
@@ -25,12 +25,26 @@ export CARGO_PROFILE_CI_STRIP=true
 cd datafusion-examples/examples/
 cargo build --profile ci --examples
 
-files=$(ls .)
-for filename in $files
-do
-  example_name=`basename $filename ".rs"`
-  # Skip tests that rely on external storage and flight
-  if [ ! -d $filename ]; then
-    cargo run --profile ci --example $example_name
-  fi
+SKIP_LIST=("external_dependency" "flight" "ffi")
+
+skip_example() {
+    local name="$1"
+    for skip in "${SKIP_LIST[@]}"; do
+        if [ "$name" = "$skip" ]; then
+            return 0
+        fi
+    done
+    return 1
+}
+
+for dir in */; do
+    example_name=$(basename "$dir")
+
+    if skip_example "$example_name"; then
+        echo "Skipping $example_name"
+        continue
+    fi
+
+    echo "Running example group: $example_name"

Review Comment:
   When I ran this script twice, I got an error the second time around:
   
   ```shell
   ./ci/scripts/rust_example.sh
   ./ci/scripts/rust_example.sh
   ```
   
   The second run produced this output:
   ```
   Running example: deserialize_to_struct
   Running example group: datafusion-examples
   error: no example target named `datafusion-examples` in default-run packages
   help: available example targets:
       builtin_functions
       custom_data_source
       data_io
       dataframe
       execution_monitoring
       external_dependency
       flight
       proto
       query_planning
       sql_ops
       udf
   ```
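
One way to make the loop robust against stray directories is sketched below. It assumes, and this is only an assumption, that every real example group contains a `main.rs` entry point (as `builtin_functions/main.rs` does) and that `datafusion-examples` is a directory left behind under `examples/` by the first run. The helper `list_example_groups` and the temporary directory layout are hypothetical, used only so the sketch runs without `cargo`; in the real script the `echo` would be replaced by the existing `cargo run` invocation.

```shell
#!/usr/bin/env bash
# Sketch: treat a directory as an example group only if it contains main.rs.
set -euo pipefail

# Throwaway layout that mimics examples/ after a first run (hypothetical).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/builtin_functions" "$workdir/udf" \
         "$workdir/flight" "$workdir/datafusion-examples"
touch "$workdir/builtin_functions/main.rs" \
      "$workdir/udf/main.rs" \
      "$workdir/flight/main.rs"
# datafusion-examples/ stands in for the stray directory: it has no main.rs.

SKIP_LIST=("external_dependency" "flight" "ffi")

skip_example() {
    local name="$1"
    local skip
    for skip in "${SKIP_LIST[@]}"; do
        if [ "$name" = "$skip" ]; then
            return 0
        fi
    done
    return 1
}

# Print the example groups that would be run, one per line.
list_example_groups() {
    local root="$1"
    local dir example_name
    for dir in "$root"/*/; do
        example_name=$(basename "$dir")
        # Assumed convention: a real example group has a main.rs entry point.
        if [ ! -f "${dir}main.rs" ]; then
            continue
        fi
        if skip_example "$example_name"; then
            continue
        fi
        echo "$example_name"
    done
}

list_example_groups "$workdir"
```

With this layout the sketch lists `builtin_functions` and `udf`: `flight` is dropped by the skip list and `datafusion-examples` by the `main.rs` check. Writing example output to a temporary directory instead would avoid creating the stray directory in the first place.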



##########
datafusion-examples/examples/builtin_functions/main.rs:
##########
@@ -67,12 +71,38 @@ impl FromStr for ExampleKind {
 }
 
 impl ExampleKind {
-    const ALL: [Self; 3] = [Self::DateTime, Self::FunctionFactory, Self::Regexp];
+    const ALL_VARIANTS: [Self; 4] = [

Review Comment:
   Looking at the amount of boilerplate code, I think we can use strum (https://crates.io/crates/strum) to do the same thing.
   
   I know that adding a new dependency is generally something we try to avoid, but given that strum is [already in the workspace](https://github.com/apache/datafusion/blob/f22a3f3955e667605c0ccbfd6e216f91f4f134ee/Cargo.lock#L6013-L6017), using it in examples seems reasonable to me.
   
   Specifically, these two derive macros:
   - https://docs.rs/strum_macros/latest/strum_macros/derive.EnumIter.html
   - https://docs.rs/strum_macros/latest/strum_macros/derive.EnumString.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
