kumarUjjawal commented on code in PR #19729:
URL: https://github.com/apache/datafusion/pull/19729#discussion_r2848741485


##########
datafusion/substrait/src/logical_plan/consumer/rel/read_rel.rs:
##########
@@ -176,45 +176,100 @@ pub async fn from_read_rel(
             }))
         }
         Some(ReadType::LocalFiles(lf)) => {
-            fn extract_filename(name: &str) -> Option<String> {
-                let corrected_url =
-                    if name.starts_with("file://") && 
!name.starts_with("file:///") {
-                        name.replacen("file://", "file:///", 1)
-                    } else {
-                        name.to_string()
-                    };
-
-                Url::parse(&corrected_url).ok().and_then(|url| {
-                    let path = url.path();
-                    std::path::Path::new(path)
-                        .file_name()
-                        .map(|filename| filename.to_string_lossy().to_string())
-                })
+            /// Extracts the path string from a PathType variant.
+            /// Normalizes file:// URLs to file:/// format for proper URL 
parsing.
+            fn extract_path(path_type: Option<&PathType>) -> Option<String> {
+                let path_str = match path_type? {
+                    PathType::UriPath(p) => p.clone(),
+                    PathType::UriPathGlob(p) => p.clone(),
+                    PathType::UriFile(p) => p.clone(),
+                    PathType::UriFolder(p) => p.clone(),
+                };
+
+                // Normalize file:// to file:/// for proper URL parsing
+                let normalized = if path_str.starts_with("file://")
+                    && !path_str.starts_with("file:///")
+                {
+                    path_str.replacen("file://", "file:///", 1)
+                } else {
+                    path_str
+                };

Review Comment:
   Thanks for the feedback. I had few thoughts: 
   The Substrait spec says these are "URIs" where "the protocol portion of the 
URI determines the storage type", but the existing physical plan 
producer/consumer in DF treats them as raw strings, the producer emits bare 
object-store paths like "file1.parquet" (not valid URIs), and the consumer 
passes them straight to ObjectMeta without parsing. If we require Url::parse() 
to succeed, plans produced by DataFusion's own physical plan producer would be 
rejected by our logical plan consumer.
   
   So I was thinking we keep &[String] and  pass raw URI strings and let the 
consumer decide how to parse. consistent with the physical plan side. What do 
you think?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to