jorisvandenbossche commented on a change in pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#discussion_r557464597



##########
File path: cpp/src/arrow/filesystem/hdfs.cc
##########
@@ -69,6 +69,14 @@ class HadoopFileSystem::Impl {
   HdfsOptions options() const { return options_; }
 
   Result<FileInfo> GetFileInfo(const std::string& path) {
+    // It has unfortunately been a frequent logic error to pass URIs down
+    // to GetFileInfo (e.g. ARROW-10264).  Unlike other filesystems, HDFS
+    // silently accepts URIs but returns different results than if given the
+    // equivalent in-filesystem paths.  Instead of raising cryptic errors
+    // later, notify the underlying problem immediately.
+    if (path.substr(0, 5) == "hdfs:") {

Review comment:
       Or "viewfs"?
   (I am not familiar with it; I only know that some places in the
Python/Cython code check for this as well.)
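
   To make the suggestion concrete, here is a minimal sketch in Python of a
check that covers both schemes. The helper name `looks_like_hdfs_uri` and the
prefix tuple are hypothetical, not taken from this PR, and whether `viewfs`
should be rejected at all is still the open question above.

```python
# Hypothetical helper, not part of Arrow: flags strings that look like
# HDFS-style URIs rather than in-filesystem paths.
_URI_PREFIXES = ("hdfs:", "viewfs:")  # assumption: the schemes to reject


def looks_like_hdfs_uri(path: str) -> bool:
    """Return True if `path` starts with one of the URI prefixes."""
    # str.startswith accepts a tuple and matches any of its entries.
    return path.startswith(_URI_PREFIXES)
```

   With this, `looks_like_hdfs_uri("viewfs://cluster/data")` is true, while a
plain path such as `/data/file.parquet` is not flagged.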

##########
File path: python/pyarrow/parquet.py
##########
@@ -1493,15 +1493,16 @@ def __init__(self, path_or_paths, filesystem=None, filters=None,
                 single_file = path_or_paths[0]
         else:
             if _is_path_like(path_or_paths):
-                path = str(path_or_paths)
+                path_or_paths = str(path_or_paths)
                 if filesystem is None:
                     # path might be a URI describing the FileSystem as well
                     try:
-                        filesystem, path = FileSystem.from_uri(path)
+                        filesystem, path_or_paths = FileSystem.from_uri(
+                            path_or_paths)

Review comment:
       Ah, good catch. So further down we were still passing the original
`path_or_paths` URI to the dataset constructor (instead of the non-URI path
returned by `from_uri`), while also passing the filesystem inferred from the
URI here.
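
   A minimal sketch of the pattern the fix relies on, assuming a stand-in for
`FileSystem.from_uri`: the hypothetical `split_uri` below uses the standard
library's `urlsplit` and returns the scheme string where the real call would
return a filesystem object. `resolve` is also a hypothetical name. The point
is only that rebinding `path_or_paths` keeps the URI from leaking into later
code.

```python
from urllib.parse import urlsplit


def split_uri(path_or_uri: str):
    """Stand-in for FileSystem.from_uri: return (scheme, in-filesystem path).

    Hypothetical helper for illustration only; the real call returns a
    filesystem object rather than a scheme string.
    """
    parts = urlsplit(path_or_uri)
    if not parts.scheme:
        raise ValueError("not a URI")
    return parts.scheme, parts.path


def resolve(path_or_paths: str, filesystem=None):
    """Return the (filesystem, path) pair to hand to the dataset."""
    if filesystem is None:
        try:
            # Rebind the same name so the URI cannot leak through later.
            filesystem, path_or_paths = split_uri(path_or_paths)
        except ValueError:
            pass  # plain path: leave it untouched
    return filesystem, path_or_paths
```

   Here `resolve("hdfs://namenode:8020/data/t.parquet")` yields the inferred
filesystem together with `/data/t.parquet`, not the original URI.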





