raulcd commented on code in PR #48008:
URL: https://github.com/apache/arrow/pull/48008#discussion_r2480656767


##########
python/pyarrow/parquet/core.py:
##########
@@ -1887,10 +1887,23 @@ def read_table(source, *, columns=None, use_threads=True,
                 "the 'schema' argument is not supported when the "
                 "pyarrow.dataset module is not available"
             )
+        if isinstance(source, list):
+            raise ValueError(
+                "the 'source' argument cannot be a list of files "
+                "when the pyarrow.dataset is not available"
+            )
+
         filesystem, path = _resolve_filesystem_and_path(source, filesystem)
         if filesystem is not None:
-            source = filesystem.open_input_file(path)
-        # TODO test that source is not a directory or a list
+            try:
+                source = filesystem.open_input_file(path)
+            except (OSError, FileNotFoundError) as e:

Review Comment:
   Will `filesystem.open_input_file` raise `OSError` and/or `FileNotFoundError` only when `source` is a directory rather than a file? Couldn't it, for example, raise `OSError` when the process doesn't have permission to read the file, or for other reasons?
   If either exception can be raised in other cases, we might mislead users, since we would always raise a `ValueError` whose message points to the directory-vs-file issue.
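   To illustrate the concern: in Python's built-in exception hierarchy, `FileNotFoundError`, `PermissionError` and `IsADirectoryError` are all subclasses of `OSError`, so the `except (OSError, FileNotFoundError)` clause in the diff catches every one of them (and listing `FileNotFoundError` next to `OSError` is redundant). A minimal check using only built-in exceptions; `is_caught` is a hypothetical helper that mirrors that clause:

```python
# FileNotFoundError, PermissionError and IsADirectoryError all derive from
# OSError, so catching OSError already covers every one of them.
OS_ERRORS = (FileNotFoundError, PermissionError, IsADirectoryError)
for exc_type in OS_ERRORS:
    assert issubclass(exc_type, OSError)


def is_caught(exc):
    """Hypothetical helper: does the diff's except clause swallow `exc`?"""
    try:
        raise exc
    except (OSError, FileNotFoundError):
        return True
    except Exception:
        return False


print(is_caught(PermissionError("permission denied")))  # True
print(is_caught(ValueError("unrelated")))  # False
```

   A permission failure would therefore end up in the same `ValueError` branch as a directory.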
   We might want to check the path type explicitly with something like this (untested; I only looked at the filesystem API):
   ```python
   filesystem.get_file_info(path).is_file  # `is_file` is a property, not a method
   ```
   
   



##########
python/pyarrow/parquet/core.py:
##########
@@ -1887,10 +1887,23 @@ def read_table(source, *, columns=None, use_threads=True,
                 "the 'schema' argument is not supported when the "
                 "pyarrow.dataset module is not available"
             )
+        if isinstance(source, list):
+            raise ValueError(
+                "the 'source' argument cannot be a list of files "
+                "when the pyarrow.dataset is not available"

Review Comment:
   ```suggestion
                   "when the pyarrow.dataset module is not available"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
