pitrou commented on code in PR #37868:
URL: https://github.com/apache/arrow/pull/37868#discussion_r1371320707


##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -981,6 +981,48 @@ def test_make_fragment(multisourcefs):
         assert row_group_fragment.row_groups == [0]
 
 
[email protected]
[email protected]
+def test_make_fragment_with_size(s3_example_simple):

Review Comment:
   We don't need S3 for these tests, do we? Can we simply use the local 
filesystem, or even a mock filesystem?



##########
python/pyarrow/_dataset.pyx:
##########
@@ -1230,15 +1233,16 @@ cdef class FileFormat(_Weakrefable):
             The schema inferred from the file
         """
         cdef:
-            CFileSource c_source = _make_file_source(file, filesystem)
+            CFileSource c_source = _make_file_source(file, 
filesystem=filesystem)
             CResult[shared_ptr[CSchema]] c_result
         with nogil:
             c_result = self.format.Inspect(c_source)
         c_schema = GetResultValue(c_result)
         return pyarrow_wrap_schema(move(c_schema))
 
     def make_fragment(self, file, filesystem=None,
-                      Expression partition_expression=None):
+                      Expression partition_expression=None,
+                      size=None):

Review Comment:
   I'd make this a keyword-only argument.
   ```suggestion
                         *, size=None):
   ```



##########
cpp/src/arrow/filesystem/filesystem.h:
##########
@@ -60,6 +60,8 @@ struct ARROW_EXPORT FileInfo : public 
util::EqualityComparable<FileInfo> {
 
   explicit FileInfo(std::string path, FileType type = FileType::Unknown)
       : path_(std::move(path)), type_(type) {}
+  explicit FileInfo(std::string path, int64_t size, FileType type = 
FileType::Unknown)
+      : path_(std::move(path)), type_(type), size_(size) {}

Review Comment:
   Hmm, instead of adding constructor variants, I think it would be nice to 
start adding more descriptive factory functions, for example:
   ```suggestion
     static FileInfo File(std::string path, int64_t size) {
       f = FileInfo(std::move(path), FileType::File);
       f.set_size(size);
       return f;
     }
     static FileInfo Directory(std::string path) {
       return FileInfo(std::move(path), FileType::Directory);
     }
   ```



##########
cpp/src/arrow/dataset/file_base.h:
##########
@@ -54,7 +54,11 @@ class ARROW_DS_EXPORT FileSource : public 
util::EqualityComparable<FileSource> {
       : file_info_(std::move(path)),
         filesystem_(std::move(filesystem)),
         compression_(compression) {}
-
+  FileSource(std::string path, int64_t size, std::shared_ptr<fs::FileSystem> 
filesystem,

Review Comment:
   This new constructor doesn't seem useful, as there's another one taking a 
`FileInfo` below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to