[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11245: ARROW-13637: [Python] Fix docstrings

GitBox Tue, 28 Sep 2021 04:45:18 -0700


jorisvandenbossche commented on a change in pull request #11245:
URL: https://github.com/apache/arrow/pull/11245#discussion_r717435829




##########
File path: python/pyarrow/_fs.pyx
##########
@@ -934,36 +934,42 @@ class FileSystemHandler(ABC):
         """
         Implement PyFileSystem.type_name.
         """
+    get_type_name.__doc__ = FileSystem.get_type_name.__doc__
 
     @abstractmethod
     def get_file_info(self, paths):
         """
         Implement PyFileSystem.get_file_info(paths).
         """
+    get_file_info.__doc__ = FileSystem.get_file_info.__doc__
 
     @abstractmethod
     def get_file_info_selector(self, selector):
         """
         Implement PyFileSystem.get_file_info(selector).
         """
+    get_file_info_selector.__doc__ = FileSystem.get_file_info_selector.__doc__

Review comment:
       `FileSystem.get_file_info_selector` doesn't exist. In general, there is 
not necessarily a 1:1 mapping between FileSystemHandler and FileSystem methods, 
so I would maybe leave those docstrings as is. 
   I think the "Implement ..." is actually also useful to keep (while now the 
docstring gets overwritten).

##########
File path: python/pyarrow/_fs.pyx
##########
@@ -934,36 +934,42 @@ class FileSystemHandler(ABC):
         """
         Implement PyFileSystem.type_name.
         """
+    get_type_name.__doc__ = FileSystem.get_type_name.__doc__
 
     @abstractmethod
     def get_file_info(self, paths):
         """
         Implement PyFileSystem.get_file_info(paths).
         """
+    get_file_info.__doc__ = FileSystem.get_file_info.__doc__
 
     @abstractmethod
     def get_file_info_selector(self, selector):
         """
         Implement PyFileSystem.get_file_info(selector).
         """
+    get_file_info_selector.__doc__ = FileSystem.get_file_info_selector.__doc__
 
     @abstractmethod
     def create_dir(self, path, recursive):
         """
         Implement PyFileSystem.create_dir(...).
         """
+    create_dir.__doc__ = FileSystem.create_dir.__doc__
 
     @abstractmethod
     def delete_dir(self, path):
         """
         Implement PyFileSystem.delete_dir(...).
         """
+    delete_dir.__doc__ = FileSystem.delete_dir.__doc__
 
     @abstractmethod
     def delete_dir_contents(self, path):
         """
         Implement PyFileSystem.delete_dir_contents(...).
         """
+    delete_dir_contents.__doc__ = FileSystem.delete_dir_contents.__doc__

Review comment:
       Another example here of what I commented above: the docstring of 
`FileSystem.delete_dir_contents` won't actually be fully correct, since it has 
more keywords than `FileSystemHandler.delete_dir_contents`. 

##########
File path: python/pyarrow/tensor.pxi
##########
@@ -38,6 +38,16 @@ strides: {0.strides}""".format(self)
 
     @staticmethod
     def from_numpy(obj, dim_names=None):
+        """
+        Create a Tensor from a numpy array.
+
+        Parameters
+        ----------
+        obj : numpy.ndarray
+            The source numpy array
+        dim_names : list

Review comment:
       ```suggestion
           dim_names : list, optional
   ```

##########
File path: python/pyarrow/io.pxi
##########
@@ -1307,6 +1307,18 @@ ctypedef CRandomAccessFile* _RandomAccessFilePtr
 
 
 cdef class BufferedInputStream(NativeFile):
+    """
+    Wraps an input stream making it buffered.
+

Review comment:
       We could maybe add some extra explanation of what "buffered" means / why 
it's used. From the C++ doc comment:
   
   > An InputStream that performs buffered reads from an unbuffered 
InputStream, which can mitigate the overhead of many small reads in some cases

##########
File path: python/pyarrow/io.pxi
##########
@@ -1307,6 +1307,18 @@ ctypedef CRandomAccessFile* _RandomAccessFilePtr
 
 
 cdef class BufferedInputStream(NativeFile):
+    """
+    Wraps an input stream making it buffered.
+
+    Parameters
+    ----------
+    stream : NativeFile
+        The stream to wrap with the buffer
+    buffer_size : int
+        Size of the buffer that should be added.

Review comment:
       "added" sounds a bit strange in this context? (the C++ version says "the 
size of the temporary read buffer")

##########
File path: python/pyarrow/compute.py
##########
@@ -132,11 +132,15 @@ def _decorate_compute_function(wrapper, exposed_name, 
func, option_class):
     if option_class is not None:
         doc_pieces.append("""\
             options : pyarrow.compute.{0}, optional
-                Parameters altering compute function semantics
-            **kwargs : optional
-                Parameters for {0} constructor. Either `options`
-                or `**kwargs` can be passed, but not both at the same time.
+                Parameters altering compute function semantics.
             """.format(option_class.__name__))
+        options_sig = inspect.signature(option_class)
+        for p in options_sig.parameters.values():
+            doc_pieces.append("""\
+            {0} : optional
+                Parameter for {1} constructor. Either `options`
+                or `{0}` can be passed, but not both at the same time.
+            """.format(p.name, option_class.__name__))

Review comment:
       Nice! (now we should still have a way to automatically add actual 
explanation of the keyword, but that's for another JIRA :))

##########
File path: python/pyarrow/parquet.py
##########
@@ -214,6 +214,8 @@ class ParquetFile:
         Coalesce and issue file reads in parallel to improve performance on
         high-latency filesystems (e.g. S3). If True, Arrow will use a
         background I/O thread pool.
+    read_dictionary : list
+        List of names to read directly as DictionaryArray

Review comment:
       ```suggestion
           List of names to read directly as DictionaryArray.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11245: ARROW-13637: [Python] Fix docstrings

Reply via email to