krcrouse commented on PR #13126:
URL: https://github.com/apache/arrow/pull/13126#issuecomment-1179888233

   @jorisvandenbossche, I'll wait for some more guidance from you all and just 
respond to your comments inline for the moment. 
   
   > * I think we will probably want to commit the generated code to the repo 
(and in any case, doing that for now might also make it easier to review 
(although you already linked the file above as well)).
   
   That makes sense and probably helps in the overall structure.
   
   > * Can you explain a bit more the need for `pyarrow.docutils`? 
   
   docutils is used to process the reStructured text appendices so that they 
can be merged with the autogenerated docs. In this model, I propose using reSt 
for the function documentation additions because it's established, it's 
testable, and contributors will at least understand what it is even if they're 
not proficient in it.
   
   If your question is more about "the need for it as a required module" -  If 
we include the generated files then the `docutils` requirement only needs to be 
a build dependency in order to generate the actual code files. It does not need 
to be included as a requirement for the actual package for end users.
   
   > (we should also make it a private module, or put in eg python/scripts or 
so?)
   
   Agreed - it should be a private module.
   
   >   As far as I understand, by using the full ability of docutils, we can 
override _any_ section of the docstring ...
   >   Do we think we will really need that full ability? Or it might in almost 
all cases be sufficient to just append content? 
   
   I think we would want both options. Since the default documentation is 
pulling from the C++ library code, I think you could browse the current 
[generated 
documentation](https://arrow.apache.org/docs/python/api/compute.html) and see 
sections that are not useful and could be entirely overwritten. I also think 
the hybrid approach of creating default documentation with options to append 
and/or overwrite is best because it will pull in changes to the core C++ 
function interface automatically while preserving the manually provided 
improved pythonic documentation.
   
   Take, for example, the parameter definitions of 
[`replace_with_mask`](https://arrow.apache.org/docs/python/generated/pyarrow.compute.replace_with_mask.html#pyarrow.compute.replace_with_mask)
 - in the context of the still not terribly verbose description, the parameter 
types are incorrect and the explanation of each parameter is useless ('Argument 
to compute function."). I don't have the code in front of me at the moment, but 
having implemented this function in pyarrow 7.0, there are several oddities in 
how these parameters are handled that should be in the param descriptions and 
not merely as an appended paragraph at the bottom.  
   
   >   Small nit: this doesn't need to be a public module, so maybe something 
like `_compute_generated.py` instead?
   > * > Using raw reSt so that code block examples can be tested using doctest
   
   Agreed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to