Hi Alex,

Documentation contributions follow the same process as code contributions. If you run into problems, let us know on the mailing list or on JIRA/GitHub.

> While reviewing the docs I also noticed this page has a TODO -
> https://arrow.apache.org/docs/python/data.html. Is that related to 1422 or
> another ticket?

No, these aren't related. ARROW-1422 is about documenting the serialization details used in the pyarrow.serialize function and related tools. See http://arrow.apache.org/blog/2017/10/15/fast-python-serialization-with-ray-and-arrow/ for more on this.

Thanks
Wes

On Wed, Mar 21, 2018 at 4:24 PM, [email protected] <[email protected]> wrote:
> Hi,
>
> I've come across a couple StackOverflow questions and JIRA tickets looking
> for updates to the PyArrow documentation. I thought this might be a good way
> for me to get more familiar with the code base while also contributing back.
> I went through JIRA trying to find all the Python documentation related
> tickets and came up with the list below. I broke them into two groups; those
> which I believe I can handle without additional context and a second group
> that I know up front I'll need more information. I was also wondering if the
> docs followed the same process outlined here:
> https://github.com/apache/arrow/blob/master/.github/CONTRIBUTING.md or if
> there were any additional steps with Sphinx?
>
> Attempt to handle as is:
> [Python] Update setup.py to use Markdown project description -
> https://issues.apache.org/jira/browse/ARROW-2325
> [Python] Document read_pandas method in pyarrow.parquet -
> https://issues.apache.org/jira/browse/ARROW-2014
> [Python] Add documentation about parquet.write_to_dataset and related methods
> - https://issues.apache.org/jira/browse/ARROW-1858
> [Python] Add documentation examples for reading single Parquet files and
> datasets from HDFS - https://issues.apache.org/jira/browse/ARROW-1848
>
> Additional information or resources required:
> [Python] Document on how to use Storefact & Arrow to read Parquet from
> S3/Azure/... - https://issues.apache.org/jira/browse/ARROW-2077
> [Python] Add documentation / example for reading a directory of Parquet files
> on S3 - https://issues.apache.org/jira/browse/ARROW-1682
> [Python] Add documentation section for integrations with PyTorch, TensorFlow
> - https://issues.apache.org/jira/browse/ARROW-2075
> [Format] Add specification document for the serialization scheme used in
> Python - https://issues.apache.org/jira/browse/ARROW-1422
> [Python] document differences w.r.t. fastparquet -
> https://issues.apache.org/jira/browse/ARROW-760
>
> While reviewing the docs I also noticed this page has a TODO -
> https://arrow.apache.org/docs/python/data.html. Is that related to 1422 or
> another ticket?
>
> Appreciate any suggestions, directions or information around moving forward
> on this.
> Alex
