Hi Alex,

Documentation contributions follow the same process as code
contributions. If you run into problems, let us know on the mailing
list or JIRA/GitHub.

> While reviewing the docs I also noticed this page has a TODO - 
> https://arrow.apache.org/docs/python/data.html. Is that related to 1422 or 
> another ticket?

No, these aren't related. ARROW-1422 is about documenting the
serialization details used in the pyarrow.serialize function and
related tools. See
http://arrow.apache.org/blog/2017/10/15/fast-python-serialization-with-ray-and-arrow/
for more on this.

Thanks
Wes

On Wed, Mar 21, 2018 at 4:24 PM, [email protected]
<[email protected]> wrote:
> Hi,
>
> I've come across a couple StackOverflow questions and JIRA tickets looking 
> for updates to the PyArrow documentation. I thought this might be a good way 
> for me to get more familiar with the code base while also contributing back. 
> I went through JIRA trying to find all the Python documentation related 
> tickets and came up with the list below. I broke them into two groups; those 
> which I believe I can handle without additional context and a second group 
> that I know up front I'll need more information. I was also wondering if the 
> docs followed the same process outlined here: 
> https://github.com/apache/arrow/blob/master/.github/CONTRIBUTING.md or if 
> there were any additional steps with Sphinx?
>
> Attempt to handle as is:
> [Python] Update setup.py to use Markdown project description - 
> https://issues.apache.org/jira/browse/ARROW-2325
> [Python] Document read_pandas method in pyarrow.parquet - 
> https://issues.apache.org/jira/browse/ARROW-2014
> [Python] Add documentation about parquet.write_to_dataset and related methods 
> - https://issues.apache.org/jira/browse/ARROW-1858
> [Python] Add documentation examples for reading single Parquet files and 
> datasets from HDFS - https://issues.apache.org/jira/browse/ARROW-1848
>
> Additional information or resources required:
> [Python] Document on how to use Storefact & Arrow to read Parquet from 
> S3/Azure/... - https://issues.apache.org/jira/browse/ARROW-2077
> [Python] Add documentation / example for reading a directory of Parquet files 
> on S3 - https://issues.apache.org/jira/browse/ARROW-1682
> [Python] Add documentation section for integrations with PyTorch, TensorFlow 
> - https://issues.apache.org/jira/browse/ARROW-2075
> [Format] Add specification document for the serialization scheme used in 
> Python - https://issues.apache.org/jira/browse/ARROW-1422
> [Python] document differences w.r.t. fastparquet - 
> https://issues.apache.org/jira/browse/ARROW-760
>
> While reviewing the docs I also noticed this page has a TODO - 
> https://arrow.apache.org/docs/python/data.html. Is that related to 1422 or 
> another ticket?
>
> Appreciate any suggestions, directions or information around moving forward 
> on this.
> Alex
