Hi all,

Please see the NEP below for a proposal to restructure the documentation of
NumPy. The main goal here is to make the documentation more visible and
organized, and also make contributions easier.

Comments and feedback are welcome!

See https://github.com/numpy/numpy/pull/15554 for details.




NEP 44 — Restructuring the NumPy Documentation

Authors: Ralf Gommers, Melissa Mendonça, Mars Lee
Status: Draft
Type: Process
Created: 2020-02-11


This document proposes a restructuring of the NumPy Documentation, both in
form and content, with the goal of making it more organized and
discoverable for beginners and experienced users.

Motivation and Scope

See https://numpy.org/devdocs for the front page of the latest docs. The
organization is quite confusing and illogical (e.g. user and developer docs
are mixed). We propose the following:

- Reorganizing the docs into the four categories mentioned in [1];
- Creating dedicated sections for Tutorials and How-Tos, including
guidance on how to create new content;
- Adding an Explanations section for key concepts and techniques that
require deeper descriptions, some of which will be rearranged from the
Reference Guide.

Usage and Impact

The documentation is a fundamental part of any software project, especially
open source projects. In the case of NumPy, many beginners might feel
demotivated by the current structure of the documentation, since it is
difficult to discover what to learn (unless the user has a clear view of
what to look for in the Reference docs, which is not always the case).

Looking at the results of a “NumPy Tutorial” search on any search engine
also gives an idea of the demand for this kind of content. Having official
high-level documentation written using up-to-date content and techniques
is likely to get more users (and developers/contributors) involved
in the NumPy community.

Backward compatibility

The restructuring will effectively require rewriting all links and
some of the current content. Input from the community will be useful for
identifying key links and pages that should not be broken.

Detailed description

As discussed in the article [1], there are four categories of doc content:
- Tutorials
- How-to guides
- Explanations
- Reference guide

We propose using these categories as the framework for writing and
reviewing whenever we add a new documentation section.

The reasoning for this is that it makes clearer, both to
developers/documentation writers and to users, where each piece of
information should go, and what the scope and tone of each document should
be. For example, if explanations are mixed with basic tutorials, beginners
might be overwhelmed and alienated. On the other hand, if the reference
guide contains basic how-tos, it might be difficult for experienced users
to quickly find the information they need.

Currently, there are many blogs and tutorials on the internet about NumPy
or using NumPy. One of the issues with this is that users searching for
this information may end up in an outdated (unofficial) tutorial before
they find the current official documentation, which can be especially
confusing for beginners. Having a better infrastructure for the
documentation also aims to solve this problem by giving users high-level,
up-to-date official documentation that can be easily updated.

Status and ideas for each type of doc content

* Reference guide

NumPy has a fairly complete reference guide. All functions are documented,
most have examples, and most are cross-linked well with See Also sections.
Further improving the reference guide is incremental work that can be done
(and is being done) by many people. There are, however, many explanations
in the reference guide. These can be moved to a dedicated Explanations
section in the docs.

* How-to guides

NumPy does not have many how-tos. The subclassing and array duck typing
section may be an example of a how-to. Others that could be added are:
- Parallelization (controlling BLAS multithreading with threadpoolctl,
using multiprocessing, random number generation, etc.)
- Storing and loading data (.npy/.npz format, text formats, Zarr, HDF5,
Bloscpack, etc.)
- Performance (memory layout, profiling, use with Numba, Cython, or Pythran)
- Writing generic code that works with NumPy, Dask, CuPy, pydata/sparse, etc.
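To give a sense of the intended scope, the “Storing and loading data” how-to might open with a short, task-focused snippet such as the one below. It is only a sketch: the array and file names are illustrative, and it covers just the `.npy`/`.npz` formats via `np.save`, `np.savez_compressed` and `np.load`.

```python
import os
import tempfile

import numpy as np

# Two arrays we want to persist between sessions (illustrative names).
temperatures = np.linspace(10.0, 20.0, num=5)
station_ids = np.arange(5, dtype=np.int32)

with tempfile.TemporaryDirectory() as tmpdir:
    # A single array goes into a .npy file; dtype and shape are preserved.
    npy_path = os.path.join(tmpdir, "temperatures.npy")
    np.save(npy_path, temperatures)
    loaded = np.load(npy_path)

    # Several arrays can be bundled into one compressed .npz archive,
    # each stored under the keyword name used when saving.
    npz_path = os.path.join(tmpdir, "stations.npz")
    np.savez_compressed(npz_path, temperatures=temperatures, ids=station_ids)
    with np.load(npz_path) as archive:
        restored_ids = archive["ids"]
        restored_temps = archive["temperatures"]

print(loaded.dtype, restored_ids.dtype)  # float64 int32
```

A full how-to would then move on to text formats and to the external formats listed above.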

* Explanations

There is a reasonable amount of content on fundamental NumPy concepts such
as indexing, vectorization, broadcasting, (g)ufuncs, and dtypes. This could
be organized better and clarified to ensure it’s really about explaining
the concepts and not mixed with tutorial-like or how-to-like content.

There are few explanations about anything other than those fundamental
NumPy concepts.

Some examples of concepts that could be expanded:
- Copies vs. Views;
- BLAS and other linear algebra libraries;
- Fancy indexing.
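For instance, an explanation of copies vs. views (and of why fancy indexing behaves differently) could be anchored by a small snippet like this sketch, which relies only on basic slicing, integer-array indexing, `np.shares_memory` and the `base` attribute:

```python
import numpy as np

a = np.arange(10)

# Basic slicing returns a view: it shares memory with `a`.
view = a[2:5]
view[0] = 100                        # also changes a[2]
print(a[2])                          # 100
print(np.shares_memory(a, view))     # True

# Fancy (integer-array) indexing always returns a copy.
fancy = a[[2, 3, 4]]
fancy[0] = -1                        # `a` is left unchanged
print(a[2])                          # still 100
print(np.shares_memory(a, fancy))    # False

# An explicit copy owns its data, so it has no base array.
explicit = a[2:5].copy()
print(explicit.base is None)         # True
```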

In addition, there are many explanations in the Reference Guide, which
should be moved to this new dedicated Explanations section.

* Tutorials

There’s a lot of scope for writing better tutorials. We have a new “NumPy
for absolute beginners” tutorial [3] (a Google Season of Docs project by
Anne Bonner). In addition, we need a number of tutorials addressing
different levels of experience with Python and NumPy. This could be done
using engaging data sets, ideas or stories. For example, curve fitting with
polynomials and functions in numpy.linalg could be done with the Keeling
curve (decades’ worth of measurements of atmospheric CO2 concentration)
rather than with synthetic random data.
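To illustrate the kind of code such a tutorial might build towards, here is a minimal sketch of a least-squares polynomial trend fit using `np.vander` and `numpy.linalg.lstsq`. The helper names `fit_polynomial_trend` and `evaluate_trend` are hypothetical, and no data set is included here; in the tutorial, `t` would hold the measurement dates and `y` the CO2 concentrations.

```python
import numpy as np


def fit_polynomial_trend(t, y, degree=2):
    """Least-squares polynomial fit of y(t) via numpy.linalg.lstsq.

    Returns the coefficients (highest power first, as np.polyval expects)
    and the offset t0 that was subtracted from t before fitting.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Centre the time axis to keep the Vandermonde matrix well conditioned.
    t0 = t.mean()
    # Columns are (t - t0)**degree, ..., (t - t0)**1, 1.
    A = np.vander(t - t0, degree + 1)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs, t0


def evaluate_trend(coeffs, t0, t):
    """Evaluate the fitted polynomial at the times t."""
    return np.polyval(coeffs, np.asarray(t, dtype=float) - t0)
```

Centering the time axis is a design choice that keeps the least-squares problem better conditioned when `t` holds large values such as calendar years; `numpy.polynomial.Polynomial.fit`, which rescales its input domain internally, would be a reasonable alternative.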

Ideas for tutorials (these capture the types of things that make sense,
they’re not necessarily the exact topics we propose to implement):
- Conway’s game of life with only NumPy (note: already covered in Nicolas Rougier’s work)
- Using masked arrays to deal with missing data in time series measurements
- Using Fourier transforms to analyze the Keeling curve data and
extrapolate it.
- Geospatial data (e.g. lat/lon/time to create maps for every year via a
stacked array, like gridMet data)
- Using text data and dtypes (e.g. use speeches from different people,
shape (n_speech, n_sentences, n_words))
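As a rough indication of the level such a tutorial could target, one generation of Conway’s Game of Life can be computed with array slicing alone. This is a minimal sketch that treats cells beyond the board edge as dead; `life_step` is a hypothetical name and not taken from any existing tutorial.

```python
import numpy as np


def life_step(board):
    """Advance a Game of Life board by one generation.

    Cells outside the board are treated as permanently dead.
    """
    board = np.asarray(board, dtype=bool)
    rows, cols = board.shape
    # Pad with a border of dead cells so every cell has 8 neighbours.
    padded = np.pad(board.astype(np.int8), 1)
    # Count live neighbours by summing the 8 shifted copies of the board.
    neighbours = sum(
        padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 neighbours; survival with 2 or 3.
    return (neighbours == 3) | (board & (neighbours == 2))
```

For example, a horizontal “blinker” of three live cells turns vertical after one call and back again after a second one.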

The Preparing to Teach document [2] from the Software Carpentry Instructor
Training materials is a nice summary of how to write effective lesson plans
(and tutorials would be very similar). In addition to adding new tutorials,
we also propose a “How to write a tutorial” document, which would help users
contribute new high-quality content to the documentation.

Data sets

Using interesting data in the NumPy docs requires giving all users access
to that data, either inside NumPy or in a separate package. The former is
not the best idea, since it’s hard to do without increasing the size of
NumPy significantly. Even for SciPy there has so far been no consensus on
this (see scipy PR 8707 on adding a new scipy.datasets subpackage).

So we’ll aim for a new (pure Python) package, named numpy-datasets or
scipy-datasets or something similar. That package can take some lessons
from how, e.g., scikit-learn ships data sets. Small data sets can be
included in the repo, large data sets can be accessed via a downloader
class or function.
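To make the downloader idea more concrete, here is a minimal sketch of the pattern (similar in spirit to how scikit-learn fetches larger data sets), under loudly stated assumptions: data sets are stored as `.npy` files, and `fetch_dataset`, its parameters and the `~/.numpy_datasets` cache location are all hypothetical, not the API of any existing package.

```python
import urllib.request
from pathlib import Path

import numpy as np


def fetch_dataset(name, url, data_home=None):
    """Return the array stored in <data_home>/<name>.npy, downloading it once.

    Hypothetical helper for illustration only: the function name, its
    parameters and the default cache directory are assumptions.
    """
    if data_home is None:
        # Hypothetical default cache location in the user's home directory.
        data_home = Path.home() / ".numpy_datasets"
    data_home = Path(data_home)
    data_home.mkdir(parents=True, exist_ok=True)

    path = data_home / f"{name}.npy"
    if not path.exists():
        # Download to a temporary name first so an interrupted transfer
        # never leaves a truncated file at the final cache path.
        partial = path.with_name(path.name + ".part")
        urllib.request.urlretrieve(url, str(partial))
        partial.replace(path)
    # Later calls read the cached local copy without touching the network.
    return np.load(path)
```

Small data sets shipped in the repository would simply be loaded from the package directory, so only large ones would go through a function like this.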

Related Work

Some examples of documentation organization in other projects:
- Documentation for Jupyter: https://jupyter.org/documentation
- Documentation for Python: https://docs.python.org/3/
- Documentation for TensorFlow: https://www.tensorflow.org/learn

These projects make the intended audience for each part of the
documentation more explicit, as well as previewing some of the content in
each section.


Besides rewriting the current documentation to some extent, it would be
ideal to have a technical infrastructure that would allow more
contributions from the community. For example, if Jupyter Notebooks could
be submitted as-is as tutorials or How-Tos, this might attract more
contributors and broaden the NumPy community.

Similarly, if people could download some of the documentation in Notebook
format, they would likely rely less on outdated material for learning
NumPy.

It would also be valuable if the new structure for the documentation
made translations easier.

Currently, the documentation for NumPy can be confusing, especially for
beginners. Our proposal is to reorganize the docs in the following structure:

* For users:
- Absolute Beginners Tutorial
- main Tutorials section
- How-Tos for common tasks with NumPy
- Reference Guide
- Explanations
- F2PY Guide
- Glossary

* For developers/contributors:
- Contributor’s Guide
- Building and extending the documentation
- Benchmarking
- NumPy Enhancement Proposals

* Meta information
- Reporting bugs
- Release Notes
- About NumPy
- License

References and Footnotes

[1] “What nobody tells you about documentation.”
[2] “Preparing to Teach,” from the Software Carpentry Instructor Training materials.
[3] “NumPy for absolute beginners” tutorial, by Anne Bonner.


This document has been placed in the public domain.

Melissa Weber Mendonça
NumPy-Discussion mailing list
