[eeps] EEP XXX: Documentation storage and format

José Valim Sat, 06 Jan 2018 00:11:44 -0800

This EEP proposes an official API documentation storage to be used by
by BEAM languages.  By standardizing how API documentation is stored,
it will be possible to write tools that work across languages.


I want to thank Eric and Radek, who co-authored the proposal, as well as
Kenneth, Fred, Tristan and Loïc for their feedback.

See the attached document.

*José Valimwww.plataformatec.com.br
<http://www.plataformatec.com.br/>Founder and Director of R&D*

    Author: Eric Bailey, Radek Szymczyszyn, José Valim
    Status: Draft
    Type: Standards Track
    Created: 04-Jan-2018
    Post-History:
****
EEP XX: Documentation storage and format
----



Abstract
========

This EEP proposes an official API documentation storage to be used by
by BEAM languages.  By standardizing how API documentation is stored,
it will be possible to write tools that work across languages.



Rationale
=========

Currently, different programming languages and libraries running on
BEAM devise their own schemas for storing and accessing documentation.  
For example, Elixir and LFE provide a `h` helper in their shell that
can print the documentation of any module:

    iex> h String
    A String in Elixir is a UTF-8 encoded binary.

However, Elixir is only able to show docs for Elixir modules.  LFE is
only able to show docs for LFE functions and so on.  If documentation
is standardized, such features can be easily added to other languages
in a way that works consistently across all BEAM languages.

Furthermore, each language ends up building their own tools for
generating, processing and converting documentation.  We hope a unified
approach to documentation will improve the compatibility between tools.
For instance, an Erlang IDE will be able to show inline documentation
for any module and function, regardless if the function is part of OTP,
a library or even written in Elixir, LFE or Alpaca.

**Note**: in this document, the word “documentation” refers exclusively
to the API documentation of modules and functions.  Guides, tutorials
and others materials are also essential to projects but not the focus
of this EEP.

**Note**: This EEP is not about documentation format.  It is about a
mechanism for storing documentation to make it easier to produce other
formats.  For example, a tool can read the documentation and produce man
pages from it.



Specification
=============

This EEP is divided in three parts.  The first defines the two
places the documentation can be stored, the second defines the shape of
the documentation and the third discusses integration with OTP.


## Part 1: the "Docs"storage ##

There are two main mechanisms in which BEAM languages store documentation:
in the filesystem (usually in the /doc directory) and inside ".beam"
files. 

This EEP recognizes both options and aim to support both.  To look for
documentation for a module name "example", a tool should:

  1. Look for "example.beam" in the code path, parse the BEAM file and
     retrieve the "Docs" chunk

  2. If the chunk is not available, it should look for "example.beam"
     in the code path and find the "doc/chunks/example.chunk" file in
     the application that defines the "example" module

  3. If a ".chunk" file is not available, then documentation is not
     available

The choice of using a chunk or the filesystem is completely up to the
language or library.  In both cases, the documentation can be added or
removed at any moment by stripping the "Docs" chunk or by removing the
"doc/chunks" directory.

For example, languages like Elixir and LFE attach the "Docs" chunk at
compilation time, which can be controlled via a compiler flag.  On the
other hand, projects like OTP itself will likely generate the "doc/chunks"
entries on a separate command, completely unrelated from code compilation.


## Part 2: the "Docs" format ##


In both storages, the documentation is written in the exactly same
format: an Erlang term serialized to binary via term_to_binary/1.  The
term may be optionally compressed when serialized and must follow the
type specification below:

    {docs_v1,
     Anno :: erl_anno:anno(),
     Language :: atom(),
     Format :: mime_type(),
     ModuleDoc :: binary() | none | hidden,
     Metadata :: map(),
     Docs ::
       [{{Kind, Name, Arity},
         Anno :: erl_anno:anno(),
         Signature :: [binary()],
         Doc :: binary() | none | hidden,
         Metadata :: map()
        }]}

where in the root tuple we have:

  * `Anno` - annotation (line, column, file) of the module documentation
    or of the definition itself (see erl_anno)

  * `Language` - an atom representing the language, for example:
    "erlang", "elixir", "lfe", "alpaca", etc

  * `Format` - the mime type of the documentation, such as "text/markdown"
    (see the FAQ for a discussion on this field)

  * `ModuleDoc` - a binary with the documentation or the atom "none"
    in case there is no documentation or the atom "hidden" if
    documentation has been explicitly disabled for this entry

  * `Metadata` - a map of atom keys with any term as value.  This can be
    used to add annotations like the "authors" of a module, "deprecated",
    or anything else a language or documentation tool may find relevant

  * `Docs` - a list of documentation for other entities (such as
    functions and types) in the module

For each entry in `Docs`, we have:

  * `{Kind, Name, Arity}` - the kind, name and arity identifying the
    function, callback, type, etc.  The official entities are: `function`,
    `type` and `callback`.  Other languages will add their own. For
    instance, Elixir and LFE may add `macro`

  * `Anno` - annotation (line, column, file) of the module documentation
    or of the definition itself (see erl_anno)

  * `Signature` - the signature of the entity.  It is is a list of
    binaries. Each entry represents a binary in the signature that can
    be joined with a whitespace or a newline.  For example,
    `["binary_to_atom(Binary, Encoding)", "when is_binary(Binary)"]`
    may be rendered as as a single line or two lines. It exists
    exclusively for exhibition purposes

  * `Doc` - a binary with the documentation or the atom "none"
    in case there is no documentation or the atom "hidden" if
    documentation has been explicitly disabled for this entry

  * `Metadata` - a map of atom keys with any term as value

This shared format is the heart of the EEP as it is what effectively
allows cross-language collaboration.

The `Metadata` field exists to allow languages, tools and libraries to
add custom information to each entry.  This EEP documents the
following metadata keys:

  * `authors := [binary()]` - a list of authors as binaries

  * `cross_references := [module() | {module(), {Kind, Name, Arity}}]` -
    a list of modules or module entries that can be used as cross
    references when generating documentation

  * `deprecated := binary()` - when present, it means the current entry
    is deprecated with a binary that represents the reason for
    deprecation and a recommendation to replace the deprecated code

  * `since := binary()` - a binary representing the version such entry
    was added, such as "1.3.0" or "20.0"

Any key may be added to Metadata at any time.  Keys that are frequently
used by the community can be standardized in future versions. 


## Part 3: Integration with OTP ##

The last part focuses on integrating the previous parts with OTP docs,
tools and workflows.  The items below are suggestions and are not
necessary for the adoption of this EEP, neither by OTP nor by any other
language or library.

At this point we should consider changes to OTP such as:

  * Distributing the `doc/chunks/*.chunk` files as part of OTP and
    changing the tools that ship with OTP to rely on them. For example,
    "erl -man lists" could be changed to locate the "lists.chunk" file,
    parsing the documentation out and then converting it to a man page
    on the fly.  This task may require multiple changes, as OTP stores
    documentation on XML files as well as directly in the source code.
    `edoc` itself should likely be augmented with functions that spit
    out `.chunk` files from the source code

  * Adding `h(Module)`, `h(Module, Function, Arity)`, and similar to
    Erlang’s shell to print the documentation of a module or of a
    given function and arity. This should be able to print docs any
    other library or language that implements this proposal



FAQ
===

*Q: Why do we have a Format entry in the documentation?*

The main trade-off in the proposal is the documentation format.  We have
two options:

  * Allow each language/library/tool to choose their own documentation
    format
  * Impose a unified documentation format on all languages

A unified format for documentation gives no flexibility to languages and
libraries in choosing how documentation is written.  As the ecosystem
gets more diverse, it will be unlikely to find a format that suits all.
For this reason we introduced a Format field that allows each language
and library to pick their documentation format.  The downside is that,
if the Elixir docs are written in Markdown and a language does not know
how to format Markdown, then the language will have to choose to either
not show the Elixir docs or show them raw (i.e. in Markdown).

Erlang is in a privileged position.  All languages will be able to
support whatever format is chosen for Erlang since all languages run on
Erlang and will have direct access to Erlang's tooling.

*Q: If I have an Erlang/Elixir/LFE/Alpaca library that uses a custom
documentation toolkit, will I also be able to leverage this?*

As long as the documentation ends up up in the "Docs" chunk or inside
the `doc/chunks` directory, we absolutely do not care how the
documentation was originally written.  If you use a custom format,
you may need to teach your language of choice how to render it though.
See the previous question.



Copyright
=========

This document has been placed in the public domain.



[EmacsVar]: <> "Local Variables:"
[EmacsVar]: <> "mode: indented-text"
[EmacsVar]: <> "indent-tabs-mode: nil"
[EmacsVar]: <> "sentence-end-double-space: t"
[EmacsVar]: <> "fill-column: 70"
[EmacsVar]: <> "coding: utf-8"
[EmacsVar]: <> "End:"
[VimVar]: <> " vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4: "

_______________________________________________
eeps mailing list
[email protected]
http://erlang.org/mailman/listinfo/eeps

[eeps] EEP XXX: Documentation storage and format

Reply via email to