Thanks, Jorge, for helping me get across the need for a user guide. The
examples you used are exactly what I had in mind. It would be great if the
project had a user guide similar to tokio's. We could use this guide to
explain how to get started and to show some examples using the available
crates (Arrow, Arrow-Flight and Datafusion).

I think that in order to start with the guide, I could take the approach
you suggested, using either doc_comment or rust-skeptic, and write md
files to sketch the user guide. This guide could be included in each of the
project folders, e.g. arrow, flight and datafusion. However, I'm a bit
inclined to have a single user guide (placed in the rust folder) that
references all the elements included in the rust arrow folder. This way the
guide could be planned so that a new user would start by learning about
arrow arrays and finish by running queries on data loaded from CSV or
parquet files.
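
To make the idea of a tested md guide a bit more concrete, here is a
minimal sketch of the first step such a tool has to do: pull the Rust
snippets out of a markdown page so they can be compiled and run with the
tests. This is only an illustration and not how doc_comment or rust-skeptic
are implemented; the function name `extract_rust_blocks` and the sample
guide text are made up, and it only uses the standard library.

```rust
/// Returns the body of every fenced code block whose info string is `rust`.
/// Hypothetical helper for illustration only.
fn extract_rust_blocks(markdown: &str) -> Vec<String> {
    // Built at runtime so this example does not contain a literal fence.
    let fence = "`".repeat(3);
    let mut blocks = Vec::new();
    // `Some((is_rust, body))` while inside a fenced block, `None` outside.
    let mut open: Option<(bool, String)> = None;

    for line in markdown.lines() {
        let trimmed = line.trim();
        match open.take() {
            // Outside a block: a line starting with a fence opens one.
            None => {
                if let Some(info) = trimmed.strip_prefix(fence.as_str()) {
                    open = Some((info.trim() == "rust", String::new()));
                }
            }
            // Inside a block: a bare fence closes it; keep only Rust blocks.
            Some((is_rust, body)) if trimmed == fence.as_str() => {
                if is_rust {
                    blocks.push(body);
                }
            }
            // Inside a block: accumulate the line and stay in the block.
            Some((is_rust, mut body)) => {
                body.push_str(line);
                body.push('\n');
                open = Some((is_rust, body));
            }
        }
    }
    blocks
}

fn main() {
    let fence = "`".repeat(3);
    // A tiny, made-up guide page with one Rust snippet and one shell snippet.
    let guide = format!(
        "# Arrays\n\n{f}rust\nlet x = 1 + 1;\nassert_eq!(x, 2);\n{f}\n\n{f}shell\ncargo test\n{f}\n",
        f = fence
    );
    for (i, block) in extract_rust_blocks(&guide).iter().enumerate() {
        println!("--- snippet {} ---\n{}", i + 1, block);
    }
}
```

As far as I understand, doc_comment and rust-skeptic would take care of
this step (and of compiling the snippets) for us, so the guide itself would
only need the md files.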

Is this something that could interest you?

Regards,
Fernando

On Fri, Oct 16, 2020 at 10:32 PM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Hi,
>
> I would like to thank Fernando for raising this concern here: I also think
> that we still do not put enough effort in the documentation :) I admit that
> when I started in the project, I also had that need and just had some time
> to go through the code.
>
> First, I find it useful to distinguish types of documentation:
>
> 1. API references
> 2. user guide / tutorial
>
> The distinction between the two being that the former covers detailed usage
> of a given function/struct/trait etc, while the latter covers usage of the
> library as a whole (e.g. how a function is used in combination with
> others). Virtually every
> <https://docs.djangoproject.com/en/3.1/intro/tutorial01/> mature
> <https://pandas.pydata.org/docs/user_guide/index.html> library
> <https://spark.apache.org/docs/latest/quick-start.html> or
> <https://www.tensorflow.org/tutorials/quickstart/beginner> framework
> <https://docs.python.org/3/tutorial/> has both, as they serve different
> but well-defined use-cases.
>
> The canonical way of documenting an API in rust is via `docs.rs`, which
> generates a format common to rust projects and has a *significant* benefit
> for both writers and readers (for Python users, auto-docs on steroids:
> auto-links to class declarations, references, testing the examples is
> part of running the tests, the documentation is written next to the actual
> source code, reference to the source code in a single click). Rust users
> expect the API documentation to be in `docs.rs` and released as part of
> the crate. I agree with @Andy that we should stick to docs.rs for this.
> While there is always room for improvement, we do have the basics in place.
>
> I think that Fernando is alluding to the fact that we do not have a user
> guide / tutorial, and I agree: we are missing one such as tokio
> <https://tokio.rs/tokio/tutorial>'s, SIMD
> <https://rust-lang.github.io/packed_simd/perf-guide/introduction.html>'s,
> Rocket <https://rocket.rs/v0.4/guide/>'s or rust's book
> <https://doc.rust-lang.org/book/>, that covers how to use the
> library/framework.
>
> The main challenge is to ensure that the guide does not become outdated.
> Looking at what other rust libs are doing, Serde, Tokio and Rocket write
> their guides in markdown and test the code in their guides (here: tokio
> <https://github.com/tokio-rs/website/tree/master/doc-test>, Rocket
> <https://github.com/SergioBenitez/Rocket/tree/v0.4/site/tests>). Rocket
> uses its own codegen to test the docs, tokio uses doc_comment
> <https://docs.rs/doc-comment/0.3.3/doc_comment/>
>
> > The point of this (small) crate is to allow you to add doc comments from
> > macros or to test external markdown files' code blocks through rustdoc.
>
> and serde uses rust-skeptic <https://github.com/budziq/rust-skeptic>.
>
> Thus, one idea is to write the guide in markdown on each (arrow and
> datafusion) crate, run the examples there as part of the testing with
> doc_comment <https://docs.rs/doc-comment/0.3.3/doc_comment/> or
> rust-skeptic <https://github.com/budziq/rust-skeptic>, and include these on
> arrow's official documentation on build (we would need to depend on a
> third-party Sphinx extension
> <https://www.sphinx-doc.org/en/master/usage/markdown.html> for this).
>
> This way, we keep the examples up-to-date, and the style and location close
> to other implementations' documentation.
>
> Would this be an option?
>
> Best,
> Jorge
>
>
>
> On Sat, Oct 17, 2020 at 12:48 AM Fernando Herrera <
> fernando.j.herr...@gmail.com> wrote:
>
> > I understand the concern, especially with the project changing that
> > quickly. However, I haven't found good material that I can use to learn
> > how to use the crate. I know that each module has a lot of tests (which
> > I'm thankful for) but going from one test case to the other doesn't work
> > well as learning material. It is a bit hard to find a starting point
> > within the project, especially if it's your first time seeing the code.
> > Should one start with the datatypes.rs or with the builder.rs?
> >
> > Also, I think it would help a lot to have a more relaxed approach (like
> > "learning rust with entirely too many lists") rather than a reference
> > approach (like the RTF). I see the RTF as something you use to find
> > references regarding the code, rather than learning material I would
> > use to grasp what can be done with the crate. That's why I was
> > suggesting a book format, like the one that is used for Ballista. If you
> > want reference material you can always have a look at the documentation
> > created within the crate.
> >
> > What do you think?
> >
> > @Andy Grove... is it possible to take part in your upcoming presentation?
> >
> >
> > On Fri, Oct 16, 2020 at 5:23 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > >
> > > > > We should be careful with the balance of content between the
> > > > > Restructured Text Format documentation and the documentation in the
> > > > > crate that gets published to docs.rs though. The rustdoc documentation
> > > > > is unit-tested to ensure that it is always up to date and we will have
> > > > > to manually update the RTF documentation for each release, and the
> > > > > project is still evolving rather quickly.
> > >
> > >
> > > If rust offers this out of the box then that definitely seems
> > > preferable. At some point it would be nice to enable doctest [1] for
> > > all of our snippets in the main repo.
> > >
> > > [1] https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html
> > >
> > > On Fri, Oct 16, 2020 at 3:17 PM Andy Grove <andygrov...@gmail.com>
> > > wrote:
> > >
> > > > I think that it would be great to produce this kind of content. I'm
> > > > giving a presentation on Arrow to my local Rust meetup (virtually) next
> > > > week and these are similar to the topics I will be covering there.
> > > >
> > > > We should be careful with the balance of content between the
> > > > Restructured Text Format documentation and the documentation in the
> > > > crate that gets published to docs.rs though. The rustdoc documentation
> > > > is unit-tested to ensure that it is always up to date and we will have
> > > > to manually update the RTF documentation for each release, and the
> > > > project is still evolving rather quickly.
> > > >
> > > > If the sample code included in RTF also exists as examples in the
> > > > repo that get tested then we can just copy and paste the contents over
> > > > each time we release perhaps.
> > > >
> > > > Andy.
> > > >
> > > >
> > > >
> > > > On Fri, Oct 16, 2020 at 3:59 PM Micah Kornfield <emkornfi...@gmail.com>
> > > > wrote:
> > > >
> > > > > Java and C++ have tutorials in Restructured Text Format in the docs
> > > > > folder [1].  I think creating something similar for Rust might be the
> > > > > best place to start.  These are rendered on the website.  For example
> > > > > Java is located at [2].
> > > > >
> > > > >
> > > > > [1] https://github.com/apache/arrow/tree/master/docs/source
> > > > > [2] https://arrow.apache.org/docs/java/index.html
> > > > >
> > > > > On Fri, Oct 16, 2020 at 2:48 PM Fernando Herrera <
> > > > > fernando.j.herr...@gmail.com> wrote:
> > > > >
> > > > > > > I was working on the blog post I mentioned before regarding Arrow
> > > > > > > usage (rust) and how to use the different elements available in the
> > > > > > > crate. After some thought, these were the topics I want to include:
> > > > > >
> > > > > > >    1. Arrays examples and what they look like
> > > > > >    Basic arrays and nested arrays
> > > > > >    The buffer structure and how data is stored
> > > > > >    Builders usage
> > > > > >    Examples of complex arrays and how to construct them (using
> > > builders
> > > > > and
> > > > > >    from)
> > > > > >    2. What is a record batch?
> > > > > >    How to construct a record batch
> > > > > >    How a RecordBatch is used with IPC
> > > > > >    3. How to read files?
> > > > > >    CSV files and Parquet files
> > > > > >    4. How to share information
> > > > > >    What is Arrow flight?
> > > > > >    How to set up a server with Rust
> > > > > >    Examples
> > > > > >    5. How to query information from arrays?
> > > > > >    Datafusion examples
> > > > > >
> > > > > > > However, as I was working on the examples
> > > > > > > <https://github.com/elferherrera/test_example/blob/master/src/main.rs>
> > > > > > > that I was planning to use (most of them came from the Arrow
> > > > > > > repository) I thought that the best format would be a book, something
> > > > > > > similar to the Rust book. I think this format will help us to fully
> > > > > > > explain how each constructor can be used in detail and how each of the
> > > > > > > data arrays can be used and manipulated.
> > > > > >
> > > > > > What do you think about it?
> > > > > >
> > > > > > > I could start the book using the examples in the repository and the
> > > > > > > tests done as a base. However, I cannot find a quick tutorial on
> > > > > > > setting up a book like that, let alone how to host it. I know it has
> > > > > > > to be made using .md files, but that's as far as I have got. Can
> > > > > > > somebody give me a pointer on setting up something like that?
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > > On Thu, Oct 15, 2020 at 3:18 PM Mark Farnan <m...@markfarnan.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > I would agree with this.
> > > > > > >
> > > > > > > > I’ve been working with the Go Arrow library for the last few weeks,
> > > > > > > > and it took a while to get my head around it all / how to use it etc.
> > > > > > > > Even then I'm not sure I’ve got it right.
> > > > > > >
> > > > > > > Usage examples would be great.
> > > > > > >
> > > > > > > Regards
> > > > > > >
> > > > > > > Mark
> > > > > > >
> > > > > > > > > On Oct 14, 2020, at 4:08 PM, Fernando Herrera <
> > > > > > > > > fernando.j.herr...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > I was wondering if besides this blog post there should be another
> > > > > > > > > one with an example of usage. I think that is one of the key things
> > > > > > > > > missing for Arrow in general. This example should show the problems
> > > > > > > > > that Arrow is solving and how to implement the solution in real life.
> > > > > > > >
> > > > > > > > > On Tue, Oct 13, 2020 at 10:12 AM Andy Grove <andygrov...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > >> There has been a huge amount of activity in the Rust subproject
> > > > > > > > >> for the 2.0.0 release and I think that we should write a
> > > > > > > > >> Rust-specific blog post to go on the Arrow blog.
> > > > > > > >>
> > > > > > > > >> I made a brief start at a Google doc, which is mostly just bullet
> > > > > > > > >> points listing some things we could talk about. I'm sure I've missed
> > > > > > > > >> some things, and maybe we have too many things to talk about so we
> > > > > > > > >> might want to try and summarize some of this.
> > > > > > > >>
> > > > > > > > >> Here is the doc ... I would appreciate any help anyone can provide
> > > > > > > > >> with this. Perhaps if each contributor could flesh out the content
> > > > > > > > >> around things they directly worked on or are knowledgeable about,
> > > > > > > > >> that would be great.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1RY7oa7ldi4RnyFzk3_5NHiiQl7IcvZgXFq3FYr5iwFc/edit?usp=sharing
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Andy.
> > > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
