Re: [DISCUSS] Structure of the Flink Documentation (Languages & APIs)

Timo Walther Wed, 23 Mar 2022 03:10:24 -0700

Hi Konstantin,

Thank you for starting this discussion again. It is a pity that we have never implemented FLIP-60, because it was the result of long offline discussions with a lot of people working closely on the documentation topic.

I think splitting the documentation into *Option 1: "Language Tabs"* vs. *Option 2: "Language First"* might oversimplify the topic a bit. We should structure the documentation in a more use-case-driven way and in any case try to avoid text duplication, because the past has shown that duplication makes writing and updating documentation very painful. Every documentation paragraph written should have a clear location. I'm happy to help in coming up with a new documentation structure.

Let me propose: *Option 3: Concepts with tabs, API+Language separate, Operators with tabs within ecosystem*


This basically also addresses the comments that both Jark and Dian mentioned:

> there are a lot of duplications in the "Window" pages

> users can't have a complete overview of Flink's window mechanism from the Python API part

> Scala-free direction means users can pick arbitrary Scala versions, not drop the Scala API


Let me try to explain all three components of Option 3 briefly:

Concepts with tabs:

Not every API needs to explain watermarks, event-time, or checkpointing. The concept page can explain those concepts either with pictures/diagrams only or, slightly better, show small basic examples of how to define a watermark in every API+language (e.g. Python Table API) using tabs, just to convey the concept. The full explanation of how to declare a watermark assigner, and of all available assigners, is then API+language specific and not part of this section.


API+Language separate:

We offer sections such as "Python Table API", "Scala Table API", "Java Table API", "SQL". Those sections give an overview of the API and explain how to package and submit jobs. E.g. for the Java Table API they explain how to use TableEnvironment or implement Java UDFs. They don't go into operator details.


Operators with tabs within ecosystem:

For Table API & SQL in any language flavor we explain one operator per page. E.g. we offer explanations of regular, temporal, or lookup joins. This is almost like a concept section and is shared by all API+Languages. While explaining the concepts we offer tabs or the top-level setting as Jark suggested.


Regards,
Timo


On 23.03.22 at 10:17, Dian Fu wrote:

To summarize, I lean towards Option 2 "Language First", provided we can find a
way to eliminate documentation duplication.

On Wed, Mar 23, 2022 at 5:02 PM Dian Fu <dian0511...@gmail.com> wrote:

Hi Konstantin,

Thanks a lot for bringing up this discussion.

Currently, the Python documentation is more like a mixture of Option 1 and
Option 2. It contains two parts:
1) The first part is the independent page [1] which could be seen as the
main entrypoint for Python users.
2) The second part is the Python tabs which are among the DataStream API /
Table API pages.

The motivation to provide an independent page for Python documentation is
as follows:
1) We are trying to create a Pythonic documentation for Python users (we
are still far away from that and I have received much feedback saying that
the Python documentation and API are too Java-like). However, to avoid
duplication, it links to the DataStream API / Table API pages when
necessary instead of copying content. There are indeed exceptions, e.g. the
window example given by Jark. That is because the Python DataStream API
currently provides only very limited window support, and to give
Python users a complete picture of what they can do in the Python DataStream
API, we have added a dedicated page. We are trying to finalize the window
support in 1.16 [2] and remove the duplicate documentation.
2) Some kinds of documentation are only applicable to the
Python language, e.g. dependency management [3], conversion between Table
and Pandas DataFrame [4], etc. An independent page
provides a place to hold all these kinds of documentation together.

Regarding Option 1: "Language Tabs", this makes it hard to create Pythonic
documentation for Python users.
Regarding Option 2: "Language First", it may mean a lot of duplication.
Currently, there are a lot of descriptions in the DataStream API / Table
API pages which are shared between Java/Scala/Python.

> In the rest of the documentation, Python is sometimes
> included like in this Table API page [2] and sometimes ignored like on
> the project setup pages [3].

I agree that this is something that we need to improve.

Regards,
Dian

[1] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/overview/
[2] https://issues.apache.org/jira/browse/FLINK-26477
[3] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/dependency_management/
[4] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/table/conversion_of_pandas/

On Wed, Mar 23, 2022 at 4:17 PM Jark Wu <imj...@gmail.com> wrote:

Hi Konstantin,

Thanks for starting this discussion.

From my perspective, I prefer the "Language Tabs" approach.
But maybe we can improve the tabs by moving them to the sidebar or top menu,
which allows users to first decide on their language and then the API.
IMO, programming languages are just like spoken languages, which can be
picked in the sidebar.
What I want to avoid is duplicate docs and incomplete features in a
specific language.
"Language First" may confuse users about what the complete feature set
provided by Flink is and where to find it.

For example, there are a lot of duplications between the "Python Window" pages [1] and
the "Window" pages [2].
And users can't get a complete overview of Flink's window mechanism from
the Python API part.
Users have to go through the Java/Scala DataStream API first to build the
overall knowledge,
and then read the Python API part.

> * Second, most of the Flink Documentation currently is using a "Language
> Tabs" approach, but this might become obsolete in the long-term anyway as
> we move more and more in a Scala-free direction.

The Scala-free direction means that users can pick arbitrary Scala versions,
not that the Scala API is dropped.
So "Language Tabs" are still necessary and helpful for switching
languages.

Best,
Jark

[1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/datastream/operators/windows/
[2]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/

On Tue, 22 Mar 2022 at 21:40, Konstantin Knauf <kna...@apache.org> wrote:

Hi everyone,

I would like to discuss a particular aspect of our documentation: the
top-level structure with respect to languages and APIs. The current
structure is inconsistent and the direction is unclear to me, which makes
it hard for me to contribute gradual improvements.

Currently, the Python documentation has its own independent branch in the
documentation [1]. In the rest of the documentation, Python is sometimes
included like in this Table API page [2] and sometimes ignored like on the
project setup pages [3]. Scala and Java on the other hand are always
documented in parallel next to each other in tabs.

The way I see it, most parts (application development, connectors, getting
started, project setup) of our documentation have two primary dimensions:
API (DataStream, Table API), Language (Python, Java, Scala)

In addition, there is SQL, for which the language is only a minor factor
(UDFs), but which generally requires a different structure (different
audience, different tools). On the other hand, SQL and Table API have some
conceptual overlap, whereas I doubt these concepts are of big interest
to SQL users. So, to me SQL should be treated separately in any case with
links to the Table API documentation for some concepts.

I think, in general, both approaches can work:


*Option 1: "Language Tabs"*
Application Development
-> DataStream API (Java, Scala, Python)
-> Table API (Java, Scala, Python)
-> SQL

*Option 2: "Language First"*
Java Development Guide
-> Getting Started
-> DataStream API
-> Table API
Python Development Guide
-> Getting Started
-> DataStream API
-> Table API
SQL Development Guide

I don't have a strong opinion on this, but tend towards "Language First".

* First, I assume, users actually first decide on their language/tools of
choice and then move on to the API.

* Second, most of the Flink Documentation currently is using a "Language
Tabs" approach, but this might become obsolete in the long-term anyway as
we move more and more in a Scala-free direction.

For the connectors, I think, there is a good argument for "Language & API
Embedded", because documenting every connector for each API and language
separately would result in a lot of duplication. Here, I would go one step
further than what we have right now and target

Connectors
-> Kafka (All APIs incl. SQL, All Languages)
-> Kinesis (same)
-> ...

This also results in a quick overview for users about which connectors
exist and plays well with our plan of externalizing connectors.

For completeness & scope of the discussion: there are two outdated FLIPs on
documentation (42, 60), which both have not been implemented, are partially
contradicting each other and are generally out-of-date. I specifically
don't intend to add another FLIP to this graveyard, but would still like to
reach a consensus on the high-level direction.

What do you think?

Cheers,

Konstantin

--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk
