Hi everyone,

Thanks for bringing this up for discussion, Konstantin. I do think we are
currently not serving Flink users with proper documentation, which hurts
Flink's adoption and results in unnecessarily opened Jira tickets, Stack
Overflow posts and email threads, which in the end also put a strain on
the community.

I've looked at how others are solving this problem. For example, Beam
supports Java, Python and Go. It appears that they've gone for your
"Language First" approach [1]. This also shows in their Quickstarts [2].
However, they have abstracted a lot of information into general pages [3],
and therefore their language-first pages are quite small.

I've also looked into Matomo to see where users go next from the
documentation page for 1.14. Excluding transitions to the Chinese version
and to other Flink versions, most visitors then go to the DataStream
overview page. Of course we don't have insight into how many people click
on a different language tab when presented with such an option, but to me
it does imply that users select the first option that we present to them
under Application Development.

I think in the end, users are searching for a solution (Application
Development) for their use case. It's up to us to help them choose the
best solution for that.

I think the golden rule should be that duplicated content is avoided.
Content should either be generalized for Flink as a whole (because it
doesn't matter whether you use it from DataStream, Table, SQL or Python)
or, if it is language-agnostic (the same for Java/Scala/SQL/Python), be
written once and referred to from the detail pages.

In the end, I think language tabs are the best way to avoid duplicate
content, but only because we have multiple ways to develop applications.
That makes generalization more complex and less feasible.

It would be nice (as Jark suggested) to have the option to select a
language and then have all the language tabs switch to that option by
default. That would reduce friction for users (and could also give us
insight into which language Flink users use most, if we track it in
Matomo).
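
To make the idea a bit more concrete, here is a minimal sketch of the
selection logic in Python. It is only an illustration and not how the
Flink docs are built: the function name, the tab group names and the
fallback to the first tab when a group lacks the preferred language are
all assumptions on my side.

```python
from typing import Dict, List


def select_tabs(preferred: str, tab_groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Pick the tab to show for every tab group on a page.

    Uses the reader's preferred language where a group offers it and
    falls back to the first tab of the group otherwise.
    """
    selection = {}
    for group, languages in tab_groups.items():
        if not languages:
            continue  # nothing to select in an empty group
        selection[group] = preferred if preferred in languages else languages[0]
    return selection


# Hypothetical page with two tab groups; the second has no Python tab.
page = {
    "datastream-example": ["Java", "Scala", "Python"],
    "project-setup": ["Java", "Scala"],
}
print(select_tabs("Python", page))
```

The fallback matters for pages where Python is currently missing, such as
the project setup pages Konstantin mentioned.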

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82

[1] https://beam.apache.org/documentation/sdks/java/
[2] https://beam.apache.org/get-started/quickstart-java/
[3] https://beam.apache.org/documentation/


On Wed, 23 Mar 2022 at 10:18, Dian Fu <dian0511...@gmail.com> wrote:

> To summarize, I lean towards Option 2 "Language First", provided we can
> find a way to eliminate documentation duplication.
>
> On Wed, Mar 23, 2022 at 5:02 PM Dian Fu <dian0511...@gmail.com> wrote:
>
> > Hi Konstantin,
> >
> > Thanks a lot for bringing up this discussion.
> >
> > Currently, the Python documentation is more like a mixture of Option 1
> > and Option 2. It contains two parts:
> > 1) The first part is the independent page [1] which could be seen as the
> > main entrypoint for Python users.
> > 2) The second part is the Python tabs which are among the DataStream API
> > / Table API pages.
> >
> > The motivation to provide an independent page for Python documentation is
> > as follows:
> > 1) We are trying to create Pythonic documentation for Python users (we
> > are still far away from that and I have received much feedback saying
> > that the Python documentation and API are too Java-like). However, to
> > avoid duplication, it links to the DataStream API / Table API pages when
> > necessary instead of copying content. There are indeed exceptions, e.g.
> > the window example given by Jark. That's because the Python DataStream
> > API only provides very limited window support at present, and to give
> > Python users a complete picture of what they can do in the Python
> > DataStream API, we have added a dedicated page. We are trying to finalize
> > the window support in 1.16 [2] and remove the duplicate documentation.
> > 2) Some documentation is only applicable to the Python language, e.g.
> > dependency management [3], conversion between Table and Pandas DataFrame
> > [4], etc. Providing an independent page gives us a place to hold all this
> > documentation together.
> >
> > Regarding Option 1: "Language Tabs", this makes it hard to create
> > Pythonic documentation for Python users.
> > Regarding Option 2: "Language First", it may mean a lot of duplication.
> > Currently, there are a lot of descriptions in the DataStream API / Table
> > API pages which are shared between Java/Scala/Python.
> >
> > > In the rest of the documentation, Python is sometimes
> > > included like in this Table API page [2] and sometimes ignored like on
> > > the project setup pages [3].
> > I agree that this is something that we need to improve.
> >
> > Regards,
> > Dian
> >
> > [1] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/overview/
> > [2] https://issues.apache.org/jira/browse/FLINK-26477
> > [3] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/dependency_management/
> > [4] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/table/conversion_of_pandas/
> >
> > On Wed, Mar 23, 2022 at 4:17 PM Jark Wu <imj...@gmail.com> wrote:
> >
> >> Hi Konstantin,
> >>
> >> Thanks for starting this discussion.
> >>
> >> From my perspective, I prefer the "Language Tabs" approach.
> >> But maybe we can improve the tabs by moving them to the sidebar or top
> >> menu, which allows users to first decide on their language and then the
> >> API. IMO, programming languages are just like spoken languages, which
> >> can be picked in the sidebar.
> >> What I want to avoid is duplicate docs and incomplete features in a
> >> specific language.
> >> "Language First" may confuse users about what the complete set of
> >> features provided by Flink is and where to find it.
> >>
> >> For example, there is a lot of duplication between the "Python Window"
> >> pages [1] and the "Window" pages [2].
> >> And users can't get a complete overview of Flink's window mechanism from
> >> the Python API part.
> >> Users have to go through the Java/Scala DataStream API first to build
> >> the overall knowledge, and then read the Python API part.
> >>
> >> > * Second, most of the Flink Documentation currently is using a
> >> > "Language Tabs" approach, but this might become obsolete in the
> >> > long-term anyway as we move more and more in a Scala-free direction.
> >>
> >> The Scala-free direction means that users can pick arbitrary Scala
> >> versions, not that the Scala API is dropped.
> >> So "Language Tabs" are still necessary and helpful for switching
> >> languages.
> >>
> >> Best,
> >> Jark
> >>
> >> [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/datastream/operators/windows/
> >> [2]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/
> >>
> >> On Tue, 22 Mar 2022 at 21:40, Konstantin Knauf <kna...@apache.org> wrote:
> >>
> >> > Hi everyone,
> >> >
> >> > I would like to discuss a particular aspect of our documentation: the
> >> > top-level structure with respect to languages and APIs. The current
> >> > structure is inconsistent and the direction is unclear to me, which
> >> > makes it hard for me to contribute gradual improvements.
> >> >
> >> > Currently, the Python documentation has its own independent branch in
> >> > the documentation [1]. In the rest of the documentation, Python is
> >> > sometimes included like in this Table API page [2] and sometimes
> >> > ignored like on the project setup pages [3]. Scala and Java on the
> >> > other hand are always documented in parallel next to each other in
> >> > tabs.
> >> >
> >> > The way I see it, most parts (application development, connectors,
> >> > getting started, project setup) of our documentation have two primary
> >> > dimensions: API (DataStream, Table API), Language (Python, Java, Scala)
> >> >
> >> > In addition, there is SQL, for which the language is only a minor
> >> > factor (UDFs), but which generally requires a different structure
> >> > (different audience, different tools). On the other hand, SQL and
> >> > Table API have some conceptual overlap, although I doubt these
> >> > concepts are of big interest to SQL users. So, to me SQL should be
> >> > treated separately in any case, with links to the Table API
> >> > documentation for some concepts.
> >> >
> >> > I think, in general, both approaches can work:
> >> >
> >> >
> >> > *Option 1: "Language Tabs"*
> >> > Application Development
> >> > > DataStream API  (Java, Scala, Python)
> >> > > Table API (Java, Scala, Python)
> >> > > SQL
> >> >
> >> >
> >> > *Option 2: "Language First" *
> >> > Java Development Guide
> >> > > Getting Started
> >> > > DataStream API
> >> > > Table API
> >> > Python Development Guide
> >> > > Getting Started
> >> > > DataStream API
> >> > > Table API
> >> > SQL Development Guide
> >> >
> >> > I don't have a strong opinion on this, but tend towards "Language
> >> > First".
> >> >
> >> > * First, I assume, users actually first decide on their language/tools
> >> > of choice and then move on to the API.
> >> >
> >> > * Second, most of the Flink Documentation currently is using a
> >> > "Language Tabs" approach, but this might become obsolete in the
> >> > long-term anyway as we move more and more in a Scala-free direction.
> >> >
> >> > For the connectors, I think, there is a good argument for "Language &
> >> > API Embedded", because documenting every connector for each API and
> >> > language separately would result in a lot of duplication. Here, I
> >> > would go one step further than what we have right now and target
> >> >
> >> > Connectors
> >> > -> Kafka (All APIs incl. SQL, All Languages)
> >> > -> Kinesis (same)
> >> > -> ...
> >> >
> >> > This also results in a quick overview for users about which connectors
> >> > exist and plays well with our plan of externalizing connectors.
> >> >
> >> > For completeness & scope of the discussion: there are two outdated
> >> > FLIPs on documentation (42, 60), which both have not been implemented,
> >> > partially contradict each other and are generally out-of-date. I
> >> > specifically don't intend to add another FLIP to this graveyard, but
> >> > still want to reach a consensus on the high-level direction.
> >> >
> >> > What do you think?
> >> >
> >> > Cheers,
> >> >
> >> > Konstantin
> >> >
> >> > --
> >> >
> >> > Konstantin Knauf
> >> >
> >> > https://twitter.com/snntrable
> >> >
> >> > https://github.com/knaufk
> >> >
> >>
> >
>
