Hi Dian,

Thank you for sharing your thoughts. What do you propose going forward? I am not sure I got this from your email.

Best,
Konstantin

On Wed, Mar 23, 2022 at 10:03 AM Dian Fu <dian0511...@gmail.com> wrote:

> Hi Konstantin,
>
> Thanks a lot for bringing up this discussion.
>
> Currently, the Python documentation is more like a mixture of Option 1
> and Option 2. It contains two parts:
> 1) The first part is the independent page [1], which can be seen as the
> main entry point for Python users.
> 2) The second part is the Python tabs, which are among the DataStream
> API / Table API pages.
>
> The motivation to provide an independent page for Python documentation
> is as follows:
> 1) We are trying to create Pythonic documentation for Python users (we
> are still far away from that, and I have received much feedback saying
> that the Python documentation and API are too Java-like). However, to
> avoid duplication, it links to the DataStream API / Table API pages
> when necessary instead of copying content. There are indeed exceptions,
> e.g. the window example given by Jark. That is because the Python
> DataStream API only provides very limited window support at present,
> and to give Python users a complete picture of what they can do in the
> Python DataStream API, we have added a dedicated page. We are trying to
> finalize the window support in 1.16 [2] and remove the duplicate
> documentation.
> 2) Some kinds of documentation are only applicable to the Python
> language, e.g. dependency management [3], conversion between Table and
> Pandas DataFrame [4], etc. Providing an independent page gives us a
> place to hold all these kinds of documentation together.
>
> Regarding Option 1: "Language Tabs", this makes it hard to create
> Pythonic documentation for Python users.
> Regarding Option 2: "Language First", it may mean a lot of duplication.
> Currently, there are a lot of descriptions in the DataStream API /
> Table API pages which are shared between Java/Scala/Python.
>
> > In the rest of the documentation, Python is sometimes
> > included like in this Table API page [2] and sometimes ignored like
> > on the project setup pages [3].
> I agree that this is something that we need to improve.
>
> Regards,
> Dian
>
> [1] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/overview/
> [2] https://issues.apache.org/jira/browse/FLINK-26477
> [3] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/dependency_management/
> [4] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/table/conversion_of_pandas/
>
> On Wed, Mar 23, 2022 at 4:17 PM Jark Wu <imj...@gmail.com> wrote:
>
> > Hi Konstantin,
> >
> > Thanks for starting this discussion.
> >
> > From my perspective, I prefer the "Language Tabs" approach.
> > But maybe we can improve the tabs by moving them to the sidebar or top
> > menu, which allows users to first decide on their language and then
> > the API.
> > IMO, programming languages are just like spoken languages, which can
> > be picked in the sidebar.
> > What I want to avoid is duplicate docs and incomplete features in a
> > specific language.
> > "Language First" may confuse users about what the complete set of
> > features provided by Flink is and where to find it.
> >
> > For example, there are a lot of duplications in the "Window" pages [2]
> > and "Python Window" pages [1].
> > And users can't get a complete overview of Flink's window mechanism
> > from the Python API part.
> > Users have to go through the Java/Scala DataStream API first to build
> > the overall knowledge, and then read the Python API part.
> >
> > > * Second, most of the Flink Documentation currently is using a
> > > "Language Tabs" approach, but this might become obsolete in the
> > > long-term anyway as we move more and more in a Scala-free direction.
> >
> > The Scala-free direction means users can pick arbitrary Scala versions,
> > not drop the Scala API.
> > So the "Language Tabs" are still necessary and helpful for switching
> > languages.
> >
> > Best,
> > Jark
> >
> > [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/datastream/operators/windows/
> > [2]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/
> >
> > On Tue, 22 Mar 2022 at 21:40, Konstantin Knauf <kna...@apache.org> wrote:
> >
> > > Hi everyone,
> > >
> > > I would like to discuss a particular aspect of our documentation: the
> > > top-level structure with respect to languages and APIs. The current
> > > structure is inconsistent and the direction is unclear to me, which
> > > makes it hard for me to contribute gradual improvements.
> > >
> > > Currently, the Python documentation has its own independent branch in
> > > the documentation [1]. In the rest of the documentation, Python is
> > > sometimes included like in this Table API page [2] and sometimes
> > > ignored like on the project setup pages [3]. Scala and Java, on the
> > > other hand, are always documented in parallel next to each other in
> > > tabs.
> > >
> > > The way I see it, most parts (application development, connectors,
> > > getting started, project setup) of our documentation have two primary
> > > dimensions: API (DataStream, Table API) and Language (Python, Java,
> > > Scala).
> > >
> > > In addition, there is SQL, for which the language is only a minor
> > > factor (UDFs), but which generally requires a different structure
> > > (different audience, different tools). On the other hand, SQL and
> > > Table API have some conceptual overlap, although I doubt these
> > > concepts are of much interest to SQL users. So, to me, SQL should be
> > > treated separately in any case, with links to the Table API
> > > documentation for some concepts.
> > >
> > > I think, in general, both approaches can work:
> > >
> > >
> > > *Option 1: "Language Tabs"*
> > > Application Development
> > > > DataStream API (Java, Scala, Python)
> > > > Table API (Java, Scala, Python)
> > > > SQL
> > >
> > >
> > > *Option 2: "Language First"*
> > > Java Development Guide
> > > > Getting Started
> > > > DataStream API
> > > > Table API
> > > Python Development Guide
> > > > Getting Started
> > > > DataStream API
> > > > Table API
> > > SQL Development Guide
> > >
> > > I don't have a strong opinion on this, but tend towards "Language
> > > First".
> > >
> > > * First, I assume users actually first decide on their language/tools
> > > of choice and then move on to the API.
> > >
> > > * Second, most of the Flink Documentation currently is using a
> > > "Language Tabs" approach, but this might become obsolete in the
> > > long-term anyway as we move more and more in a Scala-free direction.
> > >
> > > For the connectors, I think there is a good argument for "Language &
> > > API Embedded", because documenting every connector for each API and
> > > language separately would result in a lot of duplication. Here, I
> > > would go one step further than what we have right now and target
> > >
> > > Connectors
> > > -> Kafka (All APIs incl. SQL, All Languages)
> > > -> Kinesis (same)
> > > -> ...
> > >
> > > This also results in a quick overview for users about which
> > > connectors exist and plays well with our plan of externalizing
> > > connectors.
> > >
> > > For completeness & scope of the discussion: there are two outdated
> > > FLIPs on documentation (42, 60), both of which have not been
> > > implemented, partially contradict each other and are generally
> > > out-of-date. I specifically don't intend to add another FLIP to this
> > > graveyard, but would still like to reach a consensus on the
> > > high-level direction.
> > >
> > > What do you think?
> > >
> > > Cheers,
> > >
> > > Konstantin
> > >
> > > --
> > >
> > > Konstantin Knauf
> > >
> > > https://twitter.com/snntrable
> > >
> > > https://github.com/knaufk
> > >
> >
>

--
Konstantin Knauf
https://twitter.com/snntrable
https://github.com/knaufk