I think this sounds good. +1

On Wed, Aug 5, 2020 at 8:37 PM jincheng sun <sunjincheng...@gmail.com> wrote:
> Hi David,
>
> Thank you for sharing the problems with the current documentation. I agree
> with you, as I have received the same feedback from Chinese users. Users
> often contact me with questions such as whether PyFlink supports "Java
> UDFs" or whether PyFlink supports "xxxConnector". The root cause of these
> questions is that our existing documentation is written for Java users
> (the text and the API parts are mixed). Since Python support was only
> added in 1.9, much of the documentation is not friendly to Python users:
> they don't want to hunt for Python content inside unfamiliar Java
> documents. Just yesterday, Chinese users complained about where to find
> the documentation entries for the Python API. A centralized entry point
> and a clear document structure are therefore an urgent demand of Python
> users, and the original intention of this FLIP is to do our best to solve
> these pain points.
>
> Hi Xingbo and Wei,
>
> Thank you for sharing PySpark's status on documentation optimization.
> You're right: PySpark already has a large Python user base, and the Spark
> community has also found that the Python community is an important
> audience for multi-language support. Centralizing and unifying the Python
> documentation content will reduce the learning cost for Python users, and
> a good document structure and content will also reduce the Q&A burden on
> the community. It's a once-and-for-all job.
>
> Hi Seth,
>
> I wonder if your concerns have been resolved by the previous discussion?
>
> In any case, the principle of this FLIP is that the Python documentation
> should only include Python-specific content, instead of making a copy of
> the Java content. It would be great to have you join in the improvement
> of PyFlink (both opening PRs and reviewing PRs).
>
> Best,
> Jincheng
>
>
> On Wed, Aug 5, 2020 at 5:46 PM Wei Zhong <weizhong0...@gmail.com> wrote:
>
>> Hi Xingbo,
>>
>> Thanks for your information.
>>
>> I think PySpark's documentation redesign deserves our attention.
>> It seems that the Spark community has also begun to take the user
>> experience of its Python documentation more seriously. We can keep
>> following the discussion and progress of the redesign in the Spark
>> community. It is so similar to our work that there should be some ideas
>> worth borrowing.
>>
>> Best,
>> Wei
>>
>>
>> On Aug 5, 2020, at 15:02, Xingbo Huang <hxbks...@gmail.com> wrote:
>>
>> Hi,
>>
>> I found that the Spark community is also working on redesigning the
>> PySpark documentation[1] recently. Maybe we can compare our document
>> structure against theirs.
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-31851
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Need-some-help-and-contributions-in-PySpark-API-documentation-td29972.html
>>
>> Best,
>> Xingbo
>>
>> On Wed, Aug 5, 2020 at 3:17 AM David Anderson <da...@alpinegizmo.com> wrote:
>>
>>> I'm delighted to see energy going into improving the documentation.
>>>
>>> With the current documentation, I get a lot of questions that I believe
>>> reflect two fundamental problems with what we currently provide:
>>>
>>> (1) We have a lot of contextual information in our heads about how
>>> Flink works, and we are able to use that knowledge to make reasonable
>>> inferences about how things (probably) work in cases we aren't so
>>> familiar with. For example, I get a lot of questions of the form "If I
>>> use <this feature>, will I still have exactly-once guarantees?" The
>>> answer is always yes, but users continue to have doubts because we have
>>> failed to clearly communicate this fundamental, underlying principle.
>>>
>>> This specific example about fault tolerance applies across all of the
>>> Flink docs, but the general idea also applies to the Table/SQL and
>>> PyFlink docs. The guiding principles underlying these APIs should be
>>> written down in one easy-to-find place.
>>>
>>> (2) The other kind of question I get a lot is "Can I do <X> with <Y>?"
>>> E.g., "Can I use the JDBC table sink from PyFlink?" These questions can
>>> be very difficult to answer, because one frequently has to reason about
>>> why a given feature doesn't seem to appear in the documentation. It
>>> could be that I'm looking in the wrong place, or that someone forgot to
>>> document something, or that it can in fact be done by applying a
>>> general mechanism in a specific way that I haven't thought of -- as in
>>> this case, where one can use a JDBC sink from Python if one thinks to
>>> use DDL.
>>>
>>> So I think it would be helpful to be explicit about both what is, and
>>> what is not, supported in PyFlink, and to have some very clear
>>> organizing principles in the documentation so that users can quickly
>>> learn where to look for specific facts.
>>>
>>> Regards,
>>> David
>>>
>>>
>>> On Tue, Aug 4, 2020 at 1:01 PM jincheng sun <sunjincheng...@gmail.com>
>>> wrote:
>>>
>>>> Hi Seth and David,
>>>>
>>>> I'm very happy to have your replies and suggestions. I would like to
>>>> share my thoughts here:
>>>>
>>>> The main motivation for refactoring the PyFlink docs is to make sure
>>>> that Python users can find everything they need starting from the
>>>> PyFlink documentation main page. That is, the PyFlink documentation
>>>> should have a catalogue covering all the functionality available in
>>>> PyFlink. However, this doesn't mean that we will copy the content of
>>>> the documentation from other places; it may be just a reference/link
>>>> to other documentation where appropriate. For the documentation added
>>>> under the PyFlink main page, the principle is that it should only
>>>> include Python-specific content, instead of making a copy of the Java
>>>> content.
>>>>
>>>> >> I'm concerned that this proposal duplicates a lot of content that
>>>> >> will quickly get out of sync. It feels like it is documenting
>>>> >> PyFlink separately from the rest of the project.
>>>> Regarding the concerns about maintainability: as mentioned above, the
>>>> goal of this FLIP is to provide an intelligible entry point to the
>>>> Python API, and its content should only contain the information that
>>>> is useful for Python users. There are indeed many agenda items in this
>>>> FLIP that mirror the Java documents, but that doesn't mean the content
>>>> will be copied from the Java documentation. If the content of a
>>>> document is the same as the corresponding Java document, we will add a
>>>> link to the Java document, e.g. "Built-in Functions" and "SQL". We
>>>> only create pages for Python-only content, and redirect to the Java
>>>> document for anything shared with Java, e.g. "Connectors" and
>>>> "Catalogs". If a document is Python-only and already exists, we will
>>>> move it from the old Python documentation to the new one, e.g.
>>>> "Configurations". If a document is Python-only and did not exist
>>>> before, we will create a new page for it, e.g. "Data Types".
>>>>
>>>> The main reason we are creating a new page for Python Data Types is
>>>> that it corresponds to Java Data Types only conceptually, one-to-one;
>>>> the actual content would be very different. Some of the detailed
>>>> differences are as follows:
>>>>
>>>> - The text of the Java Data Types document is written for users of
>>>> JVM-based languages, which is incomprehensible to users who only
>>>> understand Python.
>>>> - Currently, Python Data Types does not support the "bridgedTo"
>>>> method, DataTypes.RAW, DataTypes.NULL, or user-defined types.
>>>> - The sections "Planner Compatibility" and "Data Type Extraction" are
>>>> only useful for Java/Scala users.
>>>> - We want to add sections that only apply to Python, such as which
>>>> data types are currently supported in Python, the mapping between
>>>> DataType and Python object types, etc.
>>>> I think the root cause of such differences from the existing
>>>> documents is that Python is the first non-JVM language we support in
>>>> Flink. This means our previous method of sharing documents between
>>>> Java and Scala may not be suitable for Python, so we will adopt some
>>>> very different methods to provide documentation for Python users. Of
>>>> course, we should reduce maintenance costs as much as possible while
>>>> ensuring a good user experience. Furthermore, Python is the first step
>>>> of Flink's multi-language support, and there may be R, Go, etc. in the
>>>> future. It is very useful to create a main page for each language, so
>>>> that users of each language can focus on the content they care about.
>>>>
>>>> >> Things like the cookbook and tutorial should be under the Try
>>>> >> Flink section of the documentation.
>>>>
>>>> Regarding the position of the "Cookbook" section: in my view, "Try
>>>> Flink" is for new users and the "Cookbook" is for more advanced users.
>>>> That is, "Try Flink" can contain the simplest end-to-end examples,
>>>> such as "Hello World", while the "Cookbook" can contain use cases
>>>> closer to production business, such as CDN log analysis or the PV/UV
>>>> of e-commerce. So I prefer to keep the current structure.
>>>>
>>>> >> it's relatively straightforward to compare the Python API with the
>>>> >> Java and Scala versions.
>>>>
>>>> Regarding the comparison between the Python API and the Java/Scala
>>>> APIs, I think the majority of users, especially beginners, would not
>>>> have this need. From my side, improving the experience for beginners
>>>> has the higher priority. Could you add more input on why users would
>>>> want to compare, and how much harder the comparison would be if the
>>>> content were spread over multiple pages? :)
>>>>
>>>> Thanks for all of your feedback and suggestions; any follow-up
>>>> feedback is welcome.
>>>> Best,
>>>> Jincheng
>>>>
>>>>
>>>> On Mon, Aug 3, 2020 at 10:49 PM David Anderson <da...@alpinegizmo.com>
>>>> wrote:
>>>>
>>>>> Jincheng,
>>>>>
>>>>> One thing that I like about the way the documentation is currently
>>>>> organized is that it's relatively straightforward to compare the
>>>>> Python API with the Java and Scala versions. I'm concerned that if
>>>>> the PyFlink docs are more independent, it will be challenging to
>>>>> respond to questions about which features from the other APIs are
>>>>> available from Python.
>>>>>
>>>>> David
>>>>>
>>>>> On Mon, Aug 3, 2020 at 8:07 AM jincheng sun <sunjincheng...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> It would be great if you could join the contribution to the PyFlink
>>>>>> documentation, @Marta!
>>>>>> Thanks for all of the positive feedback. I will then start a formal
>>>>>> vote later...
>>>>>>
>>>>>> Best,
>>>>>> Jincheng
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 3, 2020 at 9:56 AM Shuiqiang Chen <acqua....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> > Hi Jincheng,
>>>>>> >
>>>>>> > Thanks for the discussion. +1 for the FLIP.
>>>>>> >
>>>>>> > Well-organized documentation will greatly improve the efficiency
>>>>>> > and experience of developers.
>>>>>> >
>>>>>> > Best,
>>>>>> > Shuiqiang
>>>>>> >
>>>>>> > On Sat, Aug 1, 2020 at 8:42 AM Hequn Cheng <he...@apache.org>
>>>>>> > wrote:
>>>>>> >
>>>>>> >> Hi Jincheng,
>>>>>> >>
>>>>>> >> Thanks a lot for raising the discussion. +1 for the FLIP.
>>>>>> >>
>>>>>> >> I think this will bring big benefits for PyFlink users.
>>>>>> >> Currently, the Python Table API documentation is hidden deep
>>>>>> >> under the Table API & SQL tab, which makes it quite hard to find.
>>>>>> >> Also, the PyFlink documentation is mixed with the Java/Scala
>>>>>> >> documentation, so it is hard for users to get an overview of all
>>>>>> >> the PyFlink documents. As more and more functionality is added to
>>>>>> >> PyFlink, I think it's time for us to refactor the documentation.
>>>>>> >>
>>>>>> >> Best,
>>>>>> >> Hequn
>>>>>> >>
>>>>>> >>
>>>>>> >> On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira
>>>>>> >> <ma...@ververica.com> wrote:
>>>>>> >>
>>>>>> >>> Hi, Jincheng!
>>>>>> >>>
>>>>>> >>> Thanks for creating this detailed FLIP; it will make a big
>>>>>> >>> difference in the experience of Python developers using Flink.
>>>>>> >>> I'm interested in contributing to this work, so I'll reach out
>>>>>> >>> to you offline!
>>>>>> >>>
>>>>>> >>> Also, thanks for sharing some information on the adoption of
>>>>>> >>> PyFlink. It's great to see that there are already production
>>>>>> >>> users.
>>>>>> >>>
>>>>>> >>> Marta
>>>>>> >>>
>>>>>> >>> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang
>>>>>> >>> <hxbks...@gmail.com> wrote:
>>>>>> >>>
>>>>>> >>> > Hi Jincheng,
>>>>>> >>> >
>>>>>> >>> > Thanks a lot for bringing up this discussion and the proposal.
>>>>>> >>> >
>>>>>> >>> > Big +1 for improving the structure of the PyFlink docs.
>>>>>> >>> >
>>>>>> >>> > It will be very friendly to give PyFlink users a unified
>>>>>> >>> > entrance for learning PyFlink.
>>>>>> >>> >
>>>>>> >>> > Best,
>>>>>> >>> > Xingbo
>>>>>> >>> >
>>>>>> >>> > On Fri, Jul 31, 2020 at 11:00 AM Dian Fu
>>>>>> >>> > <dian0511...@gmail.com> wrote:
>>>>>> >>> >
>>>>>> >>> >> Hi Jincheng,
>>>>>> >>> >>
>>>>>> >>> >> Thanks a lot for bringing up this discussion and the
>>>>>> >>> >> proposal. +1 to improving the Python API docs.
>>>>>> >>> >>
>>>>>> >>> >> I have received a lot of feedback from PyFlink beginners
>>>>>> >>> >> about the PyFlink docs, e.g. that the materials are too few,
>>>>>> >>> >> that the Python docs are mixed with the Java docs, and that
>>>>>> >>> >> it's not easy to find the docs they want.
>>>>>> >>> >>
>>>>>> >>> >> I think it would greatly improve the user experience if we
>>>>>> >>> >> had one place that includes most of the knowledge PyFlink
>>>>>> >>> >> users need.
>>>>>> >>> >>
>>>>>> >>> >> Regards,
>>>>>> >>> >> Dian
>>>>>> >>> >>
>>>>>> >>> >> On Jul 31, 2020, at 10:14 AM, jincheng sun
>>>>>> >>> >> <sunjincheng...@gmail.com> wrote:
>>>>>> >>> >>
>>>>>> >>> >> Hi folks,
>>>>>> >>> >>
>>>>>> >>> >> Since the release of Flink 1.11, the number of PyFlink users
>>>>>> >>> >> has continued to grow. As far as I know, many companies have
>>>>>> >>> >> used PyFlink for data analysis and operation/maintenance
>>>>>> >>> >> monitoring, and have put it into production (such as 聚美优品
>>>>>> >>> >> [1] (Jumei), 浙江墨芷[2] (Mozhi), etc.). According to the
>>>>>> >>> >> feedback we have received, the current documentation is not
>>>>>> >>> >> very friendly to PyFlink users. There are two shortcomings:
>>>>>> >>> >>
>>>>>> >>> >> - Python-related content is mixed into the Java/Scala
>>>>>> >>> >> documentation, which makes it difficult to read for users who
>>>>>> >>> >> only focus on PyFlink.
>>>>>> >>> >> - There is already a "Python Table API" section under the
>>>>>> >>> >> Table API documentation that holds the PyFlink documents, but
>>>>>> >>> >> the number of articles is small and the content is
>>>>>> >>> >> fragmented, so it is difficult for beginners to learn from.
>>>>>> >>> >>
>>>>>> >>> >> In addition, FLIP-130 introduced the Python DataStream API,
>>>>>> >>> >> and many documents will be added for those new APIs. In order
>>>>>> >>> >> to improve the readability and maintainability of the PyFlink
>>>>>> >>> >> documentation, Wei Zhong and I have discussed offline and
>>>>>> >>> >> would like to rework it via this FLIP.
>>>>>> >>> >>
>>>>>> >>> >> We will rework the documentation around the following three
>>>>>> >>> >> objectives:
>>>>>> >>> >>
>>>>>> >>> >> - Add a separate section for the Python API under the
>>>>>> >>> >> "Application Development" section.
>>>>>> >>> >> - Restructure the current Python documentation into a brand
>>>>>> >>> >> new structure that is complete in content and friendly to
>>>>>> >>> >> beginners.
>>>>>> >>> >> - Improve the documents shared by Python/Java/Scala to make
>>>>>> >>> >> them more friendly to Python users without affecting
>>>>>> >>> >> Java/Scala users.
>>>>>> >>> >>
>>>>>> >>> >> More detail can be found in FLIP-133:
>>>>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
>>>>>> >>> >>
>>>>>> >>> >> Best,
>>>>>> >>> >> Jincheng
>>>>>> >>> >>
>>>>>> >>> >> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
>>>>>> >>> >> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g