I think this sounds good. +1

On Wed, Aug 5, 2020 at 8:37 PM jincheng sun <sunjincheng...@gmail.com>
wrote:

> Hi David, thank you for sharing the problems with the current documentation.
> I agree with you, as I have received the same feedback from Chinese users. I
> am often contacted by users asking questions such as whether PyFlink supports
> "Java UDF" or whether PyFlink supports "xxxConnector". The root cause of
> these questions is that our existing documentation is written for Java users
> (the sections where text and API content are mixed). Since Python support was
> only added in 1.9, much of the documentation is not friendly to Python users;
> they don't want to search for Python content in unfamiliar Java documents.
> Just yesterday, Chinese users complained that they could not find a single
> entry point for the Python API documentation. So a centralized entry point
> and a clear document structure are an urgent demand of Python users, and the
> intention of this FLIP is to do our best to solve these pain points.
>
> Hi Xingbo and Wei, thank you for sharing the status of PySpark's
> documentation optimization. You're right: PySpark already has a large Python
> user base, and the Spark community has also found that the Python user
> community is an important part of multi-language support. Centralizing and
> unifying the Python documentation content will reduce the learning cost for
> Python users, and a good document structure and content will also reduce the
> Q&A burden on the community. It is a once-and-for-all job.
>
> Hi Seth, I wonder if your concerns have been resolved through the previous
> discussion?
>
> Anyway, the principle of the FLIP is that the Python documentation should
> only include Python-specific content instead of making a copy of the Java
> content. And it would be great to have you join the improvement of PyFlink
> (both opening PRs and reviewing PRs).
>
> Best,
> Jincheng
>
>
> On Wed, Aug 5, 2020 at 5:46 PM Wei Zhong <weizhong0...@gmail.com> wrote:
>
>> Hi Xingbo,
>>
>> Thanks for the information.
>>
>> I think the redesign of PySpark's documentation deserves our attention. It
>> seems that the Spark community has also begun to take the user experience of
>> its Python documentation more seriously. We can keep following the
>> discussion and progress of that redesign in the Spark community; it is so
>> similar to our work that there should be some ideas worth borrowing.
>>
>> Best,
>> Wei
>>
>>
>> On Aug 5, 2020, at 15:02, Xingbo Huang <hxbks...@gmail.com> wrote:
>>
>> Hi,
>>
>> I found that the Spark community has also been working on redesigning the
>> PySpark documentation [1] recently. Maybe we can compare our proposed
>> document structure with theirs.
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-31851
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Need-some-help-and-contributions-in-PySpark-API-documentation-td29972.html
>>
>> Best,
>> Xingbo
>>
>> On Wed, Aug 5, 2020 at 3:17 AM David Anderson <da...@alpinegizmo.com> wrote:
>>
>>> I'm delighted to see energy going into improving the documentation.
>>>
>>> With the current documentation, I get a lot of questions that I believe
>>> reflect two fundamental problems with what we currently provide:
>>>
>>> (1) We have a lot of contextual information in our heads about how Flink
>>> works, and we are able to use that knowledge to make reasonable inferences
>>> about how things (probably) work in cases we aren't so familiar with. For
>>> example, I get a lot of questions of the form "If I use <this feature> will
>>> I still have exactly once guarantees?" The answer is always yes, but they
>>> continue to have doubts because we have failed to clearly communicate this
>>> fundamental, underlying principle.
>>>
>>> This specific example about fault tolerance applies across all of the
>>> Flink docs, but the general idea can also be applied to the Table/SQL and
>>> PyFlink docs. The guiding principles underlying these APIs should be
>>> written down in one easy-to-find place.
>>>
>>> (2) The other kind of question I get a lot is "Can I do <X> with <Y>?"
>>> E.g., "Can I use the JDBC table sink from PyFlink?" These questions can be
>>> very difficult to answer because it is frequently the case that one has to
>>> reason about why a given feature doesn't seem to appear in the
>>> documentation. It could be that I'm looking in the wrong place, or it could
>>> be that someone forgot to document something, or it could be that it can in
>>> fact be done by applying a general mechanism in a specific way that I
>>> haven't thought of -- as in this case, where one can use a JDBC sink from
>>> Python if one thinks to use DDL.
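>>>
>>> For the record, here is a rough sketch of what that DDL approach looks like
>>> in PyFlink 1.11. The table name and connector options below are just
>>> assumptions for illustration, and the JDBC connector jar and driver still
>>> need to be on the classpath:
>>>
>>>     from pyflink.datastream import StreamExecutionEnvironment
>>>     from pyflink.table import StreamTableEnvironment, EnvironmentSettings
>>>
>>>     env = StreamExecutionEnvironment.get_execution_environment()
>>>     t_env = StreamTableEnvironment.create(
>>>         env,
>>>         environment_settings=EnvironmentSettings.new_instance()
>>>             .in_streaming_mode().use_blink_planner().build())
>>>
>>>     # Declare the JDBC sink purely via DDL; no Python-specific
>>>     # connector API is involved.
>>>     t_env.execute_sql("""
>>>         CREATE TABLE jdbc_sink (
>>>             id BIGINT,
>>>             name STRING
>>>         ) WITH (
>>>             'connector' = 'jdbc',
>>>             'url' = 'jdbc:mysql://localhost:3306/mydb',
>>>             'table-name' = 'my_table',
>>>             'username' = 'user',
>>>             'password' = 'pass'
>>>         )
>>>     """)
>>>
>>>     # Any Table can then be written to it, for example:
>>>     # some_table.execute_insert("jdbc_sink")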
>>>
>>> So I think it would be helpful to be explicit about both what is, and
>>> what is not, supported in PyFlink. And to have some very clear organizing
>>> principles in the documentation so that users can quickly learn where to
>>> look for specific facts.
>>>
>>> Regards,
>>> David
>>>
>>>
>>> On Tue, Aug 4, 2020 at 1:01 PM jincheng sun <sunjincheng...@gmail.com>
>>> wrote:
>>>
>>>> Hi Seth and David,
>>>>
>>>> I'm very happy to receive your replies and suggestions. I would like to
>>>> share my thoughts here:
>>>>
>>>> The main motivation for refactoring the PyFlink docs is to make sure that
>>>> Python users can find everything they need starting from the PyFlink
>>>> documentation main page. That is, the PyFlink documentation should have a
>>>> catalogue which includes all the functionality available in PyFlink.
>>>> However, this doesn't mean that we will copy the content that lives in
>>>> other places; where appropriate it will just be a reference/link to the
>>>> other documentation. For the documentation added under the PyFlink main
>>>> page, the principle is that it should only include Python-specific content
>>>> instead of making a copy of the Java content.
>>>>
>>>> >>  I'm concerned that this proposal duplicates a lot of content that
>>>> will quickly get out of sync. It feels like it is documenting PyFlink
>>>> separately from the rest of the project.
>>>>
>>>> Regarding the concerns about maintainability: as mentioned above, the goal
>>>> of this FLIP is to provide an intelligible entry point to the Python API,
>>>> and its content should only contain the information which is useful for
>>>> Python users. There are indeed many items in this FLIP that mirror the
>>>> Java documents, but that doesn't mean the content will be copied from the
>>>> Java documentation. That is, if the content of a document is the same as
>>>> the corresponding Java document, we will add a link to the Java document,
>>>> e.g. "Built-in functions" and "SQL". We only create a page for the
>>>> Python-only content, and then redirect to the Java document if there is
>>>> something shared with Java, e.g. "Connectors" and "Catalogs". If a document
>>>> is Python-only and already exists, we will move it from the old Python
>>>> documentation to the new Python documentation, e.g. "Configurations". If a
>>>> document is Python-only and did not exist before, we will create a new page
>>>> for it, e.g. "DataTypes".
>>>>
>>>> The main reason we create a new page for Python Data Types is that it only
>>>> corresponds to the Java Data Types conceptually; the actual content of the
>>>> document would be very different from the Java Data Types page. Some of the
>>>> detailed differences are listed below, followed by a short illustrative
>>>> sketch:
>>>>
>>>>
>>>>   - The text in the Java Data Types document is written for users of
>>>> JVM-based languages, which is hard to follow for users who only know
>>>> Python.
>>>>   - Currently the Python Data Types do not support the "bridgedTo" method,
>>>> DataTypes.RAW, DataTypes.NULL, or user-defined types.
>>>>   - The sections "Planner Compatibility" and "Data Type Extraction" are
>>>> only useful for Java/Scala users.
>>>>   - We want to add sections which apply only to Python, such as which data
>>>> types are currently supported in Python, the mapping between DataType and
>>>> Python object types, etc.
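>>>>
>>>> To make that last point more concrete, here is a small sketch of the kind
>>>> of Python-oriented example such a page could contain (the field names are
>>>> purely illustrative):
>>>>
>>>>     from pyflink.table import DataTypes
>>>>
>>>>     # A row type declared with the Python DataTypes API; at runtime each
>>>>     # field maps to a plain Python type, e.g. BIGINT -> int,
>>>>     # STRING -> str, ARRAY(DOUBLE()) -> list of float.
>>>>     schema = DataTypes.ROW([
>>>>         DataTypes.FIELD("id", DataTypes.BIGINT()),
>>>>         DataTypes.FIELD("name", DataTypes.STRING()),
>>>>         DataTypes.FIELD("scores", DataTypes.ARRAY(DataTypes.DOUBLE()))
>>>>     ])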
>>>>
>>>> I think the root cause of such differences from the existing documents is
>>>> that Python is the first non-JVM language we support in Flink. This means
>>>> that our previous approach of sharing documents between Java and Scala may
>>>> not be suitable for Python, so we will adopt some rather different
>>>> approaches to provide documentation for Python users. Of course, we should
>>>> reduce maintenance costs as much as possible while ensuring a good user
>>>> experience. Furthermore, Python is only the first step of Flink's
>>>> multi-language support; there may be R, Go, etc. in the future. It is
>>>> therefore necessary to have a main page for each language, so that users of
>>>> each language can focus on the content they care about.
>>>>
>>>> >> Things like the cookbook and tutorial should be under the Try Flink
>>>> section of the documentation.
>>>>
>>>> Regarding the position of the "Cookbook" section: in my view, "Try Flink"
>>>> is for new users and the "Cookbook" is for more advanced users. That is,
>>>> "Try Flink" can contain the simplest end-to-end examples, such as a "Hello
>>>> World", while in the "Cookbook" we can add use cases closer to production
>>>> business, such as CDN log analysis or PV/UV statistics for e-commerce. So I
>>>> prefer to keep the current structure.
>>>>
>>>> >>  it's relatively straightforward to compare the Python API with the
>>>> Java and Scala versions.
>>>>
>>>> Regarding the comparison between the Python API and the Java/Scala APIs, I
>>>> think the majority of users, especially beginners, would not have this
>>>> need. From my side, improving the experience for beginner users seems to
>>>> have a higher priority. Could you please share more input on why users
>>>> would want to compare the APIs? How much would the comparison suffer if the
>>>> content were spread over multiple pages? :)
>>>>
>>>> Thanks for all of your feedback and suggestions; any follow-up feedback is
>>>> welcome.
>>>>
>>>> Best,
>>>> Jincheng
>>>>
>>>>
>>>> On Mon, Aug 3, 2020 at 10:49 PM David Anderson <da...@alpinegizmo.com> wrote:
>>>>
>>>>> Jincheng,
>>>>>
>>>>> One thing that I like about the way that the documentation is
>>>>> currently organized is that it's relatively straightforward to compare the
>>>>> Python API with the Java and Scala versions. I'm concerned that if the
>>>>> PyFlink docs are more independent, it will be challenging to respond to
>>>>> questions about which features from the other APIs are available from
>>>>> Python.
>>>>>
>>>>> David
>>>>>
>>>>> On Mon, Aug 3, 2020 at 8:07 AM jincheng sun <sunjincheng...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> It would be great if you could join the PyFlink documentation effort,
>>>>>> @Marta!
>>>>>> Thanks for all of the positive feedback. I will start a formal vote
>>>>>> later...
>>>>>>
>>>>>> Best,
>>>>>> Jincheng
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 3, 2020 at 9:56 AM Shuiqiang Chen <acqua....@gmail.com> wrote:
>>>>>>
>>>>>> > Hi jincheng,
>>>>>> >
>>>>>> > Thanks for the discussion. +1 for the FLIP.
>>>>>> >
>>>>>> > Well-organized documentation will greatly improve the efficiency and
>>>>>> > experience of developers.
>>>>>> >
>>>>>> > Best,
>>>>>> > Shuiqiang
>>>>>> >
>>>>>> > On Sat, Aug 1, 2020 at 8:42 AM Hequn Cheng <he...@apache.org> wrote:
>>>>>> >
>>>>>> >> Hi Jincheng,
>>>>>> >>
>>>>>> >> Thanks a lot for raising the discussion. +1 for the FLIP.
>>>>>> >>
>>>>>> >> I think this will bring big benefits for PyFlink users. Currently,
>>>>>> >> the Python Table API documentation is deeply hidden under the
>>>>>> >> Table API & SQL tab, which makes it hard to find. Also, the PyFlink
>>>>>> >> documentation is mixed with the Java/Scala documentation, so it is
>>>>>> >> hard for users to get an overview of all the PyFlink documents. As
>>>>>> >> more and more functionality is added to PyFlink, I think it's time
>>>>>> >> for us to refactor the documentation.
>>>>>> >>
>>>>>> >> Best,
>>>>>> >> Hequn
>>>>>> >>
>>>>>> >>
>>>>>> >> On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira <ma...@ververica.com>
>>>>>> >> wrote:
>>>>>> >>
>>>>>> >>> Hi, Jincheng!
>>>>>> >>>
>>>>>> >>> Thanks for creating this detailed FLIP, it will make a big difference
>>>>>> >>> in the experience of Python developers using Flink. I'm interested in
>>>>>> >>> contributing to this work, so I'll reach out to you offline!
>>>>>> >>>
>>>>>> >>> Also, thanks for sharing some information on the adoption of PyFlink,
>>>>>> >>> it's great to see that there are already production users.
>>>>>> >>>
>>>>>> >>> Marta
>>>>>> >>>
>>>>>> >>> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang <hxbks...@gmail.com>
>>>>>> >>> wrote:
>>>>>> >>>
>>>>>> >>> > Hi Jincheng,
>>>>>> >>> >
>>>>>> >>> > Thanks a lot for bringing up this discussion and the proposal.
>>>>>> >>> >
>>>>>> >>> > Big +1 for improving the structure of the PyFlink docs.
>>>>>> >>> >
>>>>>> >>> > It will be very helpful to give PyFlink users a unified entry
>>>>>> >>> > point for learning from the PyFlink documentation.
>>>>>> >>> >
>>>>>> >>> > Best,
>>>>>> >>> > Xingbo
>>>>>> >>> >
>>>>>> >>> > On Fri, Jul 31, 2020 at 11:00 AM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>> >>> >
>>>>>> >>> >> Hi Jincheng,
>>>>>> >>> >>
>>>>>> >>> >> Thanks a lot for bringing up this discussion and the proposal. +1
>>>>>> >>> >> to improving the Python API docs.
>>>>>> >>> >>
>>>>>> >>> >> I have received a lot of feedback from PyFlink beginners about the
>>>>>> >>> >> PyFlink docs, e.g. that the materials are too few, that the Python
>>>>>> >>> >> docs are mixed with the Java docs, and that it's not easy to find
>>>>>> >>> >> the docs they are looking for.
>>>>>> >>> >>
>>>>>> >>> >> I think it would greatly improve the user experience if we had one
>>>>>> >>> >> place which covers most of the knowledge PyFlink users need.
>>>>>> >>> >>
>>>>>> >>> >> Regards,
>>>>>> >>> >> Dian
>>>>>> >>> >>
>>>>>> >>> >> On Jul 31, 2020, at 10:14 AM, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>> >>> >>
>>>>>> >>> >> Hi folks,
>>>>>> >>> >>
>>>>>> >>> >> Since the release of Flink 1.11, the number of PyFlink users has
>>>>>> >>> >> continued to grow. As far as I know, many companies have been
>>>>>> >>> >> using PyFlink for data analysis and for operations and maintenance
>>>>>> >>> >> monitoring, and have put it into production (such as 聚美优品
>>>>>> >>> >> (Jumei) [1] and 浙江墨芷 (Mozhi) [2]). According to the feedback we
>>>>>> >>> >> have received, the current documentation is not very friendly to
>>>>>> >>> >> PyFlink users. There are two shortcomings:
>>>>>> >>> >>
>>>>>> >>> >> - Python-related content is mixed into the Java/Scala
>>>>>> >>> >> documentation, which makes it difficult to read for users who only
>>>>>> >>> >> focus on PyFlink.
>>>>>> >>> >> - There is already a "Python Table API" section under the Table API
>>>>>> >>> >> documentation that holds the PyFlink documents, but the number of
>>>>>> >>> >> articles is small and the content is fragmented, so it is difficult
>>>>>> >>> >> for beginners to learn from it.
>>>>>> >>> >>
>>>>>> >>> >> In addition, FLIP-130 introduced the Python DataStream API, and
>>>>>> >>> >> many documents will be added for those new APIs. In order to
>>>>>> >>> >> improve the readability and maintainability of the PyFlink
>>>>>> >>> >> documentation, Wei Zhong and I have discussed offline and would
>>>>>> >>> >> like to rework it via this FLIP.
>>>>>> >>> >>
>>>>>> >>> >> We will rework the documentation around the following three
>>>>>> >>> >> objectives:
>>>>>> >>> >>
>>>>>> >>> >> - Add a separate section for the Python API under the "Application
>>>>>> >>> >> Development" section.
>>>>>> >>> >> - Restructure the current Python documentation into a brand-new
>>>>>> >>> >> structure to ensure the content is complete and friendly to
>>>>>> >>> >> beginners.
>>>>>> >>> >> - Improve the documents shared by Python/Java/Scala to make them
>>>>>> >>> >> more friendly to Python users without affecting Java/Scala users.
>>>>>> >>> >>
>>>>>> >>> >> More details can be found in FLIP-133:
>>>>>> >>> >>
>>>>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
>>>>>> >>> >>
>>>>>> >>> >> Best,
>>>>>> >>> >> Jincheng
>>>>>> >>> >>
>>>>>> >>> >> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
>>>>>> >>> >> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g
>>>>>> >>> >>
>>>>>> >>> >>
>>>>>> >>> >>
>>>>>> >>>
>>>>>> >>
>>>>>>
>>>>>
>>
