Hi all,

The PR has been submitted.[1]

You are welcome to help review it.

[1]: https://github.com/apache/incubator-hudi/pull/1311

On Thu, Feb 6, 2020 at 10:14 AM vino yang <yanghua1...@gmail.com> wrote:

> Hi all,
>
> Have filed a Jira issue: https://issues.apache.org/jira/browse/HUDI-602
>
> Best,
> Vino
>
> On Tue, Feb 4, 2020 at 9:39 PM vino yang <yanghua1...@gmail.com> wrote:
>
>> Hi Vinoth,
>>
>> Yes, I'd like to give some suggestions about the "MINOR" PR.
>> Will file a Jira issue to track this work.
>>
>> Best,
>> Vino
>>
>> On Tue, Feb 4, 2020 at 8:16 AM Vinoth Chandar <vin...@apache.org> wrote:
>>
>>> +1 to vinoyang's suggestions.
>>>
>>> @Vino Yang <vinoy...@apache.org>  , do you want to formalize this and
>>> update our contributing page?
>>>
>>> On Thu, Jan 30, 2020 at 3:09 AM hmatu <hma...@foxmail.com> wrote:
>>>
>>> > Hi,
>>> >
>>> >
>>> >
>>> > I think these "MINOR" issues are important; a good project requires
>>> > not only functions, but also good coding style and habits.
>>> >
>>> >
>>> > Best
>>> > Hmatu
>>> >
>>> >
>>> >
>>> >
>>> > ------------------ Original ------------------
>>> > From: "Vinoth Chandar" <vin...@apache.org>;
>>> > Date: Mon, Jan 27, 2020 03:25 AM
>>> > To: "dev" <dev@hudi.apache.org>;
>>> >
>>> > Subject: Re: [DISCUSS] Unify Hudi code cleanup and improvement
>>> >
>>> >
>>> >
>>> > Hi Vino,
>>> >
>>> > You raise a valid point on what a "MINOR" PR should be. All JIRAs start out
>>> > in "NEW" state and committers have to "Accept" the issue already (to force
>>> > early conversations like this).
>>> >
>>> > Maybe we should draw some bounds on it, like "cannot be more than 50
>>> > lines", "no functionality changes", etc.? WDYT? This seems to be the core
>>> > of the issue.
>>> >
>>> > On Thu, Jan 23, 2020 at 4:17 PM vino yang <yanghua1...@gmail.com>
>>> > wrote:
>>> >
>>> > > Hi Vinoth,
>>> > >
>>> > > Thank you for your thoughts. I agree that focusing on some higher
>>> > > priority work is more valuable.
>>> > >
>>> > > This discussion is to sort out and manage the work that the community
>>> > > is already doing. There are currently some PRs working on this type of
>>> > > work, such as PRs [1][2][3][4]. The community has not given guidance on
>>> > > these tasks. I think it's not very appropriate to open a "MINOR" PR
>>> > > directly. So, I want to hear from the community about how to manage
>>> > > them more effectively. The discussion is not meant to encourage giving
>>> > > a higher priority to such work.
>>> > >
>>> > > We haven't stopped this kind of work, so we should provide effective
>>> > > guidance and organization so that it doesn't look disorganized. WDYT?
>>> > >
>>> > > Best,
>>> > > Vino
>>> > >
>>> > > [1]: https://github.com/apache/incubator-hudi/pull/1237
>>> > > [2]: https://github.com/apache/incubator-hudi/pull/1139
>>> > > [3]: https://github.com/apache/incubator-hudi/pull/1137
>>> > > [4]: https://github.com/apache/incubator-hudi/pull/1136
>>> > >
>>> > > On Thu, Jan 23, 2020 at 1:20 PM Vinoth Chandar <vin...@apache.org> wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > Thanks everyone for sharing your views!
>>> > > >
>>> > > > Some of this conversation is starting to feel like boiling the ocean. I
>>> > > > believe in refactoring with purpose, and discussing class-by-class or
>>> > > > module-by-module does not make sense to me. Can we first list down what
>>> > > > we want to achieve? So far, I have only heard fixing IDE/IntelliJ
>>> > > > warnings. Also, instead of focusing on new work, how about looking at
>>> > > > the pending JIRAs under the "Testing" and "Code Cleanup" components
>>> > > > first and seeing if those are worth tackling?
>>> > > >
>>> > > > We went down this path for code formatting and today we still have
>>> > > > inconsistencies. Looking back, I feel we should have clearly defined end
>>> > > > goals for the cleanups, and we can then rank them based on ROI.
>>> > > >
>>> > > > Thanks
>>> > > > Vinoth
>>> > > >
>>> > > > On Wed, Jan 22, 2020 at 7:05 PM vino yang <yanghua1...@gmail.com> wrote:
>>> > > >
>>> > > > > Hi Shiyan and Bhavani:
>>> > > > >
>>> > > > > Thanks for sharing your thoughts.
>>> > > > >
>>> > > > > As I originally stated, the advantage of using modules as a unit to
>>> > > > > split work is that the decomposition is clear, but the disadvantage is
>>> > > > > that the volume of changes may be huge, which brings huge risks
>>> > > > > (considering that Hudi's test coverage is still not very high) and a
>>> > > > > heavy review workload. The advantage of splitting by class is that the
>>> > > > > volume of changes is small and the review is more convenient, but the
>>> > > > > disadvantages are too many tasks and high maintenance costs.
>>> > > > >
>>> > > > >
>>> > > > > *In addition, we need to define the boundaries of the "code cleanup" I
>>> > > > > expressed in this topic: it is limited to the smart tips shown by
>>> > > > > IntelliJ IDEA. If the boundaries are too wide, then this discussion
>>> > > > > will get out of control.*
>>> > > > > I agree with Bhavani that we don't take it as the actual goal. But we
>>> > > > > are not opposed to the community helping to improve the quality of the
>>> > > > > code (basically, these tips given by the IDE are fairly reasonable).
>>> > > > >
>>> > > > >
>>> > > > > So, I still give my thoughts: we manage this work with Jira. Before we
>>> > > > > start working, we need to find a committer as a mentor. The mentor must
>>> > > > > decide whether the scale of the subtasks is reasonable and whether
>>> > > > > additional unit tests need to be added to verify the changes. And the
>>> > > > > mentor should be responsible for merged changes.
>>> > > > >
>>> > > > > What do you think?
>>> > > > >
>>> > > > > Best,
>>> > > > > Vino
>>> > > > >
>>> > > > > On Wed, Jan 22, 2020 at 2:22 PM Bhavani Sudha <bhavanisud...@gmail.com> wrote:
>>> > > > >
>>> > > > > > Hi @vinoyang, thanks for bringing this to discussion. I feel it would
>>> > > > > > be less disruptive to clean up code as part of individual classes
>>> > > > > > being touched for a specific goal rather than code cleanup being the
>>> > > > > > actual goal. This would narrow the touch point and ensure test
>>> > > > > > coverage (both unit and integration tests) catches any
>>> > > > > > accidental/unintentional changes. Also, it would give a chance to
>>> > > > > > change any documentation quoting/referencing that code. Wanted to
>>> > > > > > share my personal opinion.
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > > Sudha
>>> > > > > >
>>> > > > > > On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <xu.shiyan.raym...@gmail.com>
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > The clean-up work can actually be split by modules.
>>> > > > > > >
>>> > > > > > > Though it is generally a good practice to follow, my concern is the
>>> > > > > > > clean-up is likely to cause conflicts with some on-going changes. If
>>> > > > > > > I may suggest, the dedicated clean-up tasks should avoid
>>> > > > > > > - modules that are undergoing multiple feature changes/PRs
>>> > > > > > > - modules that are planned to have major refactoring due to design
>>> > > > > > >   changes (since clean-up can be done altogether during refactoring)
>>> > > > > > >
>>> > > > > > > On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <vin...@apache.org> wrote:
>>> > > > > > >
>>> > > > > > > > Not sure if I fully agree with sweeping statements being made. But,
>>> > > > > > > > +1 for structuring this work via Jiras and having some committer
>>> > > > > > > > “accept” the issue first. Some of these tend to be subjective and
>>> > > > > > > > we do need to make different tradeoffs.
>>> > > > > > > >
>>> > > > > > > > On Tue, Jan 21, 2020 at 1:28 AM vino yang <yanghua1...@gmail.com> wrote:
>>> > > > > > > >
>>> > > > > > > > > Hi Pratyaksh,
>>> > > > > > > > >
>>> > > > > > > > > Thanks for your thoughts.
>>> > > > > > > > >
>>> > > > > > > > > Let's listen to others' comments. If there is no objection, we
>>> > > > > > > > > will follow this way.
>>> > > > > > > > >
>>> > > > > > > > > Best,
>>> > > > > > > > > Vino
>>> > > > > > > > >
>>> > > > > > > > > On Tue, Jan 21, 2020 at 4:56 PM Pratyaksh Sharma <pratyaks...@gmail.com> wrote:
>>> > > > > > > > >
>>> > > > > > > > > > Hi Vino,
>>> > > > > > > > > >
>>> > > > > > > > > > Big +1 for this initiative. I have done this code cleanup for
>>> > > > > > > > > > test classes in the past and strongly feel there is a need to
>>> > > > > > > > > > do the same at other places as well. I would definitely like
>>> > > > > > > > > > to volunteer for this.
>>> > > > > > > > > >
>>> > > > > > > > > > On Tue, Jan 21, 2020 at 1:52 PM vino yang <yanghua1...@gmail.com> wrote:
>>> > > > > > > > > >
>>> > > > > > > > > > > Hi folks,
>>> > > > > > > > > > >
>>> > > > > > > > > > > Currently, the code quality of some Hudi modules is not very
>>> > > > > > > > > > > good. As many developers have seen, IntelliJ IDEA shows many
>>> > > > > > > > > > > suggestions about cleanup and improvement. The community does
>>> > > > > > > > > > > not object to doing the cleanup and improvement work, and the
>>> > > > > > > > > > > work has been started via some direct "minor" PRs by some
>>> > > > > > > > > > > volunteers. The current way is unorganized and hard to manage.
>>> > > > > > > > > > > For tracking this work, I prefer to manage it with Jira
>>> > > > > > > > > > > issues. We can create an umbrella issue, then split the work
>>> > > > > > > > > > > into several subtasks.
>>> > > > > > > > > > >
>>> > > > > > > > > > > Since those "bad smells" lie anywhere in the whole project,
>>> > > > > > > > > > > it's difficult to give a standard for splitting the subtasks.
>>> > > > > > > > > > > For example, some files have a lot while some modules have
>>> > > > > > > > > > > few. So I suggest the standard would depend on the volume of
>>> > > > > > > > > > > the changes. Before working, any subtask should find a
>>> > > > > > > > > > > committer as a mentor who would judge and approve that the
>>> > > > > > > > > > > scope is suitable.
>>> > > > > > > > > > >
>>> > > > > > > > > > > What do you think?
>>> > > > > > > > > > >
>>> > > > > > > > > > > Any comments and suggestions would be appreciated.
>>> > > > > > > > > > >
>>> > > > > > > > > > > Best,
>>> > > > > > > > > > > Vino
>>> > > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>>
>>
