Hi all, The PR has been submitted.[1]
Welcome to help to review it. [1]: https://github.com/apache/incubator-hudi/pull/1311 vino yang <yanghua1...@gmail.com> 于2020年2月6日周四 上午10:14写道: > Hi all, > > Have filed a Jira issue: https://issues.apache.org/jira/browse/HUDI-602 > > Best, > Vino > > vino yang <yanghua1...@gmail.com> 于2020年2月4日周二 下午9:39写道: > >> Hi Vinoth, >> >> Yes, I'd like to give some suggestions about the "MINOR" PR. >> Will file a Jira issue to track this work. >> >> Best, >> Vino >> >> Vinoth Chandar <vin...@apache.org> 于2020年2月4日周二 上午8:16写道: >> >>> +1 to vinoyang's suggestions. >>> >>> @Vino Yang <vinoy...@apache.org> , do you want to formalize this and >>> update our contributing page? >>> >>> On Thu, Jan 30, 2020 at 3:09 AM hmatu <hma...@foxmail.com> wrote: >>> >>> > Hi, >>> > >>> > >>> > >>> > I think these "MINOR" issues are important, a good project >>> requires >>> > not only functions, but also good coding style and habits. >>> > >>> > >>> > Best >>> > Hmatu >>> > >>> > >>> > >>> > >>> > ------------------ Original ------------------ >>> > From: "Vinoth Chandar"<vin...@apache.org>; >>> > Date: Mon, Jan 27, 2020 03:25 AM >>> > To: "dev"<dev@hudi.apache.org>; >>> > >>> > Subject: Re: [DISCUSS] Unify Hudi code cleanup and improvement >>> > >>> > >>> > >>> > Hi Vino, >>> > >>> > You raise a valid point on what "MINOR" PR should be. All JIRAs start >>> out >>> > in "NEW" state and committers have to "Accept" the issue already (to >>> force >>> > early conversations like this). >>> > >>> > May be we should draw some bounds on it like, "cannot be more than 50 >>> > lines", "No functionality changes" .. etc? WDYT? This seems to be >>> > the core >>> > of the issue. >>> > >>> > On Thu, Jan 23, 2020 at 4:17 PM vino yang <yanghua1...@gmail.com> >>> > wrote: >>> > >>> > > Hi Vinoth, >>> > > >>> > > Thank you for your thoughts, I agree that focusing on some higher >>> > priority >>> > > work is more valuable. >>> > > >>> > > This discussion is to sort out and manage the work that the >>> community >>> > is >>> > > already doing. There are currently some PRs working on this type >>> of >>> > work, >>> > > such as PR[1][2][3][4]. The community has not given guidance on >>> these >>> > > tasks. I think it's not very appropriate to open a "MINOR" PR >>> > directly. So, >>> > > I want to hear from the community and how to manage them more >>> > effectively. >>> > > The discussion does not encourage to give a higher priority to >>> such >>> > work. >>> > > >>> > > We haven't stopped this kind of work, so we should provide >>> effective >>> > > guidance and organization so that it doesn't look disorganized. >>> WYDT? >>> > > >>> > > Best, >>> > > Vino >>> > > >>> > > [1]: https://github.com/apache/incubator-hudi/pull/1237 >>> > > [2]: https://github.com/apache/incubator-hudi/pull/1139 >>> > > [3]: https://github.com/apache/incubator-hudi/pull/1137 >>> > > [4]: https://github.com/apache/incubator-hudi/pull/1136 >>> > > >>> > > Vinoth Chandar <vin...@apache.org> 于2020年1月23日周四 下午1:20写道: >>> > > >>> > > > Hi, >>> > > > >>> > > > Thanks everyone for sharing your views! >>> > > > >>> > > > Some of this conversation is starting to feel like boiling >>> the >>> > ocean. I >>> > > > believe in refactoring with purpose and discussing >>> > class-by-class or >>> > > > module-by-module does not make sense to me. Can we first list >>> > down what >>> > > we >>> > > > want to achieve? So far, I have only heard fixing >>> IDE/IntelliJ >>> > warnings. >>> > > > Also instead of focussing on new work, how about looking at >>> the >>> > pending >>> > > > JIRAs under "Testing" "Code Cleanup" components first and >>> see if >>> > those >>> > > are >>> > > > worth tackling. >>> > > > >>> > > > We went down this path for code formatting and today we still >>> > have >>> > > > inconsistencies. Looking back, I feel we should have clearly >>> > defined end >>> > > > goals for the cleanups and we can then rank them based on >>> ROI. >>> > > > >>> > > > Thanks >>> > > > Vinoth >>> > > > >>> > > > On Wed, Jan 22, 2020 at 7:05 PM vino yang < >>> yanghua1...@gmail.com> >>> > wrote: >>> > > > >>> > > > > Hi Shiyan and Bhavani: >>> > > > > >>> > > > > Thanks for sharing your thoughts. >>> > > > > >>> > > > > As I originally stated. The advantage of using modules >>> as a >>> > unit to >>> > > split >>> > > > > work is that the decomposition is clear, but the >>> > disadvantage is that >>> > > the >>> > > > > volume of changes may be huge, which brings huge risks >>> > (considering >>> > > that >>> > > > > Hudi's test coverage is still not very high) and the >>> > workload of >>> > > review. >>> > > > > The advantage of splitting by class is that the volume >>> of >>> > changes is >>> > > > small >>> > > > > and the review is more convenient, but the disadvantages >>> > are too many >>> > > > tasks >>> > > > > and high maintenance costs. >>> > > > > >>> > > > > >>> > > > > *In addition, we need to define the boundaries of the >>> "code >>> > cleanup" I >>> > > > > expressed in this topic: it is limited to the smart tips >>> > shown by >>> > > > Intellij >>> > > > > IDEA. If the boundaries are too wide, then this >>> discussion >>> > will lose >>> > > > > control.* >>> > > > > I agree with Bhavani that we don't take it as the actual >>> > goal. But we >>> > > are >>> > > > > not opposed to the community to help improve the >>> quality of >>> > the code >>> > > > > (basically, these tips given by the IDE are more >>> > reasonable). >>> > > > > >>> > > > > >>> > > > > So, I still give my thoughts: We manage this work with >>> > Jira. Before we >>> > > > > start working, we need to find a committer as a mentor. >>> The >>> > mentor must >>> > > > > decide whether the scale of the subtasks is reasonable >>> and >>> > whether >>> > > > > additional unit tests need to be added to verify the >>> > changes. And the >>> > > > > mentor should be responsible for merged changes. >>> > > > > >>> > > > > What do you think? >>> > > > > >>> > > > > Best, >>> > > > > Vino >>> > > > > >>> > > > > Bhavani Sudha <bhavanisud...@gmail.com> >>> 于2020年1月22日周三 >>> > 下午2:22写道: >>> > > > > >>> > > > > > Hi @vinoyang thanks for bringing this to >>> discussion. I >>> > feel it would >>> > > be >>> > > > > > less disruptive to clean up code as part of >>> individual >>> > classes being >>> > > > > > touched for a specific goal rather than code >>> cleanup >>> > being the actual >>> > > > > goal. >>> > > > > > This would narrow the touch point and ensure test >>> > coverage (both unit >>> > > > and >>> > > > > > integration tests) catches any >>> > accidental/unintentional changes. >>> > > Also >>> > > > it >>> > > > > > would give chance to change any documentation >>> > quoting/referencing >>> > > that >>> > > > > > code. Wanted to share my personal opinion. >>> > > > > > >>> > > > > > Thanks, >>> > > > > > Sudha >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu < >>> > > > xu.shiyan.raym...@gmail.com> >>> > > > > > wrote: >>> > > > > > >>> > > > > > > The clean-up work can actually be split by >>> > modules. >>> > > > > > > >>> > > > > > > Though it is generally a good practice to >>> follow, >>> > my concern is the >>> > > > > > > clean-up is likely to cause conflicts with >>> some >>> > on-going changes. >>> > > If >>> > > > I >>> > > > > > may >>> > > > > > > suggest, the dedicated clean-up tasks should >>> avoid >>> > > > > > > - modules that are undergoing multiple feature >>> > changes/PRs >>> > > > > > > - modules that are planned to have major >>> > refactoring due to design >>> > > > > > changes >>> > > > > > > (since clean-up can be done altogether during >>> > refactoring) >>> > > > > > > >>> > > > > > > On Tue, Jan 21, 2020 at 4:17 AM Vinoth >>> Chandar < >>> > vin...@apache.org> >>> > > > > > wrote: >>> > > > > > > >>> > > > > > > > Not sure if I fully agree with sweeping >>> > statements being made. >>> > > But, >>> > > > > +1 >>> > > > > > > for >>> > > > > > > > structuring this work via Jiras and >>> having >>> > some committer >>> > > “accept” >>> > > > > the >>> > > > > > > > issue first. Some of these tend to >>> be >>> > subjective and we do need >>> > > to >>> > > > > > make >>> > > > > > > > different tradeoffs. >>> > > > > > > > >>> > > > > > > > On Tue, Jan 21, 2020 at 1:28 AM vino >>> yang < >>> > yanghua1...@gmail.com >>> > > > >>> > > > > > wrote: >>> > > > > > > > >>> > > > > > > > > Hi Pratyaksh, >>> > > > > > > > > >>> > > > > > > > > Thanks for your thought. >>> > > > > > > > > >>> > > > > > > > > Let's listen to others' comments. If >>> > there is no objection, we >>> > > > will >>> > > > > > > > follow >>> > > > > > > > > this way. >>> > > > > > > > > >>> > > > > > > > > Best, >>> > > > > > > > > Vino >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > Pratyaksh Sharma < >>> pratyaks...@gmail.com> >>> > 于2020年1月21日周二 >>> > > 下午4:56写道: >>> > > > > > > > > >>> > > > > > > > > > Hi Vino, >>> > > > > > > > > > >>> > > > > > > > > > Big +1 for this initiative. I >>> have >>> > done this code cleanup for >>> > > > > test >>> > > > > > > > > classes >>> > > > > > > > > > in the past and strongly feel >>> > there is a need to do the same >>> > > at >>> > > > > > other >>> > > > > > > > > > places as well. I would >>> definitely >>> > like to volunteer for >>> > > this. >>> > > > > > > > > > >>> > > > > > > > > > On Tue, Jan 21, 2020 at 1:52 PM >>> > vino yang < >>> > > > yanghua1...@gmail.com >>> > > > > > >>> > > > > > > > wrote: >>> > > > > > > > > > >>> > > > > > > > > > > Hi folks, >>> > > > > > > > > > > >>> > > > > > > > > > > Currently, the code >>> quality >>> > of some Hudi module is not very >>> > > > > well. >>> > > > > > > As >>> > > > > > > > > many >>> > > > > > > > > > > developers have seen, the >>> > Intellij IDEA has shown many >>> > > > > > intellisense >>> > > > > > > > > about >>> > > > > > > > > > > cleanup and improvement. >>> The >>> > community does not object to >>> > > > doing >>> > > > > > the >>> > > > > > > > > > cleanup >>> > > > > > > > > > > and improvement work and >>> the >>> > work has been started via some >>> > > > > > direct >>> > > > > > > > > > "minor" >>> > > > > > > > > > > PRs by some volunteers. >>> The >>> > current way is unorganized and >>> > > > hard >>> > > > > > to >>> > > > > > > > > > manage. >>> > > > > > > > > > > For tracking this work, I >>> > prefer to manage this work with >>> > > the >>> > > > > > Jira >>> > > > > > > > > issue. >>> > > > > > > > > > > We can create an umbrella >>> > issue. Then, split the work into >>> > > > > > several >>> > > > > > > > > > > subtasks. >>> > > > > > > > > > > >>> > > > > > > > > > > Since those "bad smell" >>> lays >>> > anywhere in the whole project. >>> > > > > It's >>> > > > > > > > > > difficult >>> > > > > > > > > > > to give a standard to >>> split >>> > the subtasks. For example, some >>> > > > > files >>> > > > > > > > have >>> > > > > > > > > a >>> > > > > > > > > > > lot while some modules >>> have >>> > few. So I suggest the standard >>> > > > > would >>> > > > > > > > depend >>> > > > > > > > > > on >>> > > > > > > > > > > the volume of the changes. >>> > Before working, any subtask >>> > > should >>> > > > > > find >>> > > > > > > a >>> > > > > > > > > > > committer as a mentor who >>> > would judge and approve the scope >>> > > > is >>> > > > > > > > > suitable. >>> > > > > > > > > > > >>> > > > > > > > > > > What do you think? >>> > > > > > > > > > > >>> > > > > > > > > > > Any comments and >>> suggestions >>> > would be appreciated. >>> > > > > > > > > > > >>> > > > > > > > > > > Best, >>> > > > > > > > > > > Vino >>> > > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> >>