Hi all, Have filed a Jira issue: https://issues.apache.org/jira/browse/HUDI-602
Best, Vino vino yang <yanghua1...@gmail.com> 于2020年2月4日周二 下午9:39写道: > Hi Vinoth, > > Yes, I'd like to give some suggestions about the "MINOR" PR. > Will file a Jira issue to track this work. > > Best, > Vino > > Vinoth Chandar <vin...@apache.org> 于2020年2月4日周二 上午8:16写道: > >> +1 to vinoyang's suggestions. >> >> @Vino Yang <vinoy...@apache.org> , do you want to formalize this and >> update our contributing page? >> >> On Thu, Jan 30, 2020 at 3:09 AM hmatu <hma...@foxmail.com> wrote: >> >> > Hi, >> > >> > >> > >> > I think these "MINOR" issues are important, a good project requires >> > not only functions, but also good coding style and habits. >> > >> > >> > Best >> > Hmatu >> > >> > >> > >> > >> > ------------------ Original ------------------ >> > From: "Vinoth Chandar"<vin...@apache.org>; >> > Date: Mon, Jan 27, 2020 03:25 AM >> > To: "dev"<dev@hudi.apache.org>; >> > >> > Subject: Re: [DISCUSS] Unify Hudi code cleanup and improvement >> > >> > >> > >> > Hi Vino, >> > >> > You raise a valid point on what "MINOR" PR should be. All JIRAs start >> out >> > in "NEW" state and committers have to "Accept" the issue already (to >> force >> > early conversations like this). >> > >> > May be we should draw some bounds on it like, "cannot be more than 50 >> > lines", "No functionality changes" .. etc? WDYT? This seems to be >> > the core >> > of the issue. >> > >> > On Thu, Jan 23, 2020 at 4:17 PM vino yang <yanghua1...@gmail.com> >> > wrote: >> > >> > > Hi Vinoth, >> > > >> > > Thank you for your thoughts, I agree that focusing on some higher >> > priority >> > > work is more valuable. >> > > >> > > This discussion is to sort out and manage the work that the >> community >> > is >> > > already doing. There are currently some PRs working on this type of >> > work, >> > > such as PR[1][2][3][4]. The community has not given guidance on >> these >> > > tasks. I think it's not very appropriate to open a "MINOR" PR >> > directly. So, >> > > I want to hear from the community and how to manage them more >> > effectively. >> > > The discussion does not encourage to give a higher priority to such >> > work. >> > > >> > > We haven't stopped this kind of work, so we should provide >> effective >> > > guidance and organization so that it doesn't look disorganized. >> WYDT? >> > > >> > > Best, >> > > Vino >> > > >> > > [1]: https://github.com/apache/incubator-hudi/pull/1237 >> > > [2]: https://github.com/apache/incubator-hudi/pull/1139 >> > > [3]: https://github.com/apache/incubator-hudi/pull/1137 >> > > [4]: https://github.com/apache/incubator-hudi/pull/1136 >> > > >> > > Vinoth Chandar <vin...@apache.org> 于2020年1月23日周四 下午1:20写道: >> > > >> > > > Hi, >> > > > >> > > > Thanks everyone for sharing your views! >> > > > >> > > > Some of this conversation is starting to feel like boiling the >> > ocean. I >> > > > believe in refactoring with purpose and discussing >> > class-by-class or >> > > > module-by-module does not make sense to me. Can we first list >> > down what >> > > we >> > > > want to achieve? So far, I have only heard fixing IDE/IntelliJ >> > warnings. >> > > > Also instead of focussing on new work, how about looking at >> the >> > pending >> > > > JIRAs under "Testing" "Code Cleanup" components first and see >> if >> > those >> > > are >> > > > worth tackling. >> > > > >> > > > We went down this path for code formatting and today we still >> > have >> > > > inconsistencies. Looking back, I feel we should have clearly >> > defined end >> > > > goals for the cleanups and we can then rank them based on ROI. >> > > > >> > > > Thanks >> > > > Vinoth >> > > > >> > > > On Wed, Jan 22, 2020 at 7:05 PM vino yang < >> yanghua1...@gmail.com> >> > wrote: >> > > > >> > > > > Hi Shiyan and Bhavani: >> > > > > >> > > > > Thanks for sharing your thoughts. >> > > > > >> > > > > As I originally stated. The advantage of using modules >> as a >> > unit to >> > > split >> > > > > work is that the decomposition is clear, but the >> > disadvantage is that >> > > the >> > > > > volume of changes may be huge, which brings huge risks >> > (considering >> > > that >> > > > > Hudi's test coverage is still not very high) and the >> > workload of >> > > review. >> > > > > The advantage of splitting by class is that the volume of >> > changes is >> > > > small >> > > > > and the review is more convenient, but the disadvantages >> > are too many >> > > > tasks >> > > > > and high maintenance costs. >> > > > > >> > > > > >> > > > > *In addition, we need to define the boundaries of the >> "code >> > cleanup" I >> > > > > expressed in this topic: it is limited to the smart tips >> > shown by >> > > > Intellij >> > > > > IDEA. If the boundaries are too wide, then this >> discussion >> > will lose >> > > > > control.* >> > > > > I agree with Bhavani that we don't take it as the actual >> > goal. But we >> > > are >> > > > > not opposed to the community to help improve the quality >> of >> > the code >> > > > > (basically, these tips given by the IDE are more >> > reasonable). >> > > > > >> > > > > >> > > > > So, I still give my thoughts: We manage this work with >> > Jira. Before we >> > > > > start working, we need to find a committer as a mentor. >> The >> > mentor must >> > > > > decide whether the scale of the subtasks is reasonable >> and >> > whether >> > > > > additional unit tests need to be added to verify the >> > changes. And the >> > > > > mentor should be responsible for merged changes. >> > > > > >> > > > > What do you think? >> > > > > >> > > > > Best, >> > > > > Vino >> > > > > >> > > > > Bhavani Sudha <bhavanisud...@gmail.com> 于2020年1月22日周三 >> > 下午2:22写道: >> > > > > >> > > > > > Hi @vinoyang thanks for bringing this to >> discussion. I >> > feel it would >> > > be >> > > > > > less disruptive to clean up code as part of >> individual >> > classes being >> > > > > > touched for a specific goal rather than code cleanup >> > being the actual >> > > > > goal. >> > > > > > This would narrow the touch point and ensure test >> > coverage (both unit >> > > > and >> > > > > > integration tests) catches any >> > accidental/unintentional changes. >> > > Also >> > > > it >> > > > > > would give chance to change any documentation >> > quoting/referencing >> > > that >> > > > > > code. Wanted to share my personal opinion. >> > > > > > >> > > > > > Thanks, >> > > > > > Sudha >> > > > > > >> > > > > > >> > > > > > >> > > > > > On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu < >> > > > xu.shiyan.raym...@gmail.com> >> > > > > > wrote: >> > > > > > >> > > > > > > The clean-up work can actually be split by >> > modules. >> > > > > > > >> > > > > > > Though it is generally a good practice to >> follow, >> > my concern is the >> > > > > > > clean-up is likely to cause conflicts with some >> > on-going changes. >> > > If >> > > > I >> > > > > > may >> > > > > > > suggest, the dedicated clean-up tasks should >> avoid >> > > > > > > - modules that are undergoing multiple feature >> > changes/PRs >> > > > > > > - modules that are planned to have major >> > refactoring due to design >> > > > > > changes >> > > > > > > (since clean-up can be done altogether during >> > refactoring) >> > > > > > > >> > > > > > > On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar >> < >> > vin...@apache.org> >> > > > > > wrote: >> > > > > > > >> > > > > > > > Not sure if I fully agree with sweeping >> > statements being made. >> > > But, >> > > > > +1 >> > > > > > > for >> > > > > > > > structuring this work via Jiras and having >> > some committer >> > > “accept” >> > > > > the >> > > > > > > > issue first. Some of these tend to >> be >> > subjective and we do need >> > > to >> > > > > > make >> > > > > > > > different tradeoffs. >> > > > > > > > >> > > > > > > > On Tue, Jan 21, 2020 at 1:28 AM vino yang >> < >> > yanghua1...@gmail.com >> > > > >> > > > > > wrote: >> > > > > > > > >> > > > > > > > > Hi Pratyaksh, >> > > > > > > > > >> > > > > > > > > Thanks for your thought. >> > > > > > > > > >> > > > > > > > > Let's listen to others' comments. If >> > there is no objection, we >> > > > will >> > > > > > > > follow >> > > > > > > > > this way. >> > > > > > > > > >> > > > > > > > > Best, >> > > > > > > > > Vino >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > Pratyaksh Sharma < >> pratyaks...@gmail.com> >> > 于2020年1月21日周二 >> > > 下午4:56写道: >> > > > > > > > > >> > > > > > > > > > Hi Vino, >> > > > > > > > > > >> > > > > > > > > > Big +1 for this initiative. I >> have >> > done this code cleanup for >> > > > > test >> > > > > > > > > classes >> > > > > > > > > > in the past and strongly feel >> > there is a need to do the same >> > > at >> > > > > > other >> > > > > > > > > > places as well. I would >> definitely >> > like to volunteer for >> > > this. >> > > > > > > > > > >> > > > > > > > > > On Tue, Jan 21, 2020 at 1:52 PM >> > vino yang < >> > > > yanghua1...@gmail.com >> > > > > > >> > > > > > > > wrote: >> > > > > > > > > > >> > > > > > > > > > > Hi folks, >> > > > > > > > > > > >> > > > > > > > > > > Currently, the code quality >> > of some Hudi module is not very >> > > > > well. >> > > > > > > As >> > > > > > > > > many >> > > > > > > > > > > developers have seen, the >> > Intellij IDEA has shown many >> > > > > > intellisense >> > > > > > > > > about >> > > > > > > > > > > cleanup and improvement. >> The >> > community does not object to >> > > > doing >> > > > > > the >> > > > > > > > > > cleanup >> > > > > > > > > > > and improvement work and >> the >> > work has been started via some >> > > > > > direct >> > > > > > > > > > "minor" >> > > > > > > > > > > PRs by some volunteers. The >> > current way is unorganized and >> > > > hard >> > > > > > to >> > > > > > > > > > manage. >> > > > > > > > > > > For tracking this work, I >> > prefer to manage this work with >> > > the >> > > > > > Jira >> > > > > > > > > issue. >> > > > > > > > > > > We can create an umbrella >> > issue. Then, split the work into >> > > > > > several >> > > > > > > > > > > subtasks. >> > > > > > > > > > > >> > > > > > > > > > > Since those "bad smell" >> lays >> > anywhere in the whole project. >> > > > > It's >> > > > > > > > > > difficult >> > > > > > > > > > > to give a standard to split >> > the subtasks. For example, some >> > > > > files >> > > > > > > > have >> > > > > > > > > a >> > > > > > > > > > > lot while some modules have >> > few. So I suggest the standard >> > > > > would >> > > > > > > > depend >> > > > > > > > > > on >> > > > > > > > > > > the volume of the changes. >> > Before working, any subtask >> > > should >> > > > > > find >> > > > > > > a >> > > > > > > > > > > committer as a mentor who >> > would judge and approve the scope >> > > > is >> > > > > > > > > suitable. >> > > > > > > > > > > >> > > > > > > > > > > What do you think? >> > > > > > > > > > > >> > > > > > > > > > > Any comments and >> suggestions >> > would be appreciated. >> > > > > > > > > > > >> > > > > > > > > > > Best, >> > > > > > > > > > > Vino >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> >