Eric/Steve-

Please pick a test- any test- and demonstrate why Powermock would improve- by
any metric- testing in Hadoop. -C

On Mon, Oct 2, 2017 at 2:12 PM, Eric Yang <ey...@hortonworks.com> wrote:
> Mocking provides tool chains to run a simulation for a piece of code. It
> helps to prevent null pointer exceptions and to reduce unexpected runtime
> exceptions. When a piece of code ships with a well-defined unit test, the
> test gives great insight into the author's intention and reasoning in writing
> the code. However, everyone looks at code from a different perspective, and
> it is often easier to rewrite the code than to modify and update the tests.
> The shortcoming of writing new code is that there is always a danger of
> losing the existing purpose, or a workaround buried deep in the code. On the
> other hand, if a test program is filled with several pages of initialization
> code and overrides, it is hard to get the context of the test case and easy
> to lose its original meaning. Hence, there are drawbacks to both mocking and
> full integration tests.
>
> I was initially in favor of using Powermock to give users the ability to
> unit test a class and reduce external interference. However, I quickly came
> to realize that Hadoop's use of protocol buffer serialization and Java
> reflection serialization differ in ways that prevent Powermock from working
> for certain Hadoop classes.
>
> Hadoop unit tests are written to be bigger than one class, and frequently a
> mini-cluster is spawned to test 5-10 lines of code. Any simple API test will
> trigger a large portion of Hadoop code to be initialized. The Hadoop code
> base would require too much effort to work with Powermock. Programs outside
> of Hadoop can use the Powermock annotation to prevent mocking Hadoop classes,
> such as: @PowerMockIgnore({"javax.management.*", "javax.xml.*", "org.w3c.*",
> "org.apache.hadoop.*", "com.sun.*"}). However, when working in the Hadoop
> code base, this technique is not practical because every class in Hadoop is
> prefixed with org.apache.hadoop. It would be heavy upkeep to maintain the
> list of package prefixes that cannot work with Powermock reflection.
> Hence, I rest my case for re-opening this issue.
>
> Regards,
> Eric
>
> From: Steve Loughran <ste...@hortonworks.com>
> Date: Sunday, October 1, 2017 at 12:36 PM
> To: Eric Yang <ey...@hortonworks.com>
> Cc: Andrew Wang <andrew.w...@cloudera.com>, Chris Douglas
> <cdoug...@apache.org>, "common-dev@hadoop.apache.org"
> <common-dev@hadoop.apache.org>
> Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better
> unit tests
>
>
> On 29 Sep 2017, at 22:46, Eric Yang <ey...@hortonworks.com> wrote:
>
> Hi Chris and Andrew,
>
> The intent is for new code to have better unit test cases without resorting
> to invoking miniHDFSCluster or miniYarnCluster. Existing code doesn't require
> refactoring if the test cases already have good coverage. I am currently
> working on part of YARN to improve YARN and Docker integration. A lot of code
> gets triggered, from UGI and FileSystem objects to YARN job submission. My
> code is only responsible for checking the logic of the user input and the
> expected output prior to YarnClient job submission. Starting a miniCluster
> for this test case is excessive for such a small piece of validation code.
> The submission code was imported from Slider for YARN native services, and a
> single class imports various Hadoop services. In several failure cases, it is
> difficult to simulate exact error conditions because the API is several
> layers deep. Powermock provides an easy way to replace and stub return
> objects, or to throw the proper exception to simulate the failure conditions.
> One can argue that the code should have been written to be easier to unit
> test, but Hadoop code density makes even simple initialization far from
> trivial.
> Constructor suppression, inner class replacement and private method override
> are good tools from Powermock that can provide more accurate testing without
> losing sight of multi-stage API call sequences, while keeping the test case
> localized to a small piece of the greater puzzle. Hence, I would like to
> request the community to rethink the improvement that Powermock can bring to
> the table. Thank you for your consideration.
>
> I don't know enough about powermock to have opinions on the matter. I do know
> I don't like mocking in general
> https://www.slideshare.net/steve_l/i-hate-mocking , or at least in the one
> area where I find it most troublesome: maintaining code
>
>
> I just find mock-based tests to be very brittle to changes in the codepaths
> of the classes called, so whenever you change the implementation, tests fail.
> And it's not so much a "your code has regressed and we correctly caught it"
> failure as a "the change in order of invocation caused our test to report a
> regression when it wasn't really" kind of failure. Which is bad, as you waste
> time working out that this is the cause, then often fix the problems by
> moving bits of the test around until it stops failing. Which can hide real
> regressions.
>
> Where mocking can be good is in that
>
> 1. You can make assertions about how things were invoked, though note we've
> moved in S3A towards actually instrumenting the code and asserting on that.
> This way our shipping code gets to enjoy better instrumentation. [Note, those
> assertions can be brittle to changes in implementation too]
>
> 2. You can simulate failure better. But for S3Guard/S3A we've gone and
> implemented an InconsistentS3Client which ships in the hadoop-aws JAR and so
> can be used downstream.
>
> 3. You can test things without needing so much support infra (e.g. in unit
> tests and on jenkins without needing logins, running services)
>
> 4. You can have faster tests, because there's no need to set up/tear down
> things like HDFS
>
> 5. You can isolate problems to the code under test, rather than looking at
> the logs of forked processes collected somewhere under target/
>
> I think Eric's looking @ #4, & 5 which, for tests which need a MiniYARN
> cluster is significant. If Powermock helps this, I don't see why we should
> say "don't use it", as long as we are aware of the cost, which is the risk of
> creating tests which are brittle to changes in the implementation code
>
>
> FWIW, Mocking is why I couldn't make the init/start/stop methods of
> org.apache.hadoop.service.AbstractService final; the need to test with
> mocking can impact production code. Is that bad? Well, we do other things to
> code to aid testability,...
>
>
> -Steve
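
To make Steve's point about simulating failure without a mocking library a bit
more concrete, here is a minimal sketch in plain Java. Every name in it
(BlobStore, InMemoryStore, FlakyStore, readWithRetry) is hypothetical and made
up for illustration; none of it is Hadoop, S3A or Powermock API, and it is not
how InconsistentS3Client is implemented. It only assumes that the code under
test depends on a narrow interface, so a hand-written decorator can inject
failures and count calls, with no cluster to set up or tear down.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Sketch only: every type here is hypothetical, not a Hadoop or Powermock class. */
final class FaultInjectionSketch {

    /** Narrow interface that the code under test depends on. */
    interface BlobStore {
        String read(String key) throws IOException;
    }

    /** In-memory stand-in for a real service, so no cluster is started. */
    static final class InMemoryStore implements BlobStore {
        private final Map<String, String> data = new HashMap<>();

        void put(String key, String value) {
            data.put(key, value);
        }

        @Override
        public String read(String key) throws IOException {
            String value = data.get(key);
            if (value == null) {
                throw new IOException("no such key: " + key);
            }
            return value;
        }
    }

    /** Decorator that fails the first N reads, then delegates; it also counts calls. */
    static final class FlakyStore implements BlobStore {
        private final BlobStore delegate;
        private int failuresLeft;
        private int attempts;

        FlakyStore(BlobStore delegate, int failures) {
            this.delegate = delegate;
            this.failuresLeft = failures;
        }

        @Override
        public String read(String key) throws IOException {
            attempts++;
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IOException("injected failure");
            }
            return delegate.read(key);
        }

        int attempts() {
            return attempts;
        }
    }

    /** Code under test: retries a read up to maxAttempts times. */
    static String readWithRetry(BlobStore store, String key, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return store.read(key);
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // non-null here: the loop ran at least once and never returned
    }

    public static void main(String[] args) throws IOException {
        InMemoryStore backing = new InMemoryStore();
        backing.put("job.conf", "queue=default");

        // Two injected failures, three attempts allowed: the third read succeeds.
        FlakyStore flaky = new FlakyStore(backing, 2);
        String value = readWithRetry(flaky, "job.conf", 3);
        System.out.println(value + " after " + flaky.attempts() + " attempts");
        // prints "queue=default after 3 attempts"
    }
}
```

The check is on the outcome and on a call count, not on the order of
invocations, so reordering the internals of readWithRetry should not break it.
The cost is that the production code has to accept the interface as a
parameter, which is the "we do things to code to aid testability" trade-off
mentioned above.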