Eric/Steve-

Please pick a test- any test- and demonstrate why Powermock would
improve- by any metric- testing in Hadoop. -C



On Mon, Oct 2, 2017 at 2:12 PM, Eric Yang <ey...@hortonworks.com> wrote:
> Mocking provides tool chains to run a simulation for a piece of code.  It
> helps to prevent null pointer exceptions and reduces unexpected runtime
> exceptions.  When a piece of code is finished with a well-defined unit test,
> the test gives great insight into the author's intention and reasoning in
> writing the code.  However, everyone looks at code from a different
> perspective, and it is often easier to rewrite the code than to modify and
> update the tests.  The shortcoming of writing new code is that there is
> always a danger of losing the existing purpose and the workarounds buried
> deep in the code.  On the other hand, if a test program is filled with
> several pages of initialization code and overrides, it is hard to get the
> context of the test case and easy to lose its original meaning.  Hence,
> there are drawbacks to using either mocks or full integration tests.
>
> I was initially in favor of using Powermock to give users the ability to
> unit test a class with less external interference.  However, I quickly came
> to the realization that Hadoop's use of protocol buffer serialization and
> Java reflection serialization differ in ways that prevent Powermock from
> working for certain Hadoop classes.
>
> Hadoop unit tests are written to be bigger than one class, and frequently a
> mini-cluster is spawned to test 5-10 lines of code.  Any simple API test will
> trigger a large portion of Hadoop code to be initialized.  The Hadoop code
> base would require too much effort to work with Powermock.  Programs outside
> of Hadoop can use a Powermock annotation to prevent mocking Hadoop classes,
> such as: @PowerMockIgnore({"javax.management.*", "javax.xml.*", "org.w3c.*",
> "org.apache.hadoop.*", "com.sun.*"}).  However, when working in the Hadoop
> code base, this technique is not practical because every class in Hadoop is
> prefixed with org.apache.hadoop.  It would be heavy upkeep to maintain the
> list of package prefixes that cannot work with Powermock reflection.
> Hence, I rest my case for re-opening this issue.
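
To make the package-prefix point concrete, here is a minimal sketch in plain Java. It uses a deliberately simplified model of wildcard ignore patterns (a trailing ".*" matches everything under that package) and does not reproduce Powermock's actual matching rules. The class IgnorePatternDemo and the name com.example.MyClient are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class IgnorePatternDemo {
    /**
     * Simplified model, not Powermock's real matcher: a pattern ending in ".*"
     * matches any class under that package prefix; anything else must match exactly.
     */
    static boolean isIgnored(String className, List<String> patterns) {
        for (String p : patterns) {
            if (p.endsWith(".*")) {
                // "org.apache.hadoop.*" becomes the prefix "org.apache.hadoop."
                if (className.startsWith(p.substring(0, p.length() - 1))) {
                    return true;
                }
            } else if (className.equals(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList("javax.management.*", "org.apache.hadoop.*");
        // A downstream class stays outside the ignore list.
        System.out.println(isIgnored("com.example.MyClient", patterns));
        // Every Hadoop class falls under the single org.apache.hadoop prefix.
        System.out.println(isIgnored("org.apache.hadoop.fs.FileSystem", patterns));
        System.out.println(isIgnored("org.apache.hadoop.yarn.client.api.YarnClient", patterns));
    }
}
```

Under this model, a downstream project can exclude all of Hadoop with one pattern, while a test inside Hadoop would need a narrower, hand-maintained list so that the class under test is not excluded as well.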
>
> Regards,
> Eric
>
> From: Steve Loughran <ste...@hortonworks.com>
> Date: Sunday, October 1, 2017 at 12:36 PM
> To: Eric Yang <ey...@hortonworks.com>
> Cc: Andrew Wang <andrew.w...@cloudera.com>, Chris Douglas 
> <cdoug...@apache.org>, "common-dev@hadoop.apache.org" 
> <common-dev@hadoop.apache.org>
> Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better 
> unit tests
>
>
> On 29 Sep 2017, at 22:46, Eric Yang 
> <ey...@hortonworks.com<mailto:ey...@hortonworks.com>> wrote:
>
> Hi Chris and Andrew,
>
> The intent is for new code to have better unit test cases without resorting
> to invoking miniHDFSCluster or miniYarnCluster.  Existing code doesn't
> require refactoring if the test cases already have good coverage.  I am
> currently working on part of YARN to improve YARN and Docker integration.
> A lot of code gets triggered, from UGI and FileSystem objects through to
> YARN job submission.  My code is only responsible for checking the logic of
> the user input and the expected output prior to YarnClient job submission.
> Starting a miniCluster for this test case is excessive for such a small
> piece of validation code.  The submission code was imported from Slider for
> YARN native services, and a single class imports various Hadoop services.
> In several failure cases, it is difficult to simulate the exact error
> conditions because the API is several layers deep.  Powermock provides an
> easy way to replace and stub a return object, or to throw the proper
> exception, to simulate the failure conditions.  One can argue that the code
> should have been written to be easier to unit test, but Hadoop's code
> density makes even simple initialization far from trivial.  Constructor
> suppression, inner class replacement and private method override are good
> tools from Powermock that can provide more accurate testing without losing
> sight of multi-stage API call tests, while keeping the test case localized
> to a small piece of the greater puzzle.  Hence, I would like to ask the
> community to rethink the improvement that Powermock can bring to the table.
> Thank you for your consideration.
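
For readers who want a picture of the kind of test being described (validate user input, then simulate a submission failure without a mini-cluster), here is a minimal sketch. It uses a hand-written stub behind an interface instead of Powermock, so it shows the testing goal, not Powermock's mechanism. JobSubmitter, DockerJobLauncher and the error strings are all hypothetical and do not come from the YARN code base.

```java
import java.io.IOException;

/** Hypothetical seam standing in for the cluster-facing submission call. */
interface JobSubmitter {
    String submit(String image) throws IOException;
}

/** Hypothetical launcher: validates user input, then delegates to the submitter. */
final class DockerJobLauncher {
    private final JobSubmitter submitter;

    DockerJobLauncher(JobSubmitter submitter) {
        this.submitter = submitter;
    }

    /** Returns the submitter's result, or an "ERROR: ..." string describing the failure. */
    String launch(String image) {
        if (image == null || image.trim().isEmpty()) {
            return "ERROR: image name is required";
        }
        try {
            return submitter.submit(image.trim());
        } catch (IOException e) {
            return "ERROR: submission failed: " + e.getMessage();
        }
    }
}

final class LauncherDemo {
    public static void main(String[] args) {
        // Stub that succeeds, and a stub that simulates a failure several layers down.
        JobSubmitter ok = image -> "app-for-" + image;
        JobSubmitter failing = image -> { throw new IOException("RM unreachable"); };

        System.out.println(new DockerJobLauncher(ok).launch(" centos:7 "));
        System.out.println(new DockerJobLauncher(failing).launch("centos:7"));
        System.out.println(new DockerJobLauncher(ok).launch(""));
    }
}
```

This only works when the production class accepts its collaborator through a constructor or similar seam. The argument in the message above is that existing Hadoop classes often do not offer such a seam, which is where constructor suppression and private method override come in.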
>
> I don't know enough about powermock to have opinions on the matter. I do know 
> I don't like mocking in general 
> https://www.slideshare.net/steve_l/i-hate-mocking , or at least in the one 
> area where I find it most troublesome: maintaining code
>
>
> I just find mock-based tests to be very brittle to changes in the
> codepaths of the classes called, so whenever you change the implementation,
> tests fail. And it's not so much a "your code has regressed and we correctly
> caught it" failure as a "the change in order of invocation caused our test to
> report a regression when it wasn't really" kind of failure. Which is bad, as
> you waste time working out that this is the cause, then often fix the
> problems by moving bits of the test around until it stops failing. Which can
> hide real regressions.
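
The brittleness described above can be sketched without any mocking library. In this hypothetical example, RecordingStore plays the role of a mock that records calls, and commitV1 and commitV2 are two implementations that leave the same end state but call the collaborator in a different order. None of these names come from Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical collaborator that records every call it receives, like a mock would. */
final class RecordingStore {
    final List<String> calls = new ArrayList<>();

    void put(String key) {
        calls.add(key);
    }
}

final class BrittlenessDemo {
    /** Original implementation. */
    static void commitV1(RecordingStore store) {
        store.put("part-0");
        store.put("part-1");
    }

    /** Refactored implementation: same end state, different invocation order. */
    static void commitV2(RecordingStore store) {
        store.put("part-1");
        store.put("part-0");
    }

    /** Interaction-style check: asserts the exact sequence of calls. */
    static boolean orderSensitiveCheck(RecordingStore store) {
        return store.calls.equals(Arrays.asList("part-0", "part-1"));
    }

    /** Outcome-style check: asserts only which keys ended up written. */
    static boolean outcomeCheck(RecordingStore store) {
        return new HashSet<>(store.calls).equals(new HashSet<>(Arrays.asList("part-0", "part-1")));
    }

    public static void main(String[] args) {
        RecordingStore v1 = new RecordingStore();
        commitV1(v1);
        RecordingStore v2 = new RecordingStore();
        commitV2(v2);
        System.out.println("v1 order check: " + orderSensitiveCheck(v1));
        System.out.println("v2 order check: " + orderSensitiveCheck(v2));
        System.out.println("v2 outcome check: " + outcomeCheck(v2));
    }
}
```

The order-sensitive check reports a failure for commitV2 even though nothing has regressed, which is the false alarm described above. The outcome check tolerates the reordering, at the price of not noticing cases where order does matter.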
>
> Where mocking can be good is in that
>
> 1. You can make assertions about how things were invoked, though note we've
> moved in S3A towards actually instrumenting the code and asserting on that.
> This way our shipping code gets to enjoy better instrumentation. [Note, those
> assertions can be brittle to changes in implementation too]
>
> 2. You can simulate failure better. But for S3Guard/S3A we've gone and
> implemented an InconsistentS3Client which ships in the hadoop-aws JAR and so
> can be used downstream.
>
> 3. You can test things without needing so much support infra (e.g. in unit 
> tests and on jenkins without needing logins, running services)
>
> 4. You can have faster tests, because there's no need to set up/tear down 
> things like HDFS
>
> 5. You can isolate problems to the code under test, rather than looking at 
> the logs of forked processes collected somewhere under target/
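
Point 2 in the list above mentions shipping a fault-injecting client instead of mocking. The sketch below shows that general pattern in plain Java. It is not the real InconsistentS3Client, whose API is not shown in this thread: ObjectStore, InMemoryStore, FailingStore and RetryingReader are hypothetical names.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical storage interface; not an actual Hadoop or AWS API. */
interface ObjectStore {
    String get(String key) throws IOException;
}

/** Simple in-memory implementation used as the "real" backend in tests. */
final class InMemoryStore implements ObjectStore {
    private final Map<String, String> data = new HashMap<>();

    void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public String get(String key) throws IOException {
        String value = data.get(key);
        if (value == null) {
            throw new IOException("not found: " + key);
        }
        return value;
    }
}

/** Fault-injecting wrapper: fails the first N calls, then delegates. */
final class FailingStore implements ObjectStore {
    private final ObjectStore delegate;
    private int failuresLeft;

    FailingStore(ObjectStore delegate, int failures) {
        this.delegate = delegate;
        this.failuresLeft = failures;
    }

    @Override
    public String get(String key) throws IOException {
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new IOException("injected failure");
        }
        return delegate.get(key);
    }
}

/** Code under test: retries a read up to the given number of attempts. */
final class RetryingReader {
    static String read(ObjectStore store, String key, int attempts) throws IOException {
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return store.get(key);
            } catch (IOException e) {
                last = e;
            }
        }
        throw last != null ? last : new IOException("no attempts made");
    }
}
```

A test can wrap the in-memory store with `new FailingStore(store, 2)` and check that `RetryingReader.read(..., 3)` succeeds while `RetryingReader.read(..., 2)` throws. Because the wrapper is ordinary code behind the same interface, it can ship in a JAR and be reused downstream, and it runs in-process, so no services need to be set up or torn down.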
>
> I think Eric's looking at #4 and #5, which, for tests which need a MiniYARN
> cluster, is significant. If Powermock helps with this, I don't see why we
> should say "don't use it", as long as we are aware of the cost, which is the
> risk of creating tests which are brittle to changes in the implementation code.
>
>
> FWIW, Mocking is why I couldn't make the init/start/stop methods of 
> org.apache.hadoop.service.AbstractService final; the need to test with 
> mocking can impact production code. Is that bad? Well, we do other things to 
> code to aid testability,...
>
>
> -Steve
>
>

