Karl Wiegers has several interesting papers that discuss various forms
of peer review, which you have probably seen already.

http://www.processimpact.com/pubs.shtml#pr

This one does a pretty good job of characterizing the various forms of
peer review, the differences in their objectives, and the differences
in the steps performed.  (It even briefly cites a paper that Philip and
Adam Porter wrote, csdl-97-02.)

http://www.processimpact.com/articles/two_eyes.pdf

Based on that paper's definitions, I think that Jupiter is best suited
for team reviews, but it could perhaps be adapted for inspections,
walkthroughs, and passarounds.  All of this depends on the process
steps you impose external to the tool.

On the question of having a meeting, it'd be nice to know the Active
Time associated with review preparation.  We held a review the other
day where several of the people showed up without having reviewed the
code at all.  That turned out to be more trouble than it should have
been.  On days like that I'd love to reschedule the review if a
"reasonable" amount of time (like at least 5-15 minutes, perhaps)
hadn't been put in.
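
To make the threshold idea concrete, the check I have in mind is
roughly the following (the class and method names are made up, not
actual Hackystat code):

  // Sketch: decide whether to reschedule a review meeting based on each
  // reviewer's preparation Active Time.  All names here are hypothetical.
  import java.util.Map;

  public class PrepTimeCheck {

    /** Minimum preparation we'd consider "reasonable", in minutes. */
    private static final int MIN_PREP_MINUTES = 15;

    /**
     * Returns true if every reviewer logged at least MIN_PREP_MINUTES of
     * Active Time on the files under review before the meeting.
     */
    public static boolean everyoneIsPrepared(Map<String, Integer> prepMinutesByReviewer) {
      for (Map.Entry<String, Integer> entry : prepMinutesByReviewer.entrySet()) {
        if (entry.getValue() < MIN_PREP_MINUTES) {
          System.out.println("Consider rescheduling: " + entry.getKey()
              + " has only " + entry.getValue() + " minutes of prep time.");
          return false;
        }
      }
      return true;
    }
  }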

Anyway.  Just my two cents.

--Tim

On Sat, 25 Sep 2004 07:17:47 -1000, Takuya Yamashita <[EMAIL PROTECTED]> wrote:
> > > - First off, I don't believe that the Jupiter Sensor is working
> > > properly.  I don't seem to be sending any data and I don't see
> > > anything in the logs.  I believe I have the latest version of the
> > > sensor and have all the settings configured correctly.  Has anyone
> > > successfully sent Review data to the server?
> >
> > I struggled with this myself today and can't get any data sent to the
> > server, even after installing last night's build on the public server
> > and installing the corresponding sensors in Eclipse.  The problem
> > appears to be on the client-side---there are no error messages on the
> > public server console, for example.  Takuya, can you check this out?
> 
> Confirmed. I verified that the sensor downloaded from the public site
> does not work. It seems that the Jupiter sensor is not instantiated in
> Eclipse, even though the Hackystat sensor and Jupiter itself are
> instantiated successfully. In addition, in my virtual environment (i.e.
> launching the Hackystat Jupiter sensor in my Eclipse) they work. Let me
> investigate what's wrong.
> 
> > > - Review Issues are product metrics much like FileMetrics.  So, I'm
> > > wondering if it would be easier to create an Ant-based sensor for
> > > Review Issues, instead of trying to capture the Issues using Jupiter.
> > > An Ant-based sensor makes more sense if it is easier.
> >
> > This actually seems harder to me than the current situation, in that
> > we now have to run Ant to get the issue data sent to the server. I
> > don't see any real cost to the current approach.
> 
> I agree with Philip too. An Ant-based sensor might be easier to
> implement than the Jupiter one, but even if Ant handled sending the
> review issues, we would still need the review issues themselves to be
> generated by a review tool such as Jupiter. I think the hacky-jupiter
> sensor mechanism is a convenient and reasonable way to do this.
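
For what it's worth, an Ant-based sender would boil down to something
like the sketch below (the file extension, layout, and class names are
made up for illustration), and it still depends on a tool like Jupiter
having produced the issue files in the first place:

  // Sketch of what an Ant-style batch sender would have to do.  The
  // ".review" extension and the send step are assumptions, not the real
  // Hackystat or Jupiter APIs.
  import java.io.File;

  public class ReviewIssueBatchSender {

    public static void main(String[] args) {
      File reviewDir = new File(args[0]);
      File[] reviewFiles = reviewDir.listFiles((dir, name) -> name.endsWith(".review"));
      if (reviewFiles == null) {
        return;
      }
      for (File reviewFile : reviewFiles) {
        // A real task would parse the issues here and ship them to the
        // Hackystat server, the step the in-IDE sensor already does for us.
        System.out.println("Would send: " + reviewFile.getName());
      }
    }
  }

Either way, someone has to remember to run the Ant target, which is
exactly the extra step the in-IDE sensor avoids.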
> 
> > > Actually, I'm also thinking that we could generate a report based on
> > > the Review Issues.  I think this would be useful to be able to see
> > > all the outstanding issues that have not been fixed.  In addition,
> > > this report could help the education of developers, as they can
> > > refer to issues that are associated with code similar to what they
> > > are writing.
> >
> > I agree that this would be a very interesting analysis report to
> > provide. For example, you could list the number of non-resolved issues
> > by priority. There could also be a Reduction function so that you can
> > get Telemetry regarding the total number of review issues that are
> > open and how that changes over time.
> 
> I am not sure whether this report should be generated locally or on the
> server, but it would be nicer if it were implemented on the Hackystat
> server so that it could be shared across the project group. However, it
> could also be implemented as an Ant task if the report is needed
> locally.
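
Wherever the report ends up living, its core could be as simple as
counting non-resolved issues by severity. A rough sketch, using made-up
stand-in types rather than the actual Hackystat SDT classes:

  // Sketch of the counting step behind such a report.  ReviewIssue here
  // is a hypothetical stand-in, not the real Hackystat SDT class.
  import java.util.List;
  import java.util.Map;
  import java.util.TreeMap;

  public class OpenIssueReport {

    /** Counts non-resolved issues, keyed by severity. */
    public static Map<String, Integer> openIssuesBySeverity(List<ReviewIssue> issues) {
      Map<String, Integer> counts = new TreeMap<String, Integer>();
      for (ReviewIssue issue : issues) {
        if (!"resolved".equalsIgnoreCase(issue.getStatus())) {
          Integer current = counts.get(issue.getSeverity());
          counts.put(issue.getSeverity(), (current == null) ? 1 : current + 1);
        }
      }
      return counts;
    }

    /** Hypothetical minimal issue record. */
    public static class ReviewIssue {
      private final String status;
      private final String severity;

      public ReviewIssue(String status, String severity) {
        this.status = status;
        this.severity = severity;
      }

      public String getStatus() { return status; }
      public String getSeverity() { return severity; }
    }
  }

A Telemetry reduction on top of this would then just track the total
number of open issues over time.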
> 
> > > - ReviewId in the ReviewIssue SDT - It seems odd that the Review Id
> > > is so random.  Wouldn't it be better to use
> > > <ReviewId>-<ReviewerId>-<created timestamp> or something like that?
> > > If the Review Id is truly random, at some point you will have a
> > > duplicate Id, depending on how good your random generator is.  But I
> > > guess in the end it really doesn't matter.
> >
> > A more nicely structured reviewID would seem to be better.
> 
> I am not sure how we can deal with this. Would it be better for Jupiter
> to check for duplicate names when a new review ID is created? Even if
> it did, it would be hard for Jupiter to check for duplicates across
> projects (opened/closed, or not yet imported).
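
A structured ID along the lines suggested above would sidestep most of
that checking, since two IDs could only collide if the same reviewer
created two reviews with the same name in the same second. A rough
sketch (the names are illustrative only):

  // Sketch of a structured review ID of the form
  // <ReviewId>-<ReviewerId>-<created timestamp>.
  import java.text.SimpleDateFormat;
  import java.util.Date;

  public class ReviewIdBuilder {

    public static String buildReviewId(String reviewName, String reviewerId, Date created) {
      String timestamp = new SimpleDateFormat("yyyyMMddHHmmss").format(created);
      return reviewName + "-" + reviewerId + "-" + timestamp;
    }

    public static void main(String[] args) {
      // Prints something like "SelectionInterval1-takuyay-20040925071747".
      System.out.println(buildReviewId("SelectionInterval1", "takuyay", new Date()));
    }
  }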
> 
> > > - reviewId in the ReviewActivity SDT
> > > The information provided in the SDT help page says "reviewId - The
> > > unique reviewID for the review entry, eg takuyay".  I believe you
> > > meant "reviewId - The unique reviewID for the review entry, eg
> > > SelectionInterval1".
> >
> > Yes.
> 
> Thank you. It's fixed.
> 
> > > - The long-term goal of the Review Metrics is to be able to evaluate
> > > our review process and the effectiveness of our reviews.  I think
> > > that we should be able to bring up Hackystat before a review meeting
> > > and discuss the effectiveness of the review based on the number and
> > > severity of review issues generated.  There is a lot of statistical
> > > information that can be derived.  For example, is one hour enough
> > > preparation time?  Or is it the most effective?
> >
> > This is a GREAT idea.  I had never thought of using Hackystat as a way
> > of checking to see, for example, whether or not it was even
> > appropriate to do the review meeting.  For example, you might decide
> > to not do the review meeting if there are no critical or major issues
> > uncovered during preparation. You might decide to delay the group
> > meeting if preparation time was not sufficient.
> 
> This seems like an interesting idea. However, it also raises some
> research questions.
> 
> > is one hour enough preparation time?
> 
> How do we determine whether one hour is enough or not? We might collect
> questionnaire data, adjusted for the amount of review material (i.e.
> the number of class files), to calibrate what counts as "enough"
> preparation time...
> 
> > you might decide to not do the review meeting if there are no critical or major
> > issues uncovered during preparation. You might decide to delay the group meeting if
> > preparation time was not sufficient.
> 
> This may or may not work well because, as you have seen, most reviewers
> use the default severity (in our case, unset or normal), so we cannot
> tell whether any really critical or major issues exist. Even if we
> force reviewers to choose a severity, it would still be hard to decide
> whether a review meeting should be held, because the severity one
> reviewer assigns to an issue may differ from what the other reviewers
> would assign in the team phase.
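
Granting that caveat about default severities, the mechanical go/no-go
check itself would be trivial once the data is trustworthy. A rough
sketch, with hypothetical inputs:

  // Sketch of a go/no-go check for the review meeting, counting only
  // issues whose severity was explicitly set to critical or major.
  import java.util.List;

  public class MeetingDecision {

    public static boolean shouldHoldMeeting(List<String> severities) {
      for (String severity : severities) {
        if ("critical".equalsIgnoreCase(severity) || "major".equalsIgnoreCase(severity)) {
          // At least one explicitly critical or major issue: hold the meeting.
          return true;
        }
      }
      // Nothing beyond the default severities was flagged during preparation.
      return false;
    }
  }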
> 
> > > - It would also be interesting to test the age-old belief that
> > > reviews decrease the number of defects.  We can do this easily if we
> > > can associate a defect with a particular class and check whether
> > > that class has been reviewed.
> >
> > I don't think this is easy at all to do.  There are all sorts of
> > conflicting independent variables, including the complexity of the
> > code and the skill of the author.  You'll need a very large sample
> > size to factor this stuff out.
> 
> I agree with Philip. Besides code complexity and author skill, a
> reduction in defects might be caused by other factors such as test
> cases and so forth.
> 
> > > - Another major goal is to catch defects early in the development
> > > process.  However, I would claim that in our situation we don't
> > > really follow that trend.  Projects that follow that trend tend to
> > > be larger systems where testing is expensive.  I believe that most
> > > of the defects are caught in our Unit Test cases (of course, that is
> > > if we have good test cases).  Our review process seems more like
> > > confirmation, and I would claim that we have fewer critical defects
> > > (defects that cause the program to function incorrectly) than
> > > typical software projects.  I would also claim that when we review
> > > code in CSDL, we need to pay more attention to the Unit Tests; are
> > > we testing the program correctly, effectively, and thoroughly?
> >
> > More good ideas here.  For example, if we are paying attention to Unit
> > Tests, then it would be reasonable to expect that coverage would go up
> > after the rework following a review.  Coverage certainly shouldn't go
> > _down_ after the rework following a review!  We might also expect that
> > test case failures related to the module under review should go down
> > after the rework.  All of these are hypotheses that could be
> > empirically tested.
> 
> To examine the relationship between the number of unit tests and
> reviews, we might need to re-define our review process and defect
> categories more precisely. If we conduct reviews loosely, checking
> tests and coverage during a review may increase to some extent for a
> while, but it will not become a long-term trend. If we define the
> review process more precisely (e.g. we are supposed to check test cases
> and coverage, and run both of them at every review), we might see a
> relationship in which reviews increase coverage, the number of test
> cases, and so forth. Since our review objective is spreading knowledge
> rather than finding defects, it might be difficult to claim that.
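
The coverage hypothesis, at least, is cheap to check once Hackystat
supplies the numbers. A minimal sketch of the before/after comparison
(the coverage values themselves would have to come from the coverage
sensor data):

  // Sketch of the before/after check: coverage for the module under
  // review should not drop after the rework that follows the review.
  public class CoverageDeltaCheck {

    public static boolean coverageHeldOrImproved(double coverageBeforeReview,
                                                 double coverageAfterRework) {
      return coverageAfterRework >= coverageBeforeReview;
    }

    public static void main(String[] args) {
      // e.g. 72.5% coverage before the review, 78.0% after the rework
      System.out.println(coverageHeldOrImproved(72.5, 78.0));  // prints true
    }
  }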
> 
> Cheers,
> 
> Takuya
>
