Re: [Boston.pm] maintenance of large perl code bases

Charles Reitzel Tue, 12 Mar 2002 22:50:07 -0800

Thanks, Sean, for raising these issues.  I am fairly new to Perl, but have 
worked for years in those other "write once, read never" languages, C and, 
worse, C++.  The fact is, for all its well-intentioned support for sound SW 
engineering practice, I have seen horrid tangles of Java too.  I am forced 
to conclude that the issues and practices related to maintaining large code 
bases are not language specific.  On the contrary, I have found that the 
discipline I learned coding in C has served me very well in Perl and Java.

First things first: may I suggest that, before storyboarding a 
presentation, you write a coding standards document.  If you want 
training to go with the standards, more power to you.  But a coding 
standards document is a handy reference for developers at work.  Consider 
this email a sort of rough draft, if you will.

What are the goals of coding standards?

In order:
1) Quality
2) Clarity
3) Performance

Now, to achieve these ends, developers have worked out some common 
practices that minimize unintended side effects from changes, minimize 
naming conflicts, maximize future code re-use and maximize future 
implementation flexibility.  I think of Quality, Clarity and Performance as 
the strategic goals, while these others are the tactical objectives.  They 
also introduce issues related to larger code bases with multiple developers 
working simultaneously.  So, to cut to the chase, here are a few practices 
that will go a long way to keep you out of the soup!  I make no claims of 
authorship for any of these things.  They are just ideas that I have 
learned from other talented and knowledgeable developers along the way.  If 
I had to boil it down to one word, it is Modularity.

Dependencies Management

Early in your coding phase, draw a block diagram that shows runtime 
dependencies vertically.  Each block represents a module.  A module drawn 
directly above another module calls into the lower module and is said 
to "depend" on it.  Here's the thing: if you can't draw this diagram with 
every dependency pointing downward, you need to rethink your design.  
Although it is easy enough to code bi-directional dependencies, it is a 
bad idea and should only be done as a last resort.  Leave 3rd party 
modules out of the diagram; they don't call you (well, callbacks are 
possible, but they are not actually a violation and are beyond the scope 
of an email).

In a typical scenario, let's say a CGI script, you have a top-level 
module.  It will "use" or "require" other modules, perhaps 
dynamically.  These modules, in turn, will use and require other modules, 
and so on, until all necessary modules are loaded. Almost every shop has a 
utility module or two or three, which depend on nothing else.  I.e. any 
module can safely depend on these.  In fact, these modules may depend on 
different 3rd party modules, which may be the driver for separating them in 
the first place.  E.g. local HTML widget routines will depend on different 
things than date manipulation logic, and so on.  In theory, both could be 
used within either a mod_perl page or a CGI generated page.
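
The "no arrows pointing up" rule can also be checked mechanically.  Here is 
a rough sketch, assuming you keep a hand-written map of which in-house 
module uses which.  The module names and the find_cycle() routine are made 
up for illustration; they are not taken from any real code base.

```perl
use strict;
use warnings;

# Hypothetical dependency map: each module lists the in-house modules it uses.
my %depends_on = (
    'Acme::App'  => [ 'Acme::Form', 'Acme::Date' ],
    'Acme::Form' => [ 'Acme::Util' ],
    'Acme::Date' => [ 'Acme::Util' ],
    'Acme::Util' => [],
);

# Depth-first search.  Returns one cycle as a list of module names
# (first and last element are the same), or an empty list if none exists.
sub find_cycle {
    my ($deps) = @_;
    my %state;    # unset = unvisited, 1 = on the current path, 2 = finished
    my @path;

    my $visit;
    $visit = sub {
        my ($mod) = @_;
        $state{$mod} = 1;
        push @path, $mod;
        for my $dep ( @{ $deps->{$mod} || [] } ) {
            my $seen = $state{$dep} || 0;
            if ( $seen == 1 ) {
                # $dep is already on the current path: we found a loop.
                my ($start) = grep { $path[$_] eq $dep } 0 .. $#path;
                return ( @path[ $start .. $#path ], $dep );
            }
            if ( $seen == 0 ) {
                my @cycle = $visit->($dep);
                return @cycle if @cycle;
            }
        }
        pop @path;
        $state{$mod} = 2;
        return;
    };

    for my $mod ( sort keys %$deps ) {
        next if $state{$mod};
        my @cycle = $visit->($mod);
        return @cycle if @cycle;
    }
    return;
}

my @found = find_cycle( \%depends_on );
print @found
    ? 'Circular dependency: ' . join( ' -> ', @found ) . "\n"
    : "No circular dependencies\n";
```

If Acme::Util were ever changed to use Acme::Form, the same call would 
report the loop between those two modules.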

Testing

Now think about the usual development cycle.  As coding nears completion, 
testing and bug fixing take up more and more of everyone's time.  Now the 
module loader will not have a problem with circular references, but your 
testing procedures will.

So, here's the deal.  Modules at the bottom "freeze" first.  Any change to 
these modules requires a complete regression test of all "dependent" 
modules.  Thus, modules at the bottom also tend to have the most thorough 
test harnesses, so that all of their public interfaces can be regression 
tested to weed out as many errors as possible before double checking all 
the dependent modules.  The rigor of these procedures is proportional to 
the cost of a release (in downtime, manufacturing costs, training costs, 
etc., etc.).
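
To make "all dependent modules" concrete, here is a small sketch that 
lists everything needing a retest when one module changes.  It assumes 
a hand-maintained dependency map like the one a block diagram would give 
you; the module names and the dependents_of() routine are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical dependency map: each module lists the in-house modules it uses.
my %depends_on = (
    'Acme::App'  => [ 'Acme::Form', 'Acme::Date' ],
    'Acme::Form' => [ 'Acme::Util' ],
    'Acme::Date' => [ 'Acme::Util' ],
    'Acme::Util' => [],
);

# Return, sorted by name, every module that depends on $changed,
# whether directly or through other modules.
sub dependents_of {
    my ( $deps, $changed ) = @_;

    # Invert the map: module => modules that use it directly.
    my %used_by;
    for my $mod ( keys %$deps ) {
        push @{ $used_by{$_} }, $mod for @{ $deps->{$mod} };
    }

    # Walk upward from the changed module, collecting each user once.
    my %seen;
    my @queue = ($changed);
    while ( defined( my $mod = shift @queue ) ) {
        for my $user ( @{ $used_by{$mod} || [] } ) {
            push @queue, $user unless $seen{$user}++;
        }
    }
    return sort keys %seen;
}

print 'Retest after changing Acme::Util: ',
    join( ', ', dependents_of( \%depends_on, 'Acme::Util' ) ), "\n";
```

A change at the bottom (Acme::Util) drags in every module above it, while 
a change to the top-level module drags in nothing, which is why the 
bottom has to freeze first.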

While we're on the subject, I have found the convention of writing a simple 
test script for each Perl package (.pm file) to be very effective.  If 
nothing more, it helps to debug library-type code.  Once it exists, it can 
help to spot problems early by acting as a basic regression test.  To this 
end, I have written a simple little assert routine which, like the C 
macro, evaluates an expression.  It prints the result, reporting a failure 
if the expression is false or an exception is caught.

I have attached the file assert.pl and a sample test script.  Note that 
assert.pl needs to be "required" into each test script so that any 
variables declared in the module will be defined inside the eval().  
Comments, corrections and suggestions are appreciated.

Encapsulation

This is object orientation 101.  If you get encapsulation, by comparison, 
everything else is gravy.  Perhaps the simplest way to define encapsulation 
is to say, in Perl terms, modules that depend on a module Acme::XYZ should 
not be broken if it changes from blessed array references to blessed hash 
references.  The only way to make this happen, is to not access the data 
structures directly from other modules.  Instead, write "methods" (member 
functions, messages, what have you) to access or modify the state of the 
object.
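
Here is a minimal sketch of that idea.  The name Acme::XYZ comes from the 
paragraph above; the fields, the methods, and the second package 
Acme::XYZ::Compact are invented purely for illustration.  The two packages 
stand in for "before" and "after" versions of one module.

```perl
use strict;
use warnings;

{
    package Acme::XYZ;    # fields and methods here are made up

    # First implementation: a blessed hash reference.
    sub new {
        my ( $class, %args ) = @_;
        my $self = { name => $args{name}, count => 0 };
        return bless $self, $class;
    }
    sub name      { my ($self) = @_; return $self->{name}; }
    sub count     { my ($self) = @_; return $self->{count}; }
    sub increment { my ($self) = @_; return ++$self->{count}; }
}

{
    package Acme::XYZ::Compact;    # hypothetical rewrite, same interface

    use constant { NAME => 0, COUNT => 1 };    # slot numbers in the array

    # Second implementation: a blessed array reference.
    sub new {
        my ( $class, %args ) = @_;
        my $self = [ $args{name}, 0 ];
        return bless $self, $class;
    }
    sub name      { my ($self) = @_; return $self->[NAME]; }
    sub count     { my ($self) = @_; return $self->[COUNT]; }
    sub increment { my ($self) = @_; return ++$self->[COUNT]; }
}

# Calling code touches only the methods, so it runs unchanged against
# either implementation.
for my $class (qw( Acme::XYZ Acme::XYZ::Compact )) {
    my $obj = $class->new( name => 'widget' );
    $obj->increment for 1 .. 3;
    printf "%s: %s has count %d\n", $class, $obj->name, $obj->count;
}
```

Had the calling loop reached in with $obj->{count}, it would have broken 
the moment the hash became an array.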

What does this mean for you?  You have to take a bit of time, up front, to 
define what the interface should be.  What does this buy you?  First, it 
lets two different developers (or teams) work in parallel with a high 
degree of confidence that their stuff will work together at the end.  This 
can be a lifesaver on a compressed schedule.  Second, it gives each team 
the ability to make improvements to their implementation without having to 
even confer with the other team(s).  They know, if they keep the "contract" 
intact, that everything will be OK.  Now, contracts depend on more than 
syntax.  But syntax is a necessary, if not sufficient, part of compatibility.

Naming

Naming really matters.  You just can't name all your variables tmp1, tmp2 
and expect anyone, even yourself, to understand what a routine is 
doing.  This goes triple for type names (i.e. Perl packages, C++ or Java 
classes).  Think very, very carefully - for 15 minutes - about module 
names.  Be ready, in the first week, to rename a package if you didn't get 
it right.  After that, it's set in concrete.  You're stuck with it.  Oh 
well.  The name should describe, in terms of the application, what data or 
behavior the variable represents.  Be wordy as hell for module names and 
global variables.  Be concise but not obscure for local variables.  4-8 
chars is OK, this isn't COBOL.  Never, ever use 1 character variables.  Not 
even "i".  Try "ix".

For in-house code, pick a top level package qualifier.  If you work for 
Acme, Inc., use names like Acme::Util, Acme::Date, Acme::Form, etc.  This 
way your module names won't collide with modules from CPAN.  Perl makes 
this easy.  Along these same lines, observe good Perl etiquette and do not 
export more than you need to (if anything) from your modules.  This was a 
mistake I made often when first learning Perl.
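
As a sketch of the export etiquette, the hypothetical Acme::Util below 
exports nothing by default and offers one routine on request through the 
standard Exporter module.  The routine names are made up, and the package 
is written inline only to keep the example in one file.

```perl
use strict;
use warnings;

{
    package Acme::Util;    # hypothetical in-house utility module

    use Exporter 'import';          # reuse Exporter's import() method
    our @EXPORT_OK = qw( trim );    # exported only on request; @EXPORT stays empty

    # Strip leading and trailing whitespace.
    sub trim {
        my ($text) = @_;
        $text =~ s/^\s+|\s+$//g;
        return $text;
    }

    # Not listed in @EXPORT_OK, so it never lands in a caller's namespace.
    sub squeeze {
        my ($text) = @_;
        $text =~ s/\s+/ /g;
        return $text;
    }
}

# With Acme::Util in its own Acme/Util.pm file, this line would be
#   use Acme::Util qw( trim );
Acme::Util->import('trim');

print '[', trim('  hello  '), "]\n";
print '[', Acme::Util::squeeze('a   b'), "]\n";    # still reachable by full name
```

The caller's namespace gains exactly one name, trim, and only because the 
caller asked for it.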

Pick a capitalization scheme and stick to it.  For myself, I like 
module/class names starting with caps, whereas variables begin with lower 
case - with all of these using camel case to join multiple words.  E.g. my 
$formGen = Acme::FormGen->new( $cgi );  This way I don't have to switch 
conventions when I switch to Java or C++!

Performance

Without getting into details, suffice it to say that it is much, much 
easier to find and fix performance problems in well organized, maintainable 
code.  For example, encapsulation allows you to optimize performance of a 
shared module without breaking anything else.  This stuff really works!

Well, that's the basics.  I'm sure there are more things I could mention, 
but you don't want a book here.  I'll wager most of these look familiar to 
most folks coming from the systems side.  But I'll bet there is a scientist 
or two out there new to Perl that will benefit.  Hope this helps.

Enjoy,
Charlie

At 11:01 PM 3/12/2002 -0500, Sean Quinlan wrote:

>[forwarded submission from a non-member address -- rjk]
>
>
>From: Sean Quinlan <[EMAIL PROTECTED]>
>Date: Tue, 12 Mar 2002 20:43:52 -0500
>Subject: maintenance of large perl code bases
>To: [EMAIL PROTECTED]
>
>I had hoped to bring up this question at tomorrow's meeting, but 
>Wednesdays are hard, and tomorrow looks impossible. So maybe someone can 
>toss this up for discussion, and hopefully let the list know the key points.
>
>I know there are sites out there, such as Boston.com it appears, and I've 
>heard about some large financial institutions, that rely on substantial 
>amounts of Perl code. Obviously for a successful business, having that 
>code be maintainable is (or should be!) of significant importance. But I 
>regularly hear complaints, largely from non-Perl (or Perl primary anyway) 
>people from other industries coming into bioinformatics, about these large,
>unmaintainable Perl code bases.
>
>Now, in my experience, I have to admit this is largely more true than 
>not.  Usually because most of the software was written by people who were 
>biologists/engineers/physicists/whatever first, and programmers (sometimes 
>distant) second, often without thought or concern for its long term 
>usability. So I've heard of a few places now moving away from Perl, 
>frequently apparently forcing a large ground up recode in some other 
>(usually in Java, and I've heard some interesting 'rumors' as to why) language.
>
>I see little point in arguing with this from the standpoint of simply Perl 
>first. I know others better than I have done talks and presentations on 
>writing maintainable Perl code, and probably on the problems with porting 
>old code to a more maintainable format. I want to steal from those 
>people... blatantly (with credits of course).
>
>What I would like to do is to collaborate with a few people who have:
>1) Done presentations related to the subject of code maintenance (and a 
>little QA thrown in might be good).
>
>2) Have been involved with or responsible for large installations of Perl 
>code that was well maintained.
>
>3) Others involved with bioinformatics interested in or having experience 
>with this problem.
>
>What I would like to end up with are sources for presentations (preferably 
>a couple already canned of varied lengths) on the subject of maintaining 
>large Perl code bases written specifically as it applies to 
>bioinformatics. If you don't want/have time to collaborate, but have 
>pointers to good sources of information/inspiration, please also pipe up.
>
>Thanks everyone!!!
>
>--------------------------------------------------------------
>Sean P. Quinlan
>http://people.ne.mediaone.net/squinlan/index.html
>mailto:[EMAIL PROTECTED]
>"You can discover more about a person in an hour of play than in a year of 
>conversation" - Plato

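# assert.pl - minimal assert-style helpers for test scripts.
# "require" this file from the test script.

# Evaluate a string of Perl code, print PASS or FAIL along with the
# caller's location, and return 1 on success or 0 if the code died.
sub evalordie
{
  my $code = shift;
  my $mssg = shift;
  my $ok = 0;
  # Look one frame up; if that frame is assert(), look one frame further.
  my ( $pkg, $file, $line, $subroutine ) = caller( 1 );
  if ( $subroutine eq 'main::assert' ) {
    ( $pkg, $file, $line, $subroutine ) = caller( 2 );
  }

  eval( $code );
  if ( $@ eq '' ) {
    $ok = 1;
  }
  # Fall back to the code itself when no message was supplied.
  if ( ! $mssg ) {
    $mssg = $code;
  }
  print "Testing \"$mssg\"\n$subroutine in $file, line $line. " 
      . ($ok ? 'PASS' : 'FAIL') . ". $@ \n";
  return $ok;
}

# Wrap a condition (given as a string) in code that dies when the
# condition is false, then hand it to evalordie().
sub assert
{
  my $cond = shift;
  my $mssg = shift;
  if ( ! $mssg ) {
    $mssg = $cond;
  }
  my $code = "if ( ! ($cond) ) { die \"Error $cond\"; }";
  return evalordie( $code, $mssg );
}

# A required file must return a true value.
1;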

#!/usr/local/bin/perl -w

require 5.005;
use strict;
use File::Basename;

use Acme::CMS::Config;
use Acme::Common::ValueMap;

use vars qw( $tag $lang $qual $mapobj $mapref $nCode $nDisp $nMap 
             %lutbl $inpval $inpres $outval $outres );

require 'assert.pl';

BEGIN
{
  $tag    = 'country';
  $lang   = 'fr';
  $qual   = 'en';
  $mapobj = undef;
  $mapref = undef;
  $nCode  = 0;
  $nDisp  = 0;
  $nMap   = 0;
  %lutbl  = ();

  $inpval = 'UNITED STATES';
  $inpres = 'us';

  $outval = 'us';
  $outres = 'États Unis';
}

sub testvaluemap()
{
  my $ok = 1;

  eval
  {
    # Output Mapping
    $ok &= assert( '$mapobj = Acme::Common::OutputMap->load( $tag, $lang )', 
                   'OutputMap::load' ); 

    $ok &= assert( '$mapref = $mapobj->getDisplayValues()',
                   'getDisplayValues' ); 
    $nDisp = @{ $mapref };
    $ok &= assert( '$nDisp > 0', 'getDisplayValues() > 0 items' );

    $ok &= assert( '$mapref = $mapobj->getValues()', 
                   'getValues' ); 
    $nCode = @{ $mapref };
    $ok &= assert( "$nCode == $nDisp", '# Codes == # Display' );

    $ok &= assert( '$mapobj->mapValue( $outval ) eq $outres', 
                   "mapValue($outval) eq $outres" );

    $ok &= assert( '$mapobj->addMap( \'dummy\', \'Dummy\')', 'addMap' );
    $ok &= assert( '$mapobj->replaceMap( \'dummy\', \'Dumber\')', 'replaceMap' );
    $ok &= assert( '$mapobj->deleteMap( \'dummy\')', 'deleteMap' );

    # Input Mapping
    $ok &= assert( '$mapobj = Acme::Common::InputMap->load( $tag, $qual )', 
                   'InputMap::load' ); 

    $ok &= assert( '$mapref = $mapobj->getOutputValues()',
                   'getOutputValues' ); 
    $nDisp = @{ $mapref };
    $ok &= assert( '$nDisp > 0', 'getOutputValues() > 0 items' );

    $ok &= assert( '$mapref = $mapobj->getInputValues()', 
                   'getInputValues' ); 
    $nCode = @{ $mapref };
    $ok &= assert( "$nCode == $nDisp", '# Codes == # Display' );

    $ok &= assert( '$mapobj->mapValue( $inpval ) eq $inpres', 
                   "mapValue($inpval) eq $inpres" );

    $ok &= assert( '$mapobj->addMap( \'dummy\', \'Dummy\')', 'addMap' );
    $ok &= assert( '$mapobj->replaceMap( \'dummy\', \'Dumber\')', 'replaceMap' );
    $ok &= assert( '$mapobj->deleteMap( \'dummy\')', 'deleteMap' );
  };
  $ok &= ( $@ eq '' );
  return $ok;
}

testvaluemap();
