Re: taxonomy for data-structure modules?

Eric Wilhelm Wed, 23 May 2007 20:12:22 -0700

# from David Nicol
# on Wednesday 23 May 2007 02:36 pm:

>but perl syntax is expressive enough, that I have trouble imagining
>when a data structure would be something to put in a module, rather
>than create as needed


I used to have the "don't need no stinkin' objects" attitude (which 
may or may not be what you're saying here), but large codebases seem to 
have beaten it out of me.  I do still *deeply* appreciate that Perl has 
a spectrum of formality available.  This means I can bang out the 
simplest thing that could possibly work and then step back to look at 
it and say "I'm an idiot" when I see something simpler.  All this 
without creating 20 classes or whatever -- but once I see the 
implementation start to leak outside of a single subroutine it really 
starts to look like an object would be more robust.

>What do you mean by "data-structure module?"

Trees, ordered hashes, and sets just to name a few.  The vocabulary 
quickly gets pretty vague and I've asked "what do I call this 
structure/pattern" kinds of questions here more than once before.  
(That's precisely why I wonder if we need something besides a standard 
search scheme specifically for data structures -- because they tend to 
travel under many aliases.)

Whenever the form of the data starts to have rules, it's quite likely 
that somebody is going to come along and break the rules unless you 
encapsulate it in some way.  Note these rules aren't necessarily about 
what data/types can and cannot be stored or whether swear words are 
allowed (though those things can sometimes be a valid use of 
encapsulation.)  In this particular case, it is just a matter of 
maintaining the data integrity and managing pseudo-asynchronous access 
to it.

Testing and code reuse are also good reasons to modularize a data 
structure.  Anything non-trivial should be tested against regression, 
but focussed testing just isn't possible if it is implemented as ad-hoc 
scattered bits of code juggling the contents of a reference.

Even typo'd hash keys can be a hair-pulling problem when code grows 
beyond a few hundred lines.  Having well-tested objects handling all of 
the juggly bits means you spend your time chasing bugs in your code and 
not your data structures.
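
A small sketch of the hash-key problem (the Box class and its width and
height fields are invented here purely for illustration): a mistyped key
on a plain hash quietly yields undef even under strict, while a mistyped
accessor on an object dies right where the mistake is.

```perl
use strict;
use warnings;

package Box;   # hypothetical class, only for this example

sub new {
  my ($class, %args) = @_;
  return bless({width => $args{width}, height => $args{height}}, $class);
}
sub width  { $_[0]->{width} }
sub height { $_[0]->{height} }

package main;

# plain hash: the typo'd key is not an error, it is just undef
my %plain = (width => 3, height => 4);
my $raw = $plain{widht};
print "hash lookup: ", (defined($raw) ? $raw : 'undef'), "\n";

# object: the typo'd method name dies at the call site
my $box = Box->new(width => 3, height => 4);
my $ok = eval { $box->widht; 1 };
print "method call died: $@" unless $ok;
```

The accessors here are hand-written to keep the example self-contained;
any accessor generator would give the same behavior.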

In this particular case, the array-of-arrays has a few important 
characteristics:

  1.  Each entry is identifiable.
    (The ids need to be unique within the object and therefore should
    not have to be user-supplied -- adding an item should return this id
    for future reference.)

  2.  There is a "current" item.
    (By convention, the end of the list, but I might want to extend this
    to an arbitrary "cursor" concept -- at which point encapsulation
    becomes important because removing an item now impacts not only the
    array state, but also possibly requires the cursor to be adjusted.)

  3.  An item may be deleted from any position (not just the end.)

So, it is desirable to make an object and encapsulate the 
data-management.  Attempting to access missing data throws an error, 
the cursor is always right, etc.  I get to trust that the object will 
"just do its job" (delegation.)  It just makes the code more concise, 
robust, and workable.
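
As a rough sketch of the three characteristics above (ItemList and all
of its method names are made up for illustration; this is not an
existing CPAN module and not necessarily how the real code looks), such
an object might be as small as:

```perl
package ItemList;   # hypothetical name
use strict;
use warnings;
use Carp;

sub new {
  my $class = shift;
  my $self = {
    items   => [],  # array of [id, data] pairs
    next_id => 1,   # ids are generated here, never user-supplied
  };
  return bless($self, $class);
}

# append an item and return its id for future reference
sub add {
  my ($self, $data) = @_;
  my $id = $self->{next_id}++;
  push(@{$self->{items}}, [$id, $data]);
  return $id;
}

# internal: position of an id, or an error if it is missing
sub _index_of {
  my ($self, $id) = @_;
  my $items = $self->{items};
  for my $i (0..$#$items) {
    return $i if $items->[$i][0] == $id;
  }
  croak("no item with id '$id'");
}

sub get {
  my ($self, $id) = @_;
  return $self->{items}[ $self->_index_of($id) ][1];
}

# remove an item from any position, not just the end
sub remove {
  my ($self, $id) = @_;
  my ($gone) = splice(@{$self->{items}}, $self->_index_of($id), 1);
  return $gone->[1];
}

# the "current" item is the end of the list by convention
sub current_id {
  my $self = shift;
  my $items = $self->{items};
  @$items or croak("list is empty");
  return $items->[-1][0];
}

package main;

my $list   = ItemList->new;
my $first  = $list->add('alpha');
my $second = $list->add('beta');
my $third  = $list->add('gamma');

$list->remove($third);                      # 'beta' is now current
print $list->get($list->current_id), "\n";  # prints "beta"

my $found = eval { $list->get($third); 1 };
print "error: $@" unless $found;            # missing data throws
```

Because "current" is computed from the array rather than stored, there
is nothing to fix up after a remove.  An arbitrary cursor would need its
adjustment logic inside remove(), which is the part callers should never
have to think about.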

Spreading data-structure code around in an ad-hoc way quickly becomes 
unmanageable once there are a few data structures involved in the same 
code block.  Taking that sort of practice to an extreme, you end up 
with monolithic code in one package using implicit global variables 
with no hope of ever saying 'use strict' without a complete rewrite.

The converse of all this is of course when you spend time looking for 
said code on CPAN and don't find it.

--Eric
-- 
Turns out the optimal technique is to put it in reverse and gun it.
--Steven Squyres (on challenges in interplanetary robot navigation)
---------------------------------------------------
    http://scratchcomputing.com
---------------------------------------------------
