Jim Schueler wrote:
> Hello Brian.
> 
> Many answers on these forums seem suited for inexperienced
> programmers who need to quickly overcome a hurdle on some
> deliverable code.  The answers may not be universally appropriate.
> So please cut me off if I'm making too much mischief.
> 
> I use many and have published some CPAN modules.  Given the
> proliferation of modules, I assume there's a module to address
> almost every functional chunk that I need in my code.  Should I
> adopt the strategy of using a CPAN module whenever I can?  The
> answer is easily no, for the following reasons:
>       1.  The CPAN modules are not guaranteed.  There is some
>             risk to each one.  The risk is offset by frequent use
>             and whether the module complexity justifies this cost.

Agreed, the quality of code on CPAN is variable, but there are a large
number of mature, well-used and well-tested modules there. Perhaps
yours are among them.

>         2.  Another cost to CPAN modules is the well known cost of
>             portability.

To me the cost of portability is in the effort required to design,
develop and test code, which, IMO, is where CPAN scores big time. Most
of the more mature modules, especially those in the core distribution,
will have been thoroughly tested on multiple platforms, providing
portability to the module user at relatively small cost.

>         3.  CPAN modules are not optimized for modularity.  There's
>             an efficiency cost to using multiple modules instead of
>             functional extensions.

I don't understand this. "CPAN modules are not optimized for
modularity" sounds like a contradiction. Also, what are "functional
extensions" as compared to multiple modules?

> 
> That's the primary answer to your primary question.  Parsing CSV
> seems trivial enough to warrant "rolling my own".

Sure, it "seems" trivial at first but that changes pretty rapidly when
you start introducing additional conditions and generalisations, as you
do below. I think the code you posted illustrates this point quite well.
It is, IMHO, far from trivial.

> 
> The code below (actually I didn't include it in this response)
> probably reflects my obsession with modularity.  For me, CSV
> parsing is a specific instance of a general problem:  the need to
> modify text containing protected quoted strings.
> 
> Essentially, CSV parsing can be performed with one line of code:
>   map { [ split /,/, $_ ] } split /[\r\n]+/, $filebuffer

Only for a very narrow definition of CSV, and a file that will fit into
memory.
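To see how narrow that definition is, here is that one-liner run over a
small made-up buffer containing one quoted field (the sample data is
hypothetical, just to illustrate the failure):

```perl
use strict;
use warnings;

# A quoted field containing a comma defeats the naive split:
# '"b,c"' is broken into '"b' and 'c"'.
my $filebuffer = qq{a,"b,c",d\n1,2,3\n};
my @rows = map { [ split /,/, $_ ] } split /[\r\n]+/, $filebuffer;

# The first row should have 3 fields, but the naive split yields 4.
print scalar @{ $rows[0] }, "\n";    # prints: 4
print $rows[0][1], "\n";             # prints: "b
```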

> 
> This task hardly warrants its own CPAN module.  The problem is
> that the split functions need to ignore quoted commas and
> newlines.

But split doesn't fulfill those needs, so perhaps it isn't the best way
to solve the problem.

It's those additional needs that make the problem less trivial and
harder to solve, and a CPAN module that addresses them makes the
problem much easier for individual developers.

> 
> The code below substitutes quoted strings in $filebuffer with strings
> containing only characters \000 and \001; performs the split
> functions; then substitutes the original strings back into the
> results.  One additional step removes the terminating quotes, which
> are superfluous to the data.
> 
> Undoubtedly, the bottleneck is the heavy reliance on the regex
> used for the substitutions.  Maybe this question should have been
> posted on comp.lang.perl.regex instead.  I believe Perl adds
> significant overhead to parsing operations - maybe this situation
> requires native code.

The only reliable way to tell where the bottlenecks are is to profile
(there's a module for that too :-) ). My guess is that it's all the
munging of the data so that you can use split that is taking the time.
AIUI, the Text::CSV* modules use a finite state machine, which
probably involves only a single pass over the data, whereas your
approach would seem to involve several passes.

> In that case, Text::CSV_XS may be the right answer, although I
> would still prefer a more general purpose solution.

Strange, I would have said that the Text::CSV* modules were more general
purpose than your code.

Give the Text::CSV* modules a try; it should be pretty trivial :-)
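A minimal sketch of what trying them looks like, assuming Text::CSV is
installed from CPAN (Text::CSV_XS offers the same interface):

```perl
use strict;
use warnings;
use Text::CSV;    # install from CPAN if missing

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag;

# parse() copes with the quoted comma that defeats a naive split
$csv->parse('a,"b,c",d')
    or die "Parse failed on: " . $csv->error_input;
my @fields = $csv->fields;

print join('|', @fields), "\n";    # prints: a|b,c|d
```

For whole files, the getline($fh) method reads and parses a record at
a time, so the data never has to fit into memory all at once.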

Try profiling your own code. You can sometimes get interesting surprises
doing that.

HTH

-- 
Brian Raven




_______________________________________________
ActivePerl mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
