Jim Schueler wrote:

> Hello Brian.
>
> Many answers on these forums seem suited for inexperienced programmers
> who need to quickly overcome a hurdle on some deliverable code. The
> answers may not be universally appropriate. So please cut me off if
> I'm making too much mischief.
>
> I use many, and have published some, CPAN modules. Given the
> proliferation of modules, I assume there's a module to address almost
> every functional chunk that I need in my code. Should I adopt the
> strategy of using a CPAN module whenever I can? The answer is easily
> no, for the following reasons:
>
> 1. The CPAN modules are not guaranteed. There is some risk to each
>    one. The risk is offset by frequent use and whether the module
>    complexity justifies this cost.
Agreed, the quality of code on CPAN is variable, but there are a large
number of mature, well used and well tested modules there. Perhaps
yours are among them.

> 2. Another cost to CPAN modules is the well known cost of
>    portability.

To me the cost of portability is in the effort required to design,
develop and test code. Which, IMO, is where CPAN scores big time. Most
of the more mature modules, especially those in the core distribution,
will have been thoroughly tested on multiple platforms, providing
portability to the module user at relatively small cost.

> 3. CPAN modules are not optimized for modularity. There's an
>    efficiency cost to using multiple modules instead of functional
>    extensions.

I don't understand this. "CPAN modules are not optimized for
modularity" sounds like a contradiction. Also, what are "functional
extensions" as compared to multiple modules?

> That's the primary answer to your primary question. Parsing CSV seems
> trivial enough to warrant "rolling my own".

Sure, it "seems" trivial at first, but that changes pretty rapidly when
you start introducing additional conditions and generalisations, as you
do below. I think the code you posted illustrates this point quite
well. It is, IMHO, far from trivial.

> The code below (actually I didn't include it in this response)
> probably reflects my obsession with modularity. For me, CSV parsing
> is a specific instance of a general problem: the need to modify text
> containing protected quoted strings.
>
> Essentially, CSV parsing can be performed with one line of code:
>
>     map { [ split /,/, $_ ] } split /[\r\n]+/, $filebuffer

Only for a very narrow definition of CSV, and a file that will fit into
memory.

> This task hardly warrants its own CPAN module. The problem is that
> the split functions need to ignore quoted commas and newlines.

But split doesn't fulfill those needs, so perhaps it isn't the best way
to solve the problem. It's those additional needs that make the problem
less trivial and harder to solve, and a CPAN module that addresses them
makes the problem much easier for individual developers to solve.

> The code below substitutes quoted strings in $filebuffer with strings
> containing only the characters \000 and \001; performs the split
> functions; then substitutes the original strings back into the
> results. One additional step removes the terminating quotes, which
> are superfluous to the data.
>
> Undoubtedly, the bottleneck is the heavy reliance on the regex used
> for the substitutions. Maybe this question should have been posted on
> comp.lang.perl.regex instead. I believe Perl adds significant
> overhead to parsing operations - maybe this situation requires native
> code.

The only reliable way to tell where the bottlenecks are is to profile
(there's a module for that too :-) ). My guess is that it's all the
munging of the data so that you can use split that is taking the time.
AIUI, the Text::CSV* modules use a finite state machine, which probably
involves only a single pass over the data, whereas your approach would
seem to involve several passes.

> In that case, Text::CSV_XS may be the right answer, although I would
> still prefer a more general purpose solution.

Strange, I would have said that the Text::CSV* modules were more
general purpose than your code. Give the Text::CSV* modules a try; it
should be pretty trivial :-) Try profiling your own code. You can
sometimes get interesting surprises doing that.
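For what it's worth, a minimal Text::CSV_XS reader might look something
like the sketch below. This is only an illustration, not code from
either of our posts: the file name is a placeholder and the options are
just a reasonable starting point.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV_XS;

    # binary => 1 allows embedded newlines and other odd bytes inside
    # quoted fields; quoted commas and quoted newlines are handled by
    # the module, so no pre-munging of the buffer is needed.
    my $csv = Text::CSV_XS->new({ binary => 1 })
        or die "Cannot construct a Text::CSV_XS object";

    # 'data.csv' is a made-up file name for the example.
    open my $fh, '<', 'data.csv' or die "data.csv: $!";

    while (my $row = $csv->getline($fh)) {
        # $row is an array ref of the fields for one record, with the
        # surrounding quotes already stripped.
        print join('|', @$row), "\n";
    }
    close $fh;

Note that reading record by record like this also avoids slurping the
whole file into memory, which was one of my quibbles with the one-liner
above.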
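As for profiling, Devel::DProf ships with Perl (run the script under
perl -d:DProf and summarise the result with dprofpp) and will tell you
where the time actually goes. For a quick head-to-head comparison the
core Benchmark module is enough; the snippet below is only a sketch,
with a single made-up record standing in for real data.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    use Text::CSV_XS;

    # One sample record with a quoted field containing a comma.
    my $line = 'one,"two, with a comma",three';
    my $csv  = Text::CSV_XS->new({ binary => 1 });

    cmpthese(-3, {
        # The "one line of code" approach; note that it splits the
        # quoted field in two, so it is faster but wrong here.
        naive_split => sub {
            my @fields = split /,/, $line;
        },
        # Text::CSV_XS parses the record and honours the quoting.
        text_csv_xs => sub {
            $csv->parse($line) or die "parse failed";
            my @fields = $csv->fields;
        },
    });

That won't replace a real profile of your regex-based code, but it
should at least show whether Text::CSV_XS is in the right ballpark
before you go looking for native code.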
HTH

-- 
Brian Raven