YA CSV parser
I wrote a streaming CSV parser yesterday because I couldn't find a CSV parsing module that does what I want (despite the plethora of available choices). The parsing rules are pretty simple:

1) At the start of a field, if you find a quote string, eat the quote string and go to the state that handles quoted strings. If you find a separator, add the current field (which is blank) to the current target line, and start over at this step. If you find an end of line string, add the current field to the current target line, push the target line onto the list of parsed lines, and start over. For anything else, go to the state that handles unquoted strings.

2) In the state that handles quoted strings, search the string sequentially for the first instance of the quote string. Append the string up to that point (not including the quote) to the current field. If what immediately follows is another quote string, start this step over. Otherwise, go to the unquoted string state. In the event no quote string is found, append the remainder of the string being processed to the current field. Parsing of the next chunk of data will resume in this state.

3) In the state that handles unquoted strings, search the string sequentially for either the first instance of the separator string, or the first instance of the end of line string. If neither is found, append what's in the string being processed to the current field, and note that the parser will resume in this state. Otherwise, append the part of the string up to, but not including, the separator or end of line that was found to the current field, then append the current field to the current target line. If the found string was the end of line string, append the current target line onto the list of parsed lines, too. Then, return to the initial state.

This set of rules happens to produce results that match how Microsoft's Excel handles CSV files.
As you may have determined from my use of separator string, quote string, and end of line string, each of these entities is a string (',', '"', and \r\n, by default). They also happen to be parameters, so you can parse other simple text formats, too. For example, to parse a standard /etc/passwd file, you could use :, \0, and \n as the separator, quote, and end of line strings.

If anyone knows of a module on CPAN that does all this, please let me know. Otherwise, I'll upload my module sometime in the next week or two. BTW, the name I'm currently using for this module is CSV::Parse - let me know if you have a specific suggestion for a name you like better.
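For readers who want to see the three rules as code, here is a minimal sketch of the state machine. It is not the module from the post: the package name StreamingCSVSketch, the parse and finish methods, and the option names sep, quote, and eol are all made up for illustration. Two assumptions go beyond the rules as written: a doubled quote inside a quoted field is taken to mean one literal quote (which is what Excel does), and finish flushes a last record that has no trailing end of line. The sketch also assumes that no separator, quote, or end of line string is split across two chunks.

```perl
use strict;
use warnings;

package StreamingCSVSketch;    # hypothetical name, not the module from the post

sub new {
    my ($class, %opt) = @_;
    # Defaults follow the post; the '"' quote default is an assumption.
    return bless {
        sep => ',', quote => '"', eol => "\r\n", %opt,
        state => 'start', field => '', line => [], rows => [],
    }, $class;
}

sub _end_field { my $s = shift; push @{ $s->{line} }, $s->{field}; $s->{field} = '' }
sub _end_line  { my $s = shift; push @{ $s->{rows} }, $s->{line};  $s->{line}  = [] }
sub _take_rows { my $s = shift; my @r = @{ $s->{rows} }; $s->{rows} = []; return @r }

# Feed one chunk of input; returns the records completed by this chunk.
sub parse {
    my ($self, $chunk) = @_;
    my ($sep, $quote, $eol) = @{$self}{qw(sep quote eol)};
    my ($pos, $len) = (0, length $chunk);
    while ($pos < $len) {
        if ($self->{state} eq 'start') {            # rule 1
            if (substr($chunk, $pos, length $quote) eq $quote) {
                $pos += length $quote;
                $self->{state} = 'quoted';
            }
            elsif (substr($chunk, $pos, length $sep) eq $sep) {
                $pos += length $sep;
                $self->_end_field;                  # blank field
            }
            elsif (substr($chunk, $pos, length $eol) eq $eol) {
                $pos += length $eol;
                $self->_end_field;
                $self->_end_line;
            }
            else {
                $self->{state} = 'unquoted';
            }
        }
        elsif ($self->{state} eq 'quoted') {        # rule 2
            my $i = index($chunk, $quote, $pos);
            if ($i < 0) {                           # no quote: resume here next chunk
                $self->{field} .= substr($chunk, $pos);
                $pos = $len;
                next;
            }
            $self->{field} .= substr($chunk, $pos, $i - $pos);
            $pos = $i + length $quote;
            if (substr($chunk, $pos, length $quote) eq $quote) {
                $self->{field} .= $quote;           # assumption: "" means one literal quote
                $pos += length $quote;
            }
            else {
                $self->{state} = 'unquoted';
            }
        }
        else {                                      # rule 3: unquoted
            my $s = index($chunk, $sep, $pos);
            my $e = index($chunk, $eol, $pos);
            if ($s < 0 && $e < 0) {                 # neither: resume here next chunk
                $self->{field} .= substr($chunk, $pos);
                $pos = $len;
                next;
            }
            my $is_sep = $e < 0 || ($s >= 0 && $s < $e);
            my $at     = $is_sep ? $s : $e;
            $self->{field} .= substr($chunk, $pos, $at - $pos);
            $pos = $at + length($is_sep ? $sep : $eol);
            $self->_end_field;
            $self->_end_line unless $is_sep;
            $self->{state} = 'start';
        }
    }
    return $self->_take_rows;
}

# Assumption (not in the post): flush a final record that lacks an end of line.
sub finish {
    my ($self) = @_;
    if ($self->{state} ne 'start' || @{ $self->{line} }) {
        $self->_end_field;
        $self->_end_line;
        $self->{state} = 'start';
    }
    return $self->_take_rows;
}

package main;

# CSV fed in two chunks; the first chunk ends inside a quoted field.
my $csv = StreamingCSVSketch->new;
my @rows;
push @rows, $csv->parse($_)
    for qq{a,"b ""x"" c}, qq{",d\r\n1,"two\r\nlines",3\r\n};
push @rows, $csv->finish;

# passwd-style input using the parameters suggested in the post.
my $pw = StreamingCSVSketch->new(sep => ':', quote => "\0", eol => "\n");
my @pw_rows = $pw->parse("root:x:0:0:root:/root:/bin/sh\n");
```

Keeping the state, the partial field, and the partial line inside the object is what lets parse be called once per chunk, with each call picking up where the previous one stopped.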
Re: YA CSV parser
A. Pagaltzis wrote:
* Jim Schneider [EMAIL PROTECTED] [2007-11-25 20:00]:
BTW, the name I'm currently using for this module is CSV::Parse - let me know if you have a specific suggestion for a name you like better.

There is already a Parse::CSV on CPAN. I think it would be a bit confusing to have CSV::Parse also, without anything in the names of the modules to distinguish them.

Regards,

Do you have a suggestion for a better name? One that immediately captures the essence of what I'm doing, but doesn't wind up confusing people, too?

Thanks.
Re: YA CSV parser
David Cantrell wrote:
On Sun, Nov 25, 2007 at 01:59:46PM -0500, Jim Schneider wrote:
I wrote a streaming CSV parser yesterday ... If anyone knows of a module on CPAN that does all this, please let me know. Otherwise, I'll upload my module sometime in the next week or two. BTW, the name I'm currently using for this module is CSV::Parse - let me know if you have a specific suggestion for a name you like better.

There are plenty of CSV-ish modules on the CPAN already, but AFAIK none of them handle streaming. So it's the streaming that's important, and it should be in the name. How about Text::CSV::Streaming?

I like the word Streaming in the name, but this isn't a complete CSV processing module (it just does parsing, not (re)creating CSV data), so Text::CSV is a bit misleading. Perhaps CSV::Parse::Streaming? CSV::Parser::Streaming?
Re: YA CSV parser
Joshua ben Jore wrote:
Didn't you just reinvent Text::CSV_XS? The only tweak required is saying binary to enable the use of newlines inside quoted fields.

    ->new({
        binary => 1,
        # defaults
        eol         => qq(\r\n),
        sep_char    => q(,),
        quote_char  => q("),
        escape_char => q("),
    })

Josh

It appears I shoulda spent more time reading the man pages. A loop like this:

    while (my $d = $obj->get_chunk) {
        my $dh = new IO::Scalar \$d;
        while (my $f = $csv->getline($dh) and @$f) {
            process_fields($f);
        }
    }

works just fine, as long as the $csv object is constructed with the binary => 1 option. I'm not sure what would happen in the case where the data is in pieces that don't correspond to CSV line boundaries, but fortunately for me, my application can guarantee getting the data in complete records.

So, looks like CSV::Parse is going away. Thanks, Josh!
Re: RFC: relative.pm
A. Pagaltzis wrote:
even with the current interface, it’s possible to load a to.pm if you do it this way:

    use relative to => __PACKAGE__, qw(to from before after boo);

But that’s a) noisy b) less than self-suggesting. May I suggest this:

    use relative to => 'self', qw(foo bar roo);

Anything else would be used the way it is currently defined.

I like relative to => 'self', because it seems a bit more regular, and somewhat self documenting.
Re: Another non-free license - PerlBuildSystem
Ovid wrote: Being an *extremely* political creature, I'm sorely tempted to wade into this mess, but I won't. Can we just agree to stick to the license's suitability for the CPAN? Cheers, Ovid Perhaps this is just a me, too... The law of unintended consequences (Every action has at least two consequences - the one you intended, and at least one you didn't) is at work here. I think it's ironic that some of the biggest organizational contributors to open source (Red Hat, O'Reilly Media, and CPAN come to mind) are barred from using PerlBuildSystem because they don't restrict their distributions to keep them away from armed groups (and are thus suppliers). You can hold whatever political opinions you want. Just be aware that when you try to mix ideology with technology, the technology invariably suffers. But I'm guessing the author of PerlBuildSystem isn't subscribed to this mailing list, anyways.
Re: RFC: new module Finance::MortgageCalculator
Perhaps Finance::Calculator::Mortgage?

- Original Message -
From: Dmitri Tikhonov [EMAIL PROTECTED]
To: Smylers [EMAIL PROTECTED]
Cc: module-authors@perl.org
Sent: Wednesday, November 01, 2006 9:32 AM
Subject: Re: RFC: new module Finance::MortgageCalculator

Mortgages may compound differently -- monthly (the most common in the USA), biweekly, or semi-annually (Canadian). The common thing between them is that there's still a fixed number of payments. I will make the interface support all of these methods. The following web site has some formulas and derivations: http://www.hughchou.org/calc/formula.html

I guess I could do Finance::MortgageCalculator and Finance::MortgageCalculator::US (which would be an empty subclass) and then people could inherit from Finance::MortgageCalculator and create their own country-specific calculators. (I'd still like the word Calculator being present in the name. When I look at a module named Finance::Mortgage I think to myself -- mortgage *what*?)

- Dmitri.
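To make the compounding point concrete, here is a rough sketch of how the same fixed number of payments gives different amounts under monthly and semi-annual compounding. It uses the standard level-payment annuity formula, which is an assumption on my part and is not taken from the message or from the proposed module. The function names periodic_rate and payment are hypothetical, and the loan figures are only sample inputs.

```perl
use strict;
use warnings;

# Interest rate per payment period, given a nominal annual rate, how many
# times per year interest compounds, and how many payments are made per year.
sub periodic_rate {
    my ($annual_rate, $compounds_per_year, $payments_per_year) = @_;
    return (1 + $annual_rate / $compounds_per_year)
        ** ($compounds_per_year / $payments_per_year) - 1;
}

# Level payment that pays off $principal in $n payments at rate $rate
# per payment period (standard annuity formula).
sub payment {
    my ($principal, $rate, $n) = @_;
    return $principal / $n if $rate == 0;
    return $principal * $rate / (1 - (1 + $rate) ** (-$n));
}

# Sample inputs: 200,000 borrowed at 6% nominal, 360 monthly payments.
my ($principal, $annual, $n) = (200_000, 0.06, 360);

my $us_style = payment($principal, periodic_rate($annual, 12, 12), $n);   # monthly compounding
my $ca_style = payment($principal, periodic_rate($annual, 2, 12), $n);    # semi-annual compounding

printf "monthly compounding:     %.2f\n", $us_style;
printf "semi-annual compounding: %.2f\n", $ca_style;
```

Only the conversion from the quoted annual rate to a per-payment rate changes between the two cases, which fits the idea of one base calculator with thin country-specific subclasses.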
Re: Proposed module names - DBIx::Class::Simple
From: A. Pagaltzis [EMAIL PROTECTED] DBIx::ORM::Declarative? Perfect! Thank you so much.
Re: Proposed module names - DBIx::Class::Simple
I apologize. I wasn't terribly clear. I was hoping for suggestions as to what name would be appropriate - I'm quite well aware that the names I have are bad.

David Landgren wrote:
David Golden wrote:
Jim Schneider wrote:
-snip-

I think you may be best if you come up with your own DBIx::* name that captures what you feel is distinctive about your module -- beyond it just being simple.

Seconded. Simple modules never are. I'm not trying to be flippant. If the documentation isn't equally simple (for instance, fits on a screenful with no additional provisos or exceptions) then the person using it spends as much time learning how to use it as a supposedly more complex module.

Here are the biggest differences between Class::DBI and my module:

1) You don't subclass my module - the necessary subclasses are built on the fly.

2) Collections of tables are described by data structures, and these structures are passed to the module at use time, as contrasted to a subclass calling a bunch of methods in the parent class to establish what a table looks like.

For example, if you have a PERSON table (with PERSON_ID, NAME, ADDRESS_ID, and EMAIL columns), and an ADDRESS table (with ADDRESS_ID, LINE1, LINE2, CITY, STATE, ZIP columns (sorry for the USA-centric example)), your use clause would look something like this:

    use DBIx::Class::Simple (
      { schema => 'example',
        tables => [
          { table   => 'person',
            columns => [ { name => 'person_id' }, { name => 'name', },
                         { name => 'address_id' }, { name => 'email' }, ], },
          { table   => 'address',
            columns => [ { name => 'address_id', }, { name => 'line1', }, { name => 'line2', },
                         { name => 'city', }, { name => 'state', }, { name => 'zip', }, ], },
        ],
      },
    );

(I've compressed the example a bit to save space).
Once you've used the module, you can say:

    my $db = DBIx::Class::Simple->new(handle => $dbh); # $dbh is a DBI handle

And later on:

    $schema = $db->example;
    $ptable = $schema->person;
    @people = $ptable->search(@criteria);

Any suggestions on what to call this beast would be appreciated.

-snip-

Discussion of the other modules has been moved to separate threads.
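As an illustration of what "subclasses built on the fly" from a data structure can look like in Perl, here is a small sketch that installs packages and accessor methods through the symbol table. It is not the code of the module under discussion and it does not touch DBI at all. The package name SchemaSketch, the build method, and the generated SchemaSketch::Generated namespace are hypothetical; only the shape of the description (schema, tables, columns, name) is borrowed from the example above.

```perl
use strict;
use warnings;

package SchemaSketch;    # hypothetical; stands in for the real module

# Turn a schema description into generated packages:
#   <prefix>::<schema>            has one method per table, returning the table class
#   <prefix>::<schema>::<table>   has new(), columns(), and one accessor per column
sub build {
    my ($class, $spec) = @_;
    my $schema_pkg = 'SchemaSketch::Generated::' . $spec->{schema};
    for my $t (@{ $spec->{tables} }) {
        my $table     = $t->{table};
        my $table_pkg = $schema_pkg . '::' . $table;
        my @cols      = map { $_->{name} } @{ $t->{columns} };

        no strict 'refs';    # needed for the symbolic glob assignments below
        *{ $table_pkg . '::new' }     = sub { my ($c, %v) = @_; return bless {%v}, $c };
        *{ $table_pkg . '::columns' } = sub { return @cols };
        for my $col (@cols) {
            *{ $table_pkg . '::' . $col } = sub { return $_[0]{$col} };
        }
        *{ $schema_pkg . '::' . $table } = sub { return $table_pkg };
    }
    return $schema_pkg;
}

package main;

my $schema = SchemaSketch->build({
    schema => 'example',
    tables => [
        { table   => 'person',
          columns => [ { name => 'person_id' }, { name => 'name' },
                       { name => 'address_id' }, { name => 'email' } ] },
        { table   => 'address',
          columns => [ { name => 'address_id' }, { name => 'city' } ] },
    ],
});

my $person_class = $schema->person;    # name of the generated table class
my $jim = $person_class->new(person_id => 1, name => 'Jim');
print $jim->name, "\n";
```

Nothing here has to be written as a subclass by hand: the description alone determines which packages and methods exist, which is the contrast with Class::DBI drawn above.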
Proposed module names
I have three modules I am preparing to submit to CPAN, and I was hoping to get some input on the names. The modules are:

1) DBIx::Class::Simple - a simpler alternative to DBIx::Class, but alas, not compatible (not even a little bit). It takes a collection of data structures that describe your tables, and turns it into a collection of classes that can be used to access them. I'm also open to the name DBIx::Simple::Class or DBIx::Simple::Object.

2) WWW::Scraper::Zip4 - a simple web scraper to retrieve address information from the USPS website.

3) Well, I'm currently calling it TemplateLoader, but that's too horrible for words. You provide some particulars on the module "use" line, and it creates a method in the calling class that loads the template. Any suggestions for this one would definitely be appreciated.