YA CSV parser

2007-11-25 Thread Jim Schneider
I wrote a streaming CSV parser yesterday because I couldn't find a CSV 
parsing module that does what I want (despite the plethora of available 
choices).  The parsing rules are pretty simple:


1)  At the start of a field, if you find a quote string, eat the quote 
string and go to the state that handles quoted strings.  If you find a 
separator, add the current field (which is blank) to the line, and start 
over at this step.  If you find an end of line string, add the current 
field to the current target line, push the target line onto the list of 
parsed lines, and start over.  For anything else, go to the state that 
handles unquoted strings.


2)  In the state that handles quoted strings, search the string 
sequentially for the first instance of the quote string.  Add the string 
up to that point (not including the quote) to the current field.  If what 
immediately follows is another quote string, start this step over.  
Otherwise, go to the unquoted string state.  In the event no quote 
string is found, append the remainder of the string being processed to 
the current field.  Parsing of the next chunk of data will resume in 
this state.


3)  In the state that handles unquoted strings, search the string 
sequentially for either the first instance of the separator string, or 
the first instance of the end of line string.  If neither is found, 
append what's in the string being processed to the current field, and 
note that the parser will resume in this state.  Otherwise, append the 
part of the string up to, but not including, the separator or end of 
line that was found to the current field, then, append the current field 
to the current target line.  If the found string was the end of line 
string, append the current target line onto the list of parsed lines, 
too.  Then, return to the initial state.


This set of rules happens to produce results that match how Microsoft's 
Excel handles CSV files.  As you may have determined from my use of 
"separator string", "quote string", and "end of line string", each of 
these entities is a string (',', '"', and "\r\n", by default).  They 
also happen to be parameters, so you can parse other simple text 
formats, too.  For example, to parse a standard /etc/passwd file, you 
could use ":", "\0", and "\n" as the separator, quote, and end of line 
strings.
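
The three rules above can be sketched roughly as follows.  This is only an illustration, not the module described in the post: new_parser, feed and finish are made-up names, and two details are assumptions the post does not spell out.  First, a doubled quote string inside a quoted field is taken to mean one literal quote string.  Second, a delimiter that is split across two chunks is handled by holding back a short tail of the buffer until more data arrives.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical names throughout: new_parser, feed and finish are not from
# the module described in the post.
sub new_parser {
    my (%opt) = @_;
    return {
        sep => ',', quote => '"', eol => "\r\n",    # defaults named in the post
        %opt,
        state => 'start', field => '', line => [], rows => [], buf => '',
    };
}

# True if the unprocessed buffer begins with $str.
sub _starts { my ($p, $str) = @_; return substr($p->{buf}, 0, length $str) eq $str }

# Remove $n characters from the front of the buffer and return them.
sub _eat { my ($p, $n) = @_; return substr($p->{buf}, 0, $n, '') }

# Close the current field; when $eol is true, close the current line too.
sub _end {
    my ($p, $eol) = @_;
    push @{ $p->{line} }, $p->{field};
    @{$p}{qw(field state)} = ('', 'start');
    if ($eol) { push @{ $p->{rows} }, $p->{line}; $p->{line} = [] }
    return;
}

sub feed {
    my ($p, $chunk, $eof) = @_;
    $p->{buf} .= $chunk;
    my ($sep, $q, $eol) = @{$p}{qw(sep quote eol)};
    my $max = max map { length $_ } $sep, $q, $eol;
    while (length $p->{buf}) {
        my $have = length $p->{buf};
        if ($p->{state} eq 'start') {                        # rule 1
            last if !$eof && $have < $max;                   # wait for lookahead
            if    (_starts($p, $q))   { _eat($p, length $q);   $p->{state} = 'quoted' }
            elsif (_starts($p, $sep)) { _eat($p, length $sep); _end($p, 0) }
            elsif (_starts($p, $eol)) { _eat($p, length $eol); _end($p, 1) }
            else                      { $p->{state} = 'unquoted' }
        }
        elsif ($p->{state} eq 'quoted') {                    # rule 2
            my $i = index $p->{buf}, $q;
            if ($i < 0) {
                # keep a tail that might be the start of a split quote string
                my $take = $have - ($eof ? 0 : length($q) - 1);
                last if $take <= 0;
                $p->{field} .= _eat($p, $take);
                next;
            }
            last if !$eof && $have < $i + 2 * length($q);    # need the next token
            $p->{field} .= _eat($p, $i);
            _eat($p, length $q);
            if (_starts($p, $q)) { $p->{field} .= _eat($p, length $q) }  # doubled quote
            else                 { $p->{state} = 'unquoted' }
        }
        else {                                               # rule 3
            my ($i, $j) = (index($p->{buf}, $sep), index($p->{buf}, $eol));
            my $hit = $i < 0 ? $j : $j < 0 ? $i : $i < $j ? $i : $j;
            if ($hit < 0) {
                my $take = $have - ($eof ? 0 : $max - 1);
                last if $take <= 0;
                $p->{field} .= _eat($p, $take);
                next;
            }
            my $is_eol = $hit == $j && $hit != $i;
            $p->{field} .= _eat($p, $hit);
            _eat($p, length($is_eol ? $eol : $sep));
            _end($p, $is_eol);
        }
    }
    return;
}

# Flush whatever is pending at end of input and return the parsed lines.
sub finish {
    my ($p) = @_;
    feed($p, '', 1);
    _end($p, 1) if $p->{state} ne 'start' || @{ $p->{line} };
    return $p->{rows};
}

my $p = new_parser();
feed($p, $_) for 'a,"b ""x"" ', qq{c"\r}, "\n1,2";   # chunks split mid-record
print join('|', @$_), "\n" for @{ finish($p) };
```

For the /etc/passwd case, the same sketch would be constructed with new_parser(sep => ':', quote => "\0", eol => "\n").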


If anyone knows of a module on CPAN that does all this, please let me 
know.  Otherwise, I'll upload my module sometime in the next week or two.


BTW, the name I'm currently using for this module is CSV::Parse - let 
me know if you have a specific suggestion for a name you like better.


Re: YA CSV parser

2007-11-25 Thread Jim Schneider

A. Pagaltzis wrote:

* Jim Schneider [EMAIL PROTECTED] [2007-11-25 20:00]:
  

BTW, the name I'm currently using for this module is
CSV::Parse - let me know if you have a specific suggestion
for a name you like better.



There is already a Parse::CSV on CPAN. I think it would be a bit
confusing to have CSV::Parse also, without anything in the names
of the modules to distinguish them.

Regards,
  
Do you have a suggestion for a better name?  One that immediately 
captures the essence of what I'm doing, but doesn't wind up confusing 
people, too?


Thanks.


Re: YA CSV parser

2007-11-25 Thread Jim Schneider

David Cantrell wrote:

On Sun, Nov 25, 2007 at 01:59:46PM -0500, Jim Schneider wrote:

  

I wrote a streaming CSV parser yesterday ...

If anyone knows of a module on CPAN that does all this, please let me 
know.  Otherwise, I'll upload my module sometime in the next week or two.


BTW, the name I'm currently using for this module is CSV::Parse - let 
me know if you have a specific suggestion for a name you like better.



There are plenty of CSV-ish modules on the CPAN already, but AFAIK none
of them handle streaming.  So it's the streaming that's important, and
it should be in the name.  How about Text::CSV::Streaming?
  
I like the word Streaming in the name, but this isn't a complete CSV 
processing module (it just does parsing, not (re)creating CSV data), so 
Text::CSV is a bit misleading.  Perhaps CSV::Parse::Streaming?  
CSV::Parser::Streaming?


Re: YA CSV parser

2007-11-25 Thread Jim Schneider

Joshua ben Jore wrote:

Didn't you just reinvent Text::CSV_XS? The only tweak required is
saying binary to enable the use of newlines inside quoted fields.

->new({
    binary => 1,

    # defaults
    eol         => qq(\r\n),
    sep_char    => q(,),
    quote_char  => q("),
    escape_char => q("),
})

Josh
  
It appears I shoulda spent more time reading the man pages.  A loop like 
this:


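while (my $d = $obj->get_chunk) {
   my $dh = IO::Scalar->new(\$d);
   # $f is not visible until the statement after its "my", so the
   # empty-record check goes inside the loop body.
   while (my $f = $csv->getline($dh)) {
      last unless @$f;
      process_fields($f);
   }
}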

works just fine, as long as the $csv object is constructed with the 
binary => 1 option.  I'm not sure what would happen in the case where 
the data is in pieces that don't correspond to CSV line boundaries, but 
fortunately for me, my application can guarantee getting the data in 
complete records.  So, looks like CSV::Parse is going away.


Thanks, Josh!


Re: RFC: relative.pm

2007-10-07 Thread Jim Schneider

A. Pagaltzis wrote:

even with the current interface,
it’s possible to load a to.pm if you do it this way:

use relative to => __PACKAGE__, qw(to from before after boo);

But that’s a) noisy b) less than self-suggesting.

May I suggest this:

   use relative to => 'self', qw(foo bar roo);

Anything else would be used the way it is currently defined.  I like 
relative to => 'self', because it seems a bit more regular, and 
somewhat self-documenting.
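
To make the suggestion concrete, here is a small sketch of how the arguments on such a "use" line might be turned into module names.  It is not relative.pm's actual code: resolve_relative is a hypothetical helper, and it assumes that each name is simply appended to the base package.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the arguments of a "use relative ..." line into
# full module names. This is not relative.pm's actual implementation.
sub resolve_relative {
    my ($caller, @args) = @_;
    my $base = $caller;
    if (@args >= 2 && $args[0] eq 'to') {
        shift @args;                          # drop the 'to' keyword
        $base = shift @args;                  # explicit base package
        $base = $caller if $base eq 'self';   # the proposed shorthand
    }
    return map { "${base}::$_" } @args;       # assumption: names nest under the base
}

print join(', ', resolve_relative('My::App', to => 'self', qw(to from))), "\n";
```

With the explicit to => 'self' pair in front, a module that is itself called "to" is no longer mistaken for the keyword.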


Re: Another non-free license - PerlBuildSystem

2007-02-21 Thread Jim Schneider

Ovid wrote:

Being an *extremely* political creature, I'm sorely tempted to wade
into this mess, but I won't.  Can we just agree to stick to the
license's suitability for the CPAN?

Cheers,
Ovid

Perhaps this is just a "me, too"...

The law of unintended consequences (Every action has at least two 
consequences - the one you intended, and at least one you didn't) is at 
work here.  I think it's ironic that some of the biggest organizational 
contributors to open source (Red Hat, O'Reilly Media, and CPAN come to 
mind) are barred from using PerlBuildSystem because they don't restrict 
their distributions to keep them away from armed groups (and are thus 
suppliers).


You can hold whatever political opinions you want.  Just be aware that 
when you try to mix ideology with technology, the technology invariably 
suffers.


But I'm guessing the author of PerlBuildSystem isn't subscribed to this 
mailing list, anyways.


Re: RFC: new module Finance::MortgageCalculator

2006-11-07 Thread Jim Schneider

Perhaps Finance::Calculator::Mortgage?

- Original Message - 
From: Dmitri Tikhonov [EMAIL PROTECTED]

To: Smylers [EMAIL PROTECTED]
Cc: module-authors@perl.org
Sent: Wednesday, November 01, 2006 9:32 AM
Subject: Re: RFC: new module Finance::MortgageCalculator



Mortgages may compound differently -- monthly (the most common in the
USA), biweekly, or semi-annually (Canadian).  The common thing between
them is that there's still a fixed number of payments.  I will make the
interface support all of these methods.

The following web site has some formulas and derivations:

 http://www.hughchou.org/calc/formula.html

I guess I could do Finance::MortgageCalculator and
Finance::MortgageCalculator::US (which would be an empty subclass) and
then people could inherit from Finance::MortgageCalculator and create
their own country-specific calculators.

(I'd still like the word Calculator to be present in the name.  When I
look at a module named Finance::Mortgage I think to myself -- mortgage
*what*?)

 - Dmitri.
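
For reference, the compounding differences mentioned above come down to how the nominal annual rate is converted to a per-payment rate; after that, the standard fixed-payment (annuity) formula applies unchanged.  The following is a rough sketch only: periodic_rate and payment are hypothetical names, not the proposed module's interface, and the loan figures are made up.

```perl
use strict;
use warnings;

# Per-payment rate for a nominal annual rate compounded $comp times a year
# and paid $pay times a year (US monthly: 12/12; Canadian: 2/12).
sub periodic_rate {
    my ($annual, $comp, $pay) = @_;
    return (1 + $annual / $comp) ** ($comp / $pay) - 1;
}

# Fixed payment that amortizes $principal over $n payments at rate $r
# per payment: P * r / (1 - (1 + r)^-n).
sub payment {
    my ($principal, $r, $n) = @_;
    return $principal / $n if $r == 0;    # interest-free edge case
    return $principal * $r / (1 - (1 + $r) ** -$n);
}

# Example figures (made up): 100,000 borrowed at a nominal 6% per year.
printf "US, 30 years:       %.2f\n",
    payment(100_000, periodic_rate(0.06, 12, 12), 360);
printf "Canadian, 25 years: %.2f\n",
    payment(100_000, periodic_rate(0.06, 2, 12), 300);
```

A country-specific subclass would then only need to supply its own compounding frequency.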



Re: Proposed module names - DBIx::Class::Simple

2006-03-14 Thread Jim Schneider

From: A. Pagaltzis [EMAIL PROTECTED]

DBIx::ORM::Declarative?


Perfect!  Thank you so much.



Re: Proposed module names - DBIx::Class::Simple

2006-03-13 Thread Jim Schneider
I apologize.  I wasn't terribly clear.  I was hoping for suggestions as to 
what name would be appropriate - I'm quite well aware that the names I have 
are bad.


David Landgren wrote:

David Golden wrote:

Jim Schneider wrote:

-snip-
I think you may be best if you come up with your own DBIx::* name that 
captures what you feel is distinctive about your module -- beyond it just 
being simple.


Seconded. Simple modules never are. I'm not trying to be flippant. If the 
documentation isn't equally simple (for instance, fits on a screenful with 
no additional provisos or exceptions) then the person using it spends as 
much time learning how to use it as a supposedly more complex module.


Here are the biggest differences between Class::DBI and my module:
1)  You don't subclass my module - the necessary subclasses are built on the 
fly.
2)  Collections of tables are described by data structures, and these 
structures are passed to the module at use time, as contrasted to a 
subclass calling a bunch of methods in the parent class to establish what a 
table looks like.


For example, if you have a PERSON table (with PERSON_ID, NAME, ADDRESS_ID, 
and EMAIL columns),
and an ADDRESS table (with ADDRESS_ID, LINE1, LINE2, CITY, STATE, ZIP 
columns (sorry for the USA-centric example)), your use clause would look 
something like this:


use DBIx::Class::Simple ( {
   schema => 'example',
   tables => [
      { table => 'person',
        columns => [
           { name => 'person_id' }, { name => 'name', },
           { name => 'address_id' }, { name => 'email' }, ], },
      { table => 'address',
        columns => [
           { name => 'address_id', }, { name => 'line1', },
           { name => 'line2', }, { name => 'city', },
           { name => 'state', }, { name => 'zip', }, ], }, ], }, );

(I've compressed the example a bit to save space).

Once you've used the module, you can say:

my $db = DBIx::Class::Simple->new(handle => $dbh); # $dbh is a DBI handle

And later on:

$schema = $db->example;
$ptable = $schema->person;
@people = $ptable->search(@criteria);

Any suggestions on what to call this beast would be appreciated.
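
As an illustration of "subclasses built on the fly", the sketch below turns a schema description of the same shape into classes with one accessor per column.  It is not the module's code: build_classes and the My::Schema prefix are hypothetical, and no database access is attempted.

```perl
use strict;
use warnings;

# Hypothetical sketch: build one class per table from a schema description.
# My::Schema and build_classes are illustrative names, not the module's API.
sub build_classes {
    my ($spec) = @_;
    my @classes;
    for my $t (@{ $spec->{tables} }) {
        my $class = join '::', 'My::Schema', $spec->{schema}, $t->{table};
        no strict 'refs';    # needed to install subs by name
        *{"${class}::new"} = sub { my ($c, %row) = @_; return bless {%row}, $c };
        for my $col (map { $_->{name} } @{ $t->{columns} }) {
            # one read/write accessor per column
            *{"${class}::$col"} = sub {
                my $self = shift;
                $self->{$col} = shift if @_;
                return $self->{$col};
            };
        }
        push @classes, $class;
    }
    return @classes;
}

my ($person) = build_classes({
    schema => 'example',
    tables => [
        { table   => 'person',
          columns => [ { name => 'person_id' }, { name => 'name' } ] },
    ],
});
my $row = $person->new(name => 'Jim');
print $row->name, "\n";
```

Because the classes come from plain data, nothing has to be subclassed by hand.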
-snip-
Discussion of the other modules has been moved to separate threads. 



Proposed module names

2006-03-12 Thread Jim Schneider



I have three modules I am preparing to submit to CPAN, and I was hoping
to get some input on the names.

The modules are:
1) DBIx::Class::Simple - a simpler alternative to DBIx::Class, but alas,
not compatible (not even a little bit). It takes a collection of data
structures that describe your tables, and turns it into a collection of
classes that can be used to access them. I'm also open to the name
DBIx::Simple::Class or DBIx::Simple::Object.

2) WWW::Scraper::Zip4 - a simple web scraper to retrieve address
information from the USPS website.

3) Well, I'm currently calling it TemplateLoader, but that's too horrible
for words. You provide some particulars on the module "use" line, and it
creates a method in the calling class that loads the template. Any
suggestions for this one would definitely be appreciated.
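
The third module's mechanism (particulars on the "use" line, a method created in the calling class) might look something like this.  It is a guess at the general technique, not the actual TemplateLoader: the package name My::TemplateLoader and the dir and method options are all hypothetical.

```perl
use strict;
use warnings;

package My::TemplateLoader;    # hypothetical name

# import() runs at "use" time; it installs a method into the calling class.
sub import {
    my ($class, %opt) = @_;
    my $caller = caller;                          # package that said "use"
    my $method = $opt{method} || 'load_template'; # hypothetical option
    my $dir    = $opt{dir}    || '.';             # hypothetical option
    no strict 'refs';    # needed to install a sub by name
    *{"${caller}::$method"} = sub {
        my ($self, $name) = @_;
        open my $fh, '<', "$dir/$name" or die "Cannot open $dir/$name: $!";
        local $/;                                 # slurp the whole file
        return scalar <$fh>;
    };
    return;
}

package My::Page;
# In a real program this would be: use My::TemplateLoader dir => '.', ...;
BEGIN { My::TemplateLoader->import(dir => '.', method => 'template') }

package main;
print My::Page->can('template') ? "installed\n" : "missing\n";
```

The caller ends up with a template method of its own without inheriting from anything.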