Hi Doug,

Doug Coleman wrote:
 > Hi Phil,
 >
 > The CSV parser is missing some features that would make it more
 > usable.  Also, the code no longer looks idiomatic to me -- maybe a
 > rewrite is in order?
 >

Cool. Which parts aren't idiomatic? I've recently had a second baby, so I
haven't been keeping up with Factor developments as well as I could be.

 > * Take a look at Python's CSV API
 >   http://www.python.org/dev/peps/pep-0305/
 >   where they define dialects to be a set of
 >
 > TUPLE: dialect delimiter quote-char escape-char double-quote
 >     skip-initial-space line-terminator quoting ;
 >
 > The current CSV parser only allows the delimiter to be changed.
 >

Adding these features sounds worthwhile; I'll see what I can do.
I wonder if this can be done without hurting performance?
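
For reference, here is a minimal Python sketch of how those dialect fields behave in Python's csv module, which is the API the PEP describes. The dialect name "pipes" and the sample data are made up for illustration, and none of this reflects how the Factor parser works today.

```python
import csv
import io

# Hypothetical dialect: the name "pipes" and every setting below are
# illustrative choices, not values taken from the Factor library.
csv.register_dialect(
    "pipes",
    delimiter="|",            # field separator
    quotechar="'",            # character that wraps quoted fields
    escapechar="\\",          # escapes the quote char inside a field
    doublequote=False,        # '' is not treated as an escaped quote
    skipinitialspace=True,    # drop spaces right after a delimiter
    lineterminator="\n",      # only used when writing
    quoting=csv.QUOTE_MINIMAL,
)

sample = "1| 'a|b'| plain\n2| 'it\\'s'| x\n"
rows = list(csv.reader(io.StringIO(sample), dialect="pipes"))

for row in rows:
    print(row)
```

The quoted field keeps its embedded delimiter, and the escaped quote in the second row comes through as a literal apostrophe.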

 > * There are multiline fields delimited by the quot-char and I'm not
 > sure we support them:
 >
 > 3,"a long
 > string", more,data
 >

AFAICS this works fine, and there's a test for it. E.g.

( scratchpad ) "3,\"a long\nstring\", more,data" <string-reader> csv .

{ { "3" "a long\nstring" "more" "data" } }

 > * It could autodetect the separator or detect if there is a column
 >   header on the first line like Python's Sniffer
 >   http://docs.python.org/library/csv.html#csv.Sniffer
 >   You might return a tuple with the header line and rows.
 >

That's cool - I'll check it out.
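
As a rough picture of the behaviour being suggested, here is a small Python sketch built on csv.Sniffer. The function name sniff_and_read and the list of candidate delimiters are my own assumptions; the Sniffer is heuristic, so treat the result as a guess rather than a guarantee.

```python
import csv
import io


def sniff_and_read(text):
    """Guess the delimiter, then return (header, rows).

    header is None when the Sniffer's heuristic sees no header line.
    The candidate delimiters below are an assumption for this sketch.
    """
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(text, delimiters=";,\t|")
    rows = list(csv.reader(io.StringIO(text), dialect))
    if sniffer.has_header(text):
        return rows[0], rows[1:]
    return None, rows


sample = "name;age;city\nalice;30;paris\nbob;25;rome\ncarol;41;oslo\n"
header, rows = sniff_and_read(sample)
print(header)
print(rows)
```

Returning the header separately from the rows matches the "tuple with the header line and rows" idea above.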

 > * For huge CSV files you might want to process it line by line.
 >

You can do this with the 'row' word (though I notice it has been made private).
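
For comparison, Python's csv.reader is an iterator, so row-at-a-time processing falls out naturally there. A minimal sketch, with count_rows as a made-up helper name:

```python
import csv
import io


def count_rows(stream):
    """Count CSV rows while holding only one row in memory at a time."""
    count = 0
    for _row in csv.reader(stream):
        count += 1
    return count


# The quoted field spans two physical lines but is still a single row.
sample = '3,"a long\nstring", more,data\n4,short,x,y\n'
total = count_rows(io.StringIO(sample))
print(total)
```

The same loop works on an open file or a network stream, since the reader only pulls lines as it needs them.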

 > The most useful word for me is ``file>csv'' because I'm always reading
 > from a file.

Cool, although it looks like file>csv leaks a descriptor (or am I missing
something?).

 > One approach is to make  a word ``stream>csv'' that is
 > used to implement ``file>csv'', ``string>csv'', etc.  If you want to
 > make another word that reads from the standard input, that's fine --
 > what's your use case for it, a header line that you want to read
 > before reading the rest of the lines?

Yes, that's right, over HTTP. Personally I'd like 'csv' to take no
arguments and instead read the text from input-stream. I'm finding that
all of my uses of this word are wrapped in a [ ] with-blah combinator.

 > Maybe some of the features listed above are overdesign, but as there's
 > no CSV standard, everybody does something slightly different and not
 > supporting a critical feature would make the whole library unusable
 > for a particular file.

Agreed. I'll make a start on this but can't promise quick results 
because most of my spare time is spent juggling children at the mo!

Thanks,

Phil



_______________________________________________
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk
