Hi Doug,

Doug Coleman wrote:
> Hi Phil,
>
> The CSV parser is missing some features that would make it more
> usable. Also, the code no longer looks idiomatic to me -- maybe a
> rewrite is in order?
>

Cool. Which parts aren't idiomatic? I've recently had a second baby, so I
haven't been keeping up with Factor developments as well as I could be.

> * Take a look at Python's CSV API http://www.python.org/dev/peps/pep-0305/
> where they define dialects to be a set of
>
> TUPLE: dialect delimiter quote-char escape-char double-quote skip-initial-space
> line-terminator quoting ;
>
> The current CSV parser only allows the delimiter to be changed.
>

Adding these features sounds worthwhile; I'll see what I can do. I wonder
if this can be done without hurting performance?

> * There are multiline fields delimited by the quot-char and I'm not
> sure we support them:
>
> 3,"a long
> string", more,data
>

AFAICS this works fine, and there's a test for it. E.g.

( scratchpad ) "3,\"a long\nstring\", more,data" <string-reader> csv .
{ { "3" "a long\nstring" "more" "data" } }

> * It could autodetect the separator or detect if there is a column
> header on the first line like Python's Sniffer http://docs.python.org/library/csv.html#csv.Sniffer
> You might return a tuple with the header line and rows.
>

That's cool - I'll check it out.

> * For huge CSV files you might want to process it line by line.
>

You can do this with row (but I notice this has been made private).

> The most useful word for me is ``file>csv'' because I'm always reading
> from a file.

Cool, although it looks like file>csv leaks a descriptor (or am I missing
something?)

> One approach is to make a word ``stream>csv'' that is
> used to implement ``file>csv'', ``string>csv'', etc. If you want to
> make another word that reads from the standard input, that's fine --
> what's your use case for it, a header line that you want to read
> before reading the rest of the lines?

Yes, that's right, over HTTP. Personally I'd like 'csv' to take no arguments
and instead read the text from input-stream. I'm finding that all of my uses
of this word call it wrapped in a [ ] with-blah combinator.

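For reference, the Python behaviour mentioned above (dialects, quoted multiline fields, and Sniffer) can be tried out as follows. This is only a minimal sketch in Python, not Factor, and it says nothing about how the Factor vocabulary should be structured. The names `SemicolonDialect`, `parse`, and `sniff_delimiter` are hypothetical and chosen for illustration; only `csv.Dialect`, `csv.reader`, and `csv.Sniffer` come from Python's standard library.

```python
import csv
import io


class SemicolonDialect(csv.Dialect):
    """Hypothetical dialect; the attributes correspond to the tuple slots quoted above."""
    delimiter = ";"
    quotechar = '"'
    escapechar = None
    doublequote = True
    skipinitialspace = True
    lineterminator = "\r\n"
    quoting = csv.QUOTE_MINIMAL


def parse(text, dialect=csv.excel, **overrides):
    """Parse CSV text into a list of rows, optionally overriding dialect attributes."""
    return list(csv.reader(io.StringIO(text, newline=""), dialect, **overrides))


def sniff_delimiter(sample, candidates=";,\t"):
    """Guess the delimiter of a sample, restricted to a few candidate characters."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


if __name__ == "__main__":
    # A quoted field may span lines; skipinitialspace drops the blank after a delimiter.
    print(parse('3,"a long\nstring", more,data', skipinitialspace=True))
    # The same reader with a different dialect.
    print(parse('a; "b;c"; d', SemicolonDialect))
    # Delimiter detection on a small sample.
    print(sniff_delimiter("name;age\nalice;30\nbob;25\n"))
```

Keeping every option on one dialect object means the reader itself takes a single extra argument, which may be one way to add these features without complicating each parsing word.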
> Maybe some of the features listed above are overdesign, but as there's
> no CSV standard, everybody does something slightly different and not
> supporting a critical feature would make the whole library unusable
> for a particular file.

Agreed. I'll make a start on this but can't promise quick results because
most of my spare time is spent juggling children at the mo!

Thanks,

Phil

_______________________________________________
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk