On Monday, 8 December 2014 17:04:10 UTC, John Myles White wrote:
>
> * This package and the current DataFrames code both support specifying the 
> types of all columns before parsing begins. There's no fast path in 
> CSVReaders that uses this information to full advantage because the 
> functions were designed to never fail -- instead they always enlarge types 
> to ensure successful parsing. It would be good to think about how the 
> library needs to be restructured to support both use cases. I believe the 
> DataFrames parser will fail if the hand-specified types are invalidated by 
> the data.
>

I agree that being permissive by default is probably a good idea, but 
sometimes it is nice if the parser throws an error when it finds something 
unexpected. This could also be useful for the "end-of-data" problem below.
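
To make that concrete, here is a rough Julia sketch of both behaviours. The 
function names and the numeric-only fallback chain are my own, not taken from 
CSVReaders or DataFrames:

# Strict: fail fast when a hand-specified (numeric) type is invalidated
# by the data. Hypothetical helper, not the API of either package.
function parse_column_strict(::Type{T}, values) where {T}
    map(enumerate(values)) do (i, v)
        x = tryparse(T, v)
        x === nothing && error("row $i: cannot parse $(repr(v)) as $T")
        x
    end
end

# Permissive: try narrow types first and enlarge the column type until
# parsing succeeds, keeping raw strings as a last resort, so it never fails.
function parse_column_permissive(values)
    for T in (Int, Float64)
        parsed = [tryparse(T, v) for v in values]
        any(isnothing, parsed) || return convert(Vector{T}, parsed)
    end
    return String.(values)
end

parse_column_strict(Int, ["1", "2", "x"])    # throws: "x" is not an Int
parse_column_permissive(["1", "2", "2.5"])   # enlarges to Vector{Float64}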

> * Does the CSV standard have anything like END-OF-DATA? It's a very cool 
> idea, but it seems that you'd need to introduce an arbitrary predicate that 
> occurs per-row to make things work in the absence of existing conventions.
>

Well, there isn't really a standard, just RFC 4180:
http://www.ietf.org/rfc/rfc4180.txt
which seems to assume that end-of-data is simply end-of-file.
 
When I hit this problem, the files I was reading weren't actually CSV, but 
Bsoft parameter files:
http://lsbr.niams.nih.gov/bsoft/bsoft_param.html
These have multiple tables per file, each terminated by a blank line. I think 
I ended up devising a hack that counted the number of lines beforehand.
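
In retrospect, a cleaner approach is to treat a blank line as END-OF-DATA for 
the current table and collect tables while streaming the file. A minimal Julia 
sketch, assuming whitespace-separated fields (my simplification, not real 
Bsoft parsing):

# Split a file into tables; a blank line ends the current table.
# Each table is a vector of rows, each row a vector of string fields.
function read_tables(path::AbstractString)
    tables = Vector{Vector{Vector{String}}}()
    current = Vector{Vector{String}}()
    for line in eachline(path)
        if isempty(strip(line))
            isempty(current) || push!(tables, current)
            current = Vector{Vector{String}}()
        else
            push!(current, String.(split(line)))
        end
    end
    isempty(current) || push!(tables, current)  # file may lack a final blank line
    return tables
end

That avoids having to count lines up front.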
