On Thursday, 21 January 2016 at 21:24:49 UTC, H. S. Teoh wrote:
[snip]
There are some limitations to this approach: while the current code does try to unwrap quoted values in the CSV, it does not correctly parse escaped double quotes ("") in the fields. This is because to process those values correctly we'd have to copy the field data into a new string and construct its interpreted value, which is slow. So I leave it as an exercise for the reader to implement (it's not hard, when the double double-quote sequence is detected, allocate a new string with the interpreted data instead of slicing the original data. Either that, or just unescape the quotes in the application code itself).

What about wrapping the slices in a range-like interface that would unescape the quotes on demand? You could even set a flag on it during the initial pass to say the field has double quotes that need to be escaped so it doesn't need to take a per-pop performance hit checking for double quotes (that's probably a pretty minor boost, if any, though).

Reply via email to