On Thu, Jan 21, 2016 at 04:31:03PM -0800, H. S. Teoh via Digitalmars-d-learn wrote:
> On Thu, Jan 21, 2016 at 04:26:16PM -0800, H. S. Teoh via Digitalmars-d-learn
> wrote: [...]
> > https://github.com/quickfur/fastcsv
>
> Oh, forgot to mention, the parsing times are still lightning fast
> after the fixes I mentioned: still around 1190 msecs or so.
>
> Now I'm tempted to actually implement doubled-quote interpretation...
> as long as the input file doesn't contain unreasonable amounts of
> doubled quotes, I'm expecting the speed should remain pretty fast.
[...]
Done, commits pushed to GitHub. The new code now parses doubled quotes
correctly.

The performance is slightly worse now, around 1300 msecs on average, even on
files that don't contain any doubled quotes (a penalty incurred by the inner
loop having to check for doubled-quote sequences). My benchmark input file
doesn't have any doubled quotes, however (correctness of doubled-quote
handling is gauged by unittests only), so these numbers may not accurately
reflect performance in the general case. (But if doubled quotes are rare, as
I expect, actual performance shouldn't change much in general usage...)

Maybe somebody who has a file with lots of ""'s can run the benchmark to see
how badly it performs? :-P


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be
algorithms.
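P.S. For anyone curious what "doubled-quote interpretation" involves: in
RFC 4180-style CSV, a literal quote inside a quoted field is written as two
consecutive quotes (""). Below is a minimal sketch of the technique in
Python (not the actual fastcsv D code); the function name and structure are
my own illustration. It also shows the fast path alluded to above: when a
field contains no doubled quotes, it can be returned as a plain slice of the
input with no copying, and the cost of the extra lookahead in the inner loop
is what accounts for the small slowdown even on quote-free files.

```python
def parse_quoted_field(s: str, start: int) -> tuple[str, int]:
    """Parse a quoted CSV field beginning at s[start] (the opening quote).

    Returns (field_value, index just past the closing quote). A doubled
    quote ("") inside the field decodes to one literal quote. Fast path:
    if no doubled quotes occur, the field is a plain slice of the input,
    with no rewriting.
    """
    assert s[start] == '"', "field must begin with an opening quote"
    i = start + 1
    parts: list[str] = []  # only populated once a "" is actually seen
    seg_start = i          # start of the current literal run
    while i < len(s):
        if s[i] == '"':
            if i + 1 < len(s) and s[i + 1] == '"':
                # Doubled quote: flush the run so far, emit one quote.
                parts.append(s[seg_start:i])
                parts.append('"')
                i += 2
                seg_start = i
            else:
                # Closing quote.
                if not parts:                    # fast path: zero-copy slice
                    return s[seg_start:i], i + 1
                parts.append(s[seg_start:i])
                return "".join(parts), i + 1
        else:
            i += 1
    raise ValueError("unterminated quoted field")
```

For example, `parse_quoted_field('"say ""hi""",y', 0)` yields
`('say "hi"', 12)`, while a field with no doubled quotes never allocates at
all. The real cost in a tight parser is that every quote now needs one
character of lookahead to distinguish "" from the closing quote.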