On slightly closer inspection of that spec, it seems that backslash quoting 
only happens in the `##` comment sections ignored above. So, maybe they aren't 
buggy after all if that data is not important to the calculation in question.

A further point along the lines of "if we're going to provide `split`, maybe we
should provide faster variants": the times it matters most are huge inputs
whose columns have a regular nature, like numbers, and whose format
specifically forbids more complex lexical structure. There, A) a naive split
might be correct, B) performance might matter a lot in human terms, and C) the
programmer might well be unfamiliar with even terminology like "lexing" or
"vectorized memchr". All of this seems to be the case with this VCF thing, but
I feel like it has come up quite a few times over the years. It may not
**always** be "bugs running faster". :-)
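
To make the "regular columns" case concrete, here is a minimal sketch of an allocation-free split, assuming tab-separated lines with no quoting or escaping in the data rows. `fieldBounds` and `intField` are hypothetical names for illustration, not stdlib procs; the only library calls are `strutils.find` (char overload) and `parseutils.parseInt`.

```nim
import std/[strutils, parseutils]

# Hypothetical helper (not in the stdlib): yields inclusive (first, last)
# index bounds of each `sep`-separated field without allocating substrings.
# Assumes the input has no quoting or escaping.
iterator fieldBounds(line: string; sep = '\t'): (int, int) =
  var first = 0
  while true:
    let hit = line.find(sep, first)   # single-char search from `first`
    if hit < 0:
      yield (first, line.high)        # last field; empty if last < first
      break
    yield (first, hit - 1)
    first = hit + 1

# Hypothetical helper: parses zero-based column `col` as an int in place.
proc intField(line: string; col: int; sep = '\t'): int =
  var n = 0
  for first, last in fieldBounds(line, sep):
    if n == col:
      let width = last - first + 1
      # parseutils.parseInt returns the number of characters it consumed.
      if width == 0 or parseInt(line, result, first) != width:
        raise newException(ValueError, "column " & $col & " is not an integer")
      return
    inc n
  raise newException(ValueError, "line has no column " & $col)

when isMainModule:
  # Made-up sample row shaped like the leading columns of a VCF data line.
  echo intField("chr1\t12345\t.\tA\tG", 1)
```

The point of the design is that only the requested column is ever converted, and nothing is copied; whether that is a meaningful win over `split` on a given data set is something to measure rather than assume.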

Anyway, I don't think "force people to learn new terminology/techniques" is a
very welcoming answer, so I tried to provide something more welcoming. Even if
their parsing is sloppy & error-prone, I think it is better optics for Nim
when naive programmers hit the resulting errors on their own data sets than
when they complain about Nim library performance.

All that said, I think we agree 100% that we probably need more information
from @markebbert to help him any further with his actual problem. Maybe it is
IO. Maybe he didn't even compile with `-d:danger`. If he's on Linux, I would
suggest he decompress first and try my `mmap` versions. 90 GB/(100 MB/s)
=~ 900 seconds =~ 15 minutes. Heck, some people even have 90 GB of RAM. :-)
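
For anyone who wants to redo that back-of-envelope estimate with their own numbers, a tiny sketch follows. The 100 MB/s figure is just the assumed sequential read rate from the estimate above, not a measurement, and the names are made up.

```nim
# Back-of-envelope IO floor: time to read the whole file once.
const
  fileBytes = 90e9          # 90 GB, decimal units as in the estimate above
  diskBytesPerSec = 100e6   # assumed 100 MB/s sequential read

let seconds = fileBytes / diskBytesPerSec
echo seconds, " s, about ", seconds / 60.0, " min"
```

If the observed run time is far above this floor, IO alone probably does not explain it.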
