It seems that when I read gzipped data, there is "something more" in the stream
than when the input is decompressed outside of Nim first. I noticed this
because my output contained an extra, empty `Fastq` whenever the input was
gzipped. I tried to confirm this by inserting assertions:
```nim
import streams

# Fastq and makeFastq are defined elsewhere in the program.
proc fastqParser(stream: Stream): iterator(): Fastq =
  result = iterator(): Fastq =
    var
      nameLine: string
      nucLine: string
      quaLine: string
    # while not input.endOfFile:
    while not stream.atEnd():
      nameLine = stream.readLine()
      #TODO: Why is there an extra empty Fastq when reading gzipped input?
      doAssert(not stream.atEnd(), "stream ended after nameLine: " &
        nameLine)
      nucLine = stream.readLine()
      doAssert(not stream.atEnd(), "stream ended after nucLine: " & nucLine)
      discard stream.readLine()  # skip the "+" separator line
      doAssert(not stream.atEnd(), "stream ended after the '+' line")
      quaLine = stream.readLine()
      yield makeFastq(nameLine, nucLine, quaLine)
```
This results in `Error: unhandled exception: not atEnd(stream) stream ended
after nameLine: [AssertionError]` (note that `nameLine` is empty), both when
reading from stdin and when reading from a file given on the command line.
The Python version doesn't produce an extra empty `Fastq`, but maybe the
gzipped data is still responsible for the problem, and Python's gzip reading
is simply more robust.
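One way to make the parser independent of exactly when `atEnd()` turns true is to let `readLine` itself report the end of input. This is a sketch, not a confirmed diagnosis: it assumes the gzip-backed stream may only report `atEnd()` after a read has already reached the end, so that one extra `readLine()` returns an empty string. It uses `readLine(s: Stream, line: var string): bool` from the `streams` module, which returns `false` at end of input. The `Fastq` object and the `fastqRecords` iterator are hypothetical stand-ins for the program's own type and closure-iterator factory, and blank lines are simply skipped.

```nim
import std/streams

type
  # Hypothetical stand-in for the program's Fastq type.
  Fastq = object
    name, nuc, qua: string

# Hypothetical inline iterator (the original uses a closure-iterator factory).
# It never calls atEnd(): the loop ends when readLine returns false, on the
# assumption that atEnd() may not become true until a read has already failed.
iterator fastqRecords(stream: Stream): Fastq =
  var nameLine, nucLine, plusLine, quaLine: string
  while stream.readLine(nameLine):
    if nameLine.len == 0:
      continue  # tolerate blank lines, e.g. an extra trailing newline
    # Read the sequence, "+" and quality lines; fail loudly if any is missing.
    if not (stream.readLine(nucLine) and stream.readLine(plusLine) and
            stream.readLine(quaLine)):
      raise newException(IOError, "truncated record after: " & nameLine)
    yield Fastq(name: nameLine, nuc: nucLine, qua: quaLine)

when isMainModule:
  # Input ending with an extra empty line: with a StringStream, an
  # atEnd()-driven loop reads one more, empty nameLine here.
  let data = "@r1\nACGT\n+\nIIII\n\n"
  var names: seq[string]
  for rec in fastqRecords(newStringStream(data)):
    names.add rec.name
  echo names.len  # prints 1
```

If the gzipped input still yields an extra record with this loop, the extra content is in the decompressed data itself rather than in how `atEnd()` is reported.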