Don Stewart wrote:
manlio_perillo:
[...]
Is it possible to implement a map reduce version that can handle gzipped
log files?
Using the zlib binding on hackage.haskell.org, you can stream multiple
zlib decompression threads with lazy bytestrings, and combine the
results.
This is a bit hard.
A deflate encoded stream contains multiple blocks, so you need to find
the offset of each block and decompress it in parallel.
But then you also need to make sure each final block terminates with a '\n'.
And the zlib Haskell binding does not support this usage (I'm not even
sure zlib itself supports this).
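To make the '\n' requirement concrete, here is a minimal sketch that works on data that is already decompressed. The helper name lineAlignedChunks is hypothetical (it comes from neither the book nor the zlib binding). It cuts a lazy ByteString into pieces of roughly the requested size and extends each piece to the next newline. It does not solve the deflate problem: deflate blocks are in general not byte-aligned and may refer back to earlier data, so the same trick cannot be applied to the compressed bytes directly.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)

-- | Split the input into pieces of roughly 'size' bytes. Each piece is
-- extended up to and including the next '\n' after the size boundary,
-- so no line is ever cut in half. (Hypothetical helper, for illustration.)
lineAlignedChunks :: Int64 -> L.ByteString -> [L.ByteString]
lineAlignedChunks size input
  | L.null input = []
  | otherwise    = chunk : lineAlignedChunks size rest
  where
    -- Take the first 'size' bytes ...
    (prefix, suffix)      = L.splitAt size input
    -- ... then the remainder of the line that the cut fell into ...
    (lineTail, afterLine) = L.break (== '\n') suffix
    -- ... and the newline itself, if there is one.
    (newline, rest)       = L.splitAt 1 afterLine
    chunk                 = L.concat [prefix, lineTail, newline]
```

Every chunk except possibly the last ends with '\n' (the last one does too when the input itself ends with a newline), and concatenating the chunks gives back the original input, so each chunk can be handed to a separate worker.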
By the way, this phrase:
"We allow multiple threads to read different chunks at once by supplying
each one with a distinct file handle, all reading the same file"
here:
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193
IMHO is not correct, or at least misleading.
Each block is read in the main thread, or at least myThreadId always
returns the same value.
This is also the reason why I don't understand why my version is slower
than the book version.
The only difference is that the book version reads 4 chunks and my
version only 1 big chunk.
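One way to check which thread actually does a piece of work is to call myThreadId inside the forked action and force the result there. The sketch below assumes nothing about the book's code; runInWorker is a hypothetical helper. Keep in mind that with lazy I/O the read happens in whichever thread forces the data, not necessarily in the thread that opened the handle.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | Run an action in a fresh thread and return the ThreadId observed
-- inside that thread together with the action's result.
-- (Hypothetical helper; it does no exception handling, so if the
-- action throws, the caller is never handed a result.)
runInWorker :: IO a -> IO (ThreadId, a)
runInWorker action = do
  done <- newEmptyMVar
  _ <- forkIO $ do
    tid    <- myThreadId   -- the worker's id, not the caller's
    result <- action       -- runs in the worker thread
    putMVar done (tid, result)
  takeMVar done            -- block until the worker has finished
```

For example, runInWorker (pure $! length someChunk) forces the length inside the worker, so the ThreadId it returns should differ from the one myThreadId reports in the main thread.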
-- Don
Thanks Manlio
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe