I forgot to mention two other potential sources of slowdown.
One of them was already mentioned by John: the `mconcat` `Fold` is not
optimal for `ByteString`s. The issue with the `mconcat` fold is that it
is pairwise combining the `ByteString`s one at a time, so if you have N
bytestrings it's equivalent to:
    (((b1 <> b2) <> b3) <> b4) <> ...
... which is very inefficient (it has O(N^2) time and space
complexity, because each `<>` copies both of its arguments into a freshly
allocated buffer). The more efficient way to concatenate multiple
bytestrings is:
    Strict.concat [b1, b2, b3, b4, b5]
The fix is easy: replace `mconcat` with a more efficient `Fold`:
    concat :: Fold Strict.ByteString Strict.ByteString
    concat = fmap Strict.concat list
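As a minimal, self-contained sketch of the fold above (assuming the `foldl` and `bytestring` packages; the name `concatFold` is chosen here only to avoid clashing with the Prelude's `concat`), the two folds can be compared on a small list of chunks:

```haskell
import qualified Control.Foldl as L
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Char8 as Char8

-- Collect all chunks into a list first, then concatenate them in one pass.
-- `concatFold` is an illustrative name, not part of any library.
concatFold :: L.Fold Strict.ByteString Strict.ByteString
concatFold = fmap Strict.concat L.list

main :: IO ()
main = do
    let chunks = map Char8.pack ["foo", "bar", "baz"]
    -- Both folds build the same ByteString; only the cost differs.
    Char8.putStrLn (L.fold L.mconcat chunks)
    Char8.putStrLn (L.fold concatFold chunks)
```

The trade-off is that `list` holds every chunk of the line in memory until the fold finishes, which is fine for lines of ordinary length.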
The second is chunk size. If your lines are small with respect to the
default chunk size, your code greatly decreases the chunk size, which
degrades performance if your output handle is not buffered, because each
small chunk translates into its own write system call.
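One way to guard against this, sketched here under the assumption that the output goes through an ordinary `Handle` such as `stdout`, is to request block buffering explicitly before running the pipeline, so that many small writes are batched by the runtime:

```haskell
import System.IO

main :: IO ()
main = do
    -- Ask for block buffering on stdout; `Nothing` lets the runtime
    -- choose the buffer size.
    hSetBuffering stdout (BlockBuffering Nothing)
    -- Small writes now accumulate in the handle's buffer instead of
    -- each being flushed on its own.
    mapM_ putStrLn ["many", "small", "lines"]
```

Whether this helps in a given program depends on how the handle was opened and what it is connected to, so it is worth measuring.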
As I mentioned before, concatenating bytestrings into lines is not
really idiomatic, but if you really need to do it, that is how you would
improve performance.
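Put together with the code from the original question, a sketch might look like the following. It assumes the `pipes`, `pipes-bytestring`, `pipes-group`, `foldl` and `lens-family-core` packages (`view` from the `lens` package should work equally well); `concatFold` and `joinLines` are names made up for this example.

```haskell
import Pipes
import qualified Control.Foldl as L
import qualified Data.ByteString as Strict
import qualified Pipes.ByteString as BS
import qualified Pipes.Group as Group
import Lens.Family (view)

-- Gather the chunks of one line into a list, then concatenate them once.
concatFold :: L.Fold Strict.ByteString Strict.ByteString
concatFold = fmap Strict.concat L.list

-- Re-chunk a byte stream so that each output chunk is one complete line.
-- As in the original code, BS.lines drops the newline separators.
joinLines
    :: Monad m
    => Producer Strict.ByteString m r
    -> Producer Strict.ByteString m r
joinLines = L.purely Group.folds concatFold . view BS.lines

main :: IO ()
main = runEffect $ joinLines BS.stdin >-> BS.stdout
```

Taking the producer as an argument, rather than hard-coding `BS.stdin`, keeps `joinLines` usable with any byte stream, for example one read from a file handle.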
On 8/20/2015 4:43 AM, Alexey Raga wrote:
Hi,
I have a huge file (~40M rows) in a custom format where each line
represents a data type, so I want to process it line-by-line.
This code runs very fast (~20 seconds):
    import Pipes.ByteString as BS
    runEffect $ BS.stdin >-> BS.stdout
while this one runs much slower (>2 minutes to execute):
    bslines :: (MonadIO m) => Producer ByteString m ()
    bslines = purely folds mconcat . view BS.lines $ BS.stdin
    main :: IO ()
    main = runEffect $ bslines >-> BS.stdout
Why does it happen? And what would be the fastest way to consume a
file line-by-line?
To compare, consuming the same file in Node.js line-by-line takes ~40
seconds, how can similar results be achieved?
Regards,
Alexey.
--
You received this message because you are subscribed to the Google
Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to haskell-pipes+unsubscr...@googlegroups.com
<mailto:haskell-pipes+unsubscr...@googlegroups.com>.
To post to this group, send email to haskell-pipes@googlegroups.com
<mailto:haskell-pipes@googlegroups.com>.