I forgot to mention the two other potential sources of slowdowns.

One of them was already mentioned by John: the `mconcat` `Fold` is not optimal for `ByteString`s. The issue with the `mconcat` fold is that it is pairwise combining the `ByteString`s one at a time, so if you have N bytestrings it's equivalent to:

    (((b1 <> b2) <> b3) <> b4) <> ...

... which is very inefficient: every `<>` on strict bytestrings allocates a new buffer and copies both operands, so the total time and allocation are O(N^2). The more efficient way to concatenate multiple bytestrings is:

    Strict.concat [b1, b2, b3, b4, b5]
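To make the difference concrete, here is a minimal sketch that uses only `Data.ByteString` and `Data.List`. The names `appendOneByOne` and `concatAtOnce` are made up for illustration. Both functions return the same bytes; they differ only in how much copying they do along the way.

```haskell
import qualified Data.ByteString as Strict
import Data.List (foldl')

-- Hypothetical name: combines the chunks with one `<>` per chunk.
-- Each `<>` on strict ByteStrings allocates a fresh buffer and copies
-- both operands, so the accumulated prefix is re-copied at every step.
appendOneByOne :: [Strict.ByteString] -> Strict.ByteString
appendOneByOne = foldl' (<>) Strict.empty

-- Hypothetical name: `Strict.concat` sees all chunks up front, so it can
-- allocate the result once and copy each chunk a single time.
concatAtOnce :: [Strict.ByteString] -> Strict.ByteString
concatAtOnce = Strict.concat
```

For a handful of chunks the two are indistinguishable; the quadratic cost only shows up when a single line is assembled from many chunks.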

There's a very easy solution: replace `mconcat` with a more efficient fold, built from the `list` fold in `Control.Foldl`:

    concat :: Fold Strict.ByteString Strict.ByteString
    concat = fmap Strict.concat list
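For reference, here is a sketch of how that fold might be wired into the `bslines` producer from the question. It assumes the `foldl`, `pipes`, `pipes-group`, `pipes-bytestring` and `lens-family-core` packages, and it renames the fold to `concatLine` (a name I made up) so that it does not clash with `Prelude.concat`.

```haskell
import Control.Foldl (Fold)
import qualified Control.Foldl as Fold
import qualified Data.ByteString as Strict
import Lens.Family (view)
import Pipes
import qualified Pipes.ByteString as BS
import Pipes.Group (folds)

-- Hypothetical name for the fold above: collect the chunks of one line
-- into a list, then concatenate them in a single pass.
concatLine :: Fold Strict.ByteString Strict.ByteString
concatLine = fmap Strict.concat Fold.list

-- Same shape as the original `bslines`, with `concatLine` swapped in
-- for `mconcat`. Each yielded ByteString is one whole line.
bslines :: MonadIO m => Producer Strict.ByteString m ()
bslines = Fold.purely folds concatLine (view BS.lines BS.stdin)
```

You would run it exactly as before, with `runEffect $ bslines >-> BS.stdout`. As far as I know `BS.lines` strips the newline separators, so the output lines come out back to back unless you add the newlines again.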

The other is chunk size: if your lines are small with respect to the default chunk size, your code greatly decreases the chunk size, which degrades performance if your output handle is not buffered, because it translates to more write system calls.
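If the small chunks are unavoidable, one way to soften the cost is to make sure the output handle is block buffered. This is a minimal sketch using only `base` and `bytestring`; `writeBuffered` is a hypothetical helper, and it relies on my understanding that `Strict.hPut` goes through the handle's buffer when the handle is in block-buffering mode.

```haskell
import qualified Data.ByteString as Strict
import System.IO (BufferMode (BlockBuffering), Handle, hSetBuffering)

-- Hypothetical helper: switch the handle to block buffering, then write
-- the chunks one by one. Small chunks can then accumulate in the handle's
-- buffer instead of each one triggering its own flush.
writeBuffered :: Handle -> [Strict.ByteString] -> IO ()
writeBuffered h chunks = do
  -- `Nothing` lets the runtime pick the buffer size.
  hSetBuffering h (BlockBuffering Nothing)
  mapM_ (Strict.hPut h) chunks
```

In the pipeline from the question, the equivalent would be to call `hSetBuffering stdout (BlockBuffering Nothing)` once before `runEffect`.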

Like I mentioned before, concatenating bytestrings into lines is not really idiomatic, but if you really need to do it that is how you would improve performance.

On 8/20/2015 4:43 AM, Alexey Raga wrote:
Hi,

I have a huge file (~40M rows) in a custom format where each line represents a data type, so I want to process it line-by-line.

This code runs very fast (~20 seconds):

    import Pipes.ByteString as BS

    runEffect $ BS.stdin >-> BS.stdout


while this one runs much slower (>2 minutes to execute):

    bslines :: (MonadIO m) => Producer ByteString m ()
    bslines = purely folds mconcat . view BS.lines $ BS.stdin

    main :: IO ()
    main = runEffect $ bslines >-> BS.stdout

Why does it happen? And what would be the fastest way to consume a file line-by-line? For comparison, consuming the same file line-by-line in Node.js takes ~40 seconds. How can similar results be achieved?

Regards,
Alexey.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group. To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipes+unsubscr...@googlegroups.com. To post to this group, send email to haskell-pipes@googlegroups.com.
