Here you go:

    import Control.Monad (void)
    import Data.Char (ord)
    import Data.Word (Word8)
    import Control.Applicative
    import qualified Data.ByteString as B
    import Data.Attoparsec.ByteString
    import Pipes.Attoparsec (parsed)
    import Pipes.ByteString (fromHandle)
    import Pipes
    import qualified Pipes.Prelude as Pipes
    import qualified System.IO as IO

    main :: IO ()
    main = IO.withFile "euler42.txt" IO.ReadMode $ \handle -> do
        n <- Pipes.length $ void $  -- Ignore errors for simplicity
            parsed parser (fromHandle handle) >-> Pipes.filter isTriangle
        print n
      where
        isTriangle _ = True

    parser :: Parser Word8
    parser =
            fmap wordValue
        $   word8 quote
        *>  takeWhile1 (/= quote)
        <*  word8 quote
        <*  optional (word8 comma)
      where
        wordValue = B.foldl' (+) 0 . B.map (subtract 64)
        quote = fromIntegral (ord '"')
        comma = fromIntegral (ord ',')

The trick is to use `attoparsec` to parse only a single value at a time, so that it never backtracks further than one word. The reason `attoparsec` chokes on large files is that it must retain all of the input it has seen so far in order to backtrack.

`pipes-attoparsec` is designed so that you use `attoparsec` to define a backtracking parser for a small unit of input (i.e. a line or a word), and then after each successful parse you commit and never look back. The first argument to `parsed` is the parser for a single backtrackable element, and `pipes-attoparsec` makes sure to commit and discard the consumed input every time that parser succeeds. That's why it runs in constant space, unlike running `attoparsec` over the entire file.
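To see the commit-per-element pattern in isolation, here is a minimal sketch under some assumptions of my own (a hypothetical file `numbers.txt` containing one decimal integer per line). Backtracking is confined to a single line; once a line parses, `parsed` commits, so memory use stays constant no matter how large the file is:

```haskell
import Control.Monad (void)
import Data.Attoparsec.ByteString.Char8 (Parser, decimal, endOfLine)
import Pipes.Attoparsec (parsed)
import Pipes.ByteString (fromHandle)
import qualified Pipes.Prelude as P
import qualified System.IO as IO

-- One backtrackable element: a decimal integer terminated by a newline
lineInt :: Parser Int
lineInt = decimal <* endOfLine

main :: IO ()
main = IO.withFile "numbers.txt" IO.ReadMode $ \h -> do
    -- `void` ignores parse errors for simplicity, as in the solution above
    total <- P.sum (void (parsed lineInt (fromHandle h)))
    print total
```

If you care about errors, pattern-match on the `Either` that `parsed` returns instead of discarding it with `void`; the `Left` case gives you the `ParsingError` plus the leftover `Producer`.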

The rest is almost identical to what you proposed. The only difference is that I moved the `map` directly into the `Parser` logic instead of making it a separate pipe.

On 06/12/2014 09:13 PM, Данило Глинський wrote:
There is simple problem on Project Euler - http://projecteuler.net/problem=42 , which boils down to

    import Data.Char
    import Data.Word
    import Control.Applicative
    import qualified Data.ByteString as B
    import Data.Attoparsec.ByteString

    main = do
        result <- solution <$> parseFile <$> B.readFile "euler42.txt"
        print result

    solution :: [B.ByteString] -> Int
    solution =
        length . filter isTriangle . fmap wordValue
      where
        wordValue = B.foldl' (+) 0 . B.map (subtract 64)

    isTriangle = const True {- some secret function -}

    parseFile :: B.ByteString -> [B.ByteString]
    parseFile = either (const []) id .
        parseOnly (wordParser `sepBy1` word8 (char ','))
      where
        wordParser = word8 (char '"') *> takeWhile1 (/= char '"') <* word8 (char '"')
        char = fromIntegral . ord

Now the problem is: given a large input file (150 MB), this solution leads to an OutOfMemory error.

It seems like pipes is designed to solve this problem in the same compositional way, so the question is: what is the correct way to "pipify" this small program?

As I understand, there must be:
1) a Producer of ByteStrings, based on pipes-parse and pipes-attoparsec
2) some analogues of "map", "filter" and "length" on pipe streams
3) extracting the result (unpipify)

Part 2) seems easy. We should get something like

solution :: Producer B.ByteString IO () -> Producer Int IO ()
solution =
    P.length <-< P.filter isTriangle <-< P.map wordValue

but streaming from parsing and extracting is not that obvious. Could someone help me?
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected].
