Here you go:
import Control.Monad (void)
import Data.Char (ord)
import Data.Word (Word8)
import Control.Applicative
import qualified Data.ByteString as B
import Data.Attoparsec.ByteString
import Pipes.Attoparsec (parsed)
import Pipes.ByteString (fromHandle)
import Pipes
import qualified Pipes.Prelude as Pipes
import qualified System.IO as IO
main = IO.withFile "euler42.txt" IO.ReadMode $ \handle -> do
    n <- Pipes.length $ void $  -- Ignore parse errors for simplicity
        parsed parser (fromHandle handle) >-> Pipes.filter isTriangle
    print n
  where
    isTriangle _ = True  -- stand-in for the real triangle-number test
parser :: Parser Word8
parser =
    fmap wordValue
        (  word8 quote
        *> takeWhile1 (/= quote)
        <* word8 quote
        <* optional (word8 comma) )
  where
    -- Map each letter to its alphabetical position (A=1 .. Z=26) and sum
    wordValue = B.foldl' (+) 0 . B.map (subtract 64)
    quote = fromIntegral (ord '"')
    comma = fromIntegral (ord ',')
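As a quick sanity check on `wordValue`: the codes for 'A'..'Z' are 65..90, so subtracting 64 yields the alphabetical positions 1..26, and the fold sums them. A standalone sketch using the same definition:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Same definition as in the parser: map each letter to its alphabetical
-- position (A=1 .. Z=26) and sum.  Note the result is a Word8, which is
-- fine for short words but would wrap around past 255.
wordValue :: B.ByteString -> Word8
wordValue = B.foldl' (+) 0 . B.map (subtract 64)

main :: IO ()
main = print (wordValue (BC.pack "SKY"))  -- 19 + 11 + 25 = 55
```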
The trick is to use `attoparsec` to parse only a single value at a time,
so that it never needs to backtrack further than one word. The reason
`attoparsec` chokes on large files is that a whole-file parser may
backtrack arbitrarily far, so it has to retain everything it has read.
The way `pipes-attoparsec` is designed to work is that you use
`attoparsec` to define a backtracking parser for a small unit of input
(i.e. a line or word), but then after each successful parse you commit
and don't look back. The first argument to `parsed` is the parser for
the backtrackable single element, and `pipes-attoparsec` makes sure to
commit and not backtrack every time the parser succeeds. That's why it
runs in constant space compared to `attoparsec`.
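Here is a minimal, self-contained sketch of that commit-per-element behavior. For illustration it feeds `parsed` a pure in-memory Producer instead of a file handle, and uses a hypothetical comma-separated-numbers parser as the small backtrackable unit:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (void)
import Control.Applicative (optional)
import Data.Attoparsec.ByteString.Char8 (Parser, char, decimal)
import Pipes
import Pipes.Attoparsec (parsed)
import qualified Pipes.Prelude as P

-- One backtrackable unit: a number followed by an optional comma.
number :: Parser Int
number = decimal <* optional (char ',')

main :: IO ()
main = do
    -- `parsed` applies `number` repeatedly, committing after each
    -- success, so only one element's worth of input is ever retained.
    ns <- P.toListM (void (parsed number (yield "10,20,30")))
    print ns  -- [10,20,30]
```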
The rest is almost identical to what you proposed. The only difference
is that I moved the `map` directly into the `Parser` logic instead of
making it a separate pipe.
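For comparison, the variant with the `map` as a separate pipe stage would look roughly like this (a sketch: the parser, here called `rawWord`, returns the raw word, and the pipeline runs over in-memory input rather than a file handle, purely for illustration):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (void)
import Control.Applicative (optional)
import Data.Char (ord)
import Data.Word (Word8)
import qualified Data.ByteString as B
import Data.Attoparsec.ByteString
import Pipes
import Pipes.Attoparsec (parsed)
import qualified Pipes.Prelude as P

-- The same quoted-word parser as above, but returning the raw word...
rawWord :: Parser B.ByteString
rawWord =
    word8 quote *> takeWhile1 (/= quote) <* word8 quote
        <* optional (word8 comma)
  where
    quote = fromIntegral (ord '"')
    comma = fromIntegral (ord ',')

wordValue :: B.ByteString -> Word8
wordValue = B.foldl' (+) 0 . B.map (subtract 64)

main :: IO ()
main = do
    -- ...with wordValue applied in a separate P.map stage instead.
    n <- P.length $ void $
        parsed rawWord (yield "\"SKY\",\"A\"")
            >-> P.map wordValue
            >-> P.filter (const True)  -- stand-in for isTriangle
    print n  -- 2
```

Both pipelines compute the same count; fusing the map into the parser just removes one pipe stage.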
On 06/12/2014 09:13 PM, Данило Глинський wrote:
There is a simple problem on Project Euler
- http://projecteuler.net/problem=42 - which boils down to:
import Data.Char
import Data.Word
import Control.Applicative
import qualified Data.ByteString as B
import Data.Attoparsec.ByteString
main = do
    result <- solution <$> parseFile <$> B.readFile "euler42.txt"
    print result
solution :: [B.ByteString] -> Int
solution =
    length . filter isTriangle . fmap wordValue
  where
    wordValue = B.foldl' (+) 0 . B.map (subtract 64)
    isTriangle = const True {- some secret function -}
parseFile :: B.ByteString -> [B.ByteString]
parseFile =
    either (const []) id .
        parseOnly (wordParser `sepBy1` word8 (char ','))
  where
    wordParser =
        word8 (char '"') *> takeWhile1 (/= char '"') <* word8 (char '"')
    char = fromIntegral . ord
Now the problem is: given a large input file (150 MB), this solution
leads to OutOfMemory. It seems like pipes are designed to solve this
problem in the same compositional way, so the question is: what is the
correct way to "pipify" this small program?
As I understand it, there must be:
1) a Producer of ByteStrings, based on pipes-parse and pipes-attoparsec
2) some analogues of "map", "filter" and "length" on pipe streams
3) extracting the result (unpipify)
Point 2) seems easy. We should get something like

solution :: Producer B.ByteString IO () -> Producer Int IO ()
solution =
    P.length <-< P.filter isTriangle <-< P.map wordValue
but streaming from parsing and extracting is not that obvious. Could
someone help me?
--
You received this message because you are subscribed to the Google
Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To post to this group, send email to [email protected].