Send Beginners mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1. Re:  Attoparsec and multiline comments (Daniel Fischer)
   2. Re:  Attoparsec and multiline comments (Yitzchak Gale)
   3. Re:  Attoparsec and multiline comments
      (Alexander.Vladislav.Popov )


----------------------------------------------------------------------

Message: 1
Date: Thu, 15 Sep 2011 10:04:19 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Attoparsec and multiline comments
To: [email protected]
Message-ID: <[email protected]>
Content-Type: Text/Plain;  charset="utf-8"

On Thursday 15 September 2011, 06:03:44, Alexander.Vladislav.Popov wrote:
> Dear haskellers,
> 
> Help me to parse multiline comments (like /* ... */ or even like /*- ...
> -*/) using attoparsec.

The docs for manyTill at
http://hackage.haskell.org/packages/archive/attoparsec/0.9.1.2/doc/html/Data-Attoparsec-Combinator.html
have the example

simpleComment   = string "<!--" *> manyTill anyChar (try (string "-->"))

Modifying the comment delimiters is trivial; modifying it to parse nested
comments would be nontrivial, but still not too difficult, I think.
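To make the point concrete, here is a minimal sketch of both the delimiter change and one way to handle nesting. It assumes the current attoparsec package, whose Data.Attoparsec.Text module covers the Text parsers that attoparsec-text provided at the time. cComment, dashComment and nestedComment are illustrative names, not library functions. Unlike the docs example, it drops try, because attoparsec backtracks on failure anyway.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming current attoparsec (Data.Attoparsec.Text).
import Control.Applicative ((<|>))
import Data.Attoparsec.Text (Parser, anyChar, manyTill, parseOnly, string)
import qualified Data.Text as T

-- The docs example with C-style delimiters swapped in.
cComment :: Parser T.Text
cComment = T.pack <$> (string "/*" *> manyTill anyChar (string "*/"))

-- The /*- ... -*/ variant from the question differs only in delimiters.
dashComment :: Parser T.Text
dashComment = T.pack <$> (string "/*-" *> manyTill anyChar (string "-*/"))

-- Nested comments: the body is a sequence of items up to "*/", where each
-- item is either a whole nested comment (kept with its delimiters) or a
-- single character.
nestedComment :: Parser T.Text
nestedComment = string "/*" *> (T.concat <$> manyTill item (string "*/"))
  where
    item = wrap <$> nestedComment <|> T.singleton <$> anyChar
    wrap t = T.concat ["/*", t, "*/"]
```

With this sketch, cComment applied to "/* a /* b */ c */" stops at the first "*/" and returns " a /* b ", while nestedComment returns the whole body " a /* b */ c ".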



------------------------------

Message: 2
Date: Thu, 15 Sep 2011 12:38:08 +0300
From: Yitzchak Gale <[email protected]>
Subject: Re: [Haskell-beginners] Attoparsec and multiline comments
To: Daniel Fischer <[email protected]>
Cc: [email protected]
Message-ID:
        <caorualz--rx81ffu6uh_x5y5cgcpo_utzhpm2rjgzzunqko...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Alexander.Vladislav.Popov wrote:
>> Help me to parse multiline comments (like /* ... */ or even like /*- ...
>> -*/) using attoparsec.

Daniel Fischer wrote:
> The docs for manyTill at
> http://hackage.haskell.org/packages/archive/attoparsec/0.9.1.2/doc/html/Data-Attoparsec-Combinator.html
> have the example
>
> simpleComment   = string "<!--" *> manyTill anyChar (try (string "-->"))

Unfortunately, many of the examples in the documentation for Attoparsec
are lifted directly from the Parsec documentation, so they
need to be modified somewhat to work in Attoparsec.
This is one of those cases.

First of all, the string function in Attoparsec has type

string :: ByteString -> Parser ByteString

So you can't use a String as its argument directly; you need to
wrap the String with Data.ByteString.Char8.pack.
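Here is a minimal sketch of the pack approach. It assumes the Data.Attoparsec.ByteString.Char8 module of current attoparsec; module names in the 0.9.x release linked above may differ. OverloadedStrings is deliberately not enabled, so the delimiters are ordinary String literals that must be packed:

```haskell
-- Sketch assuming current attoparsec's ByteString API.
import qualified Data.ByteString.Char8 as B
import Data.Attoparsec.ByteString.Char8 (Parser, anyChar, manyTill, parseOnly, string)

-- string expects a ByteString, so each String literal is packed first.
open, close :: Parser B.ByteString
open  = string (B.pack "/*")
close = string (B.pack "*/")

-- Everything between the delimiters, collected as a String.
cComment :: Parser String
cComment = open *> manyTill anyChar close
```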

If you are parsing source code text, you would be better off anyway
using Text instead of String and attoparsec-text instead of
attoparsec. That, together with the OverloadedStrings extension,
will solve that problem and more. Then you don't even need
the "string" function. Just use string literals to parse strings,
and the compiler will automatically insert calls to "string".
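Here is a small sketch of the OverloadedStrings approach. It assumes current attoparsec, where, as far as I know, Data.Attoparsec.Text supplies the IsString instance for its Text parser that makes a bare literal behave like a call to string. openComment and closeComment are illustrative names:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming current attoparsec (Data.Attoparsec.Text).
import Data.Attoparsec.Text (Parser, parseOnly)
import Data.Text (Text)

-- Each literal, used at type Parser Text, becomes a parser that matches
-- exactly that text (the same as writing string "/*" or string "*/").
openComment :: Parser Text
openComment = "/*"

closeComment :: Parser Text
closeComment = "*/"
```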

The next issue is that you don't need "try": unlike in Parsec,
the string combinator in Attoparsec does *not* consume any
input when it fails.
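Here is a sketch of why try is unnecessary, assuming current attoparsec's Data.Attoparsec.Text. The Opener type is made up for illustration. The two delimiters from the original question share the prefix "/*", so the first alternative can fail partway through a match:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming current attoparsec (Data.Attoparsec.Text).
import Control.Applicative ((<|>))
import Data.Attoparsec.Text (Parser, parseOnly, string)

-- Hypothetical type naming which opening delimiter was seen.
data Opener = Dash | Plain deriving (Eq, Show)

-- On input "/* x", the first branch matches "/*" and then fails on ' '.
-- attoparsec's <|> restarts the second branch from the original position,
-- so no try is needed around string "/*-".
opener :: Parser Opener
opener = (Dash <$ string "/*-") <|> (Plain <$ string "/*")
```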

Finally, if you are bothering to use attoparsec, it is presumably
because you want the super speed it is able to achieve using
fusion. But the documentation points out that you don't get that
fusion for combinators like manyTill, only for "byte-oriented"
combinators like takeTill. So you should factor the parser to
consume the comment (quickly!) in chunks that stop at each
'*', then use manyTill only on the chunks.

Here is an untested example with the above modifications:

comment = "/*" .*> (emptyComment <|> T.concat <$> commentChunks)
emptyComment = "*/" .*> pure T.empty
commentChunks = manyTill (takeWhile1 (/= '*')) (string "*/")
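The example above is explicitly untested. Tracing it by hand, commentChunks fails on a comment that contains a '*' which does not close it, such as /* a * b */: at that '*', neither takeWhile1 (/= '*') nor "*/" matches. Also, manyTill already returns an empty list when "*/" follows immediately, so emptyComment is not strictly needed. Here is a sketch of one way to accept a lone '*', assuming current attoparsec's Data.Attoparsec.Text and using plain *> with an explicit string call instead of .*>:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming current attoparsec (Data.Attoparsec.Text).
import Control.Applicative ((<|>))
import Data.Attoparsec.Text (Parser, char, manyTill, parseOnly, string, takeWhile1)
import qualified Data.Text as T

-- A C-style comment, returning its body without the delimiters.
comment :: Parser T.Text
comment = string "/*" *> (T.concat <$> manyTill chunk (string "*/"))
  where
    -- Fast path: a maximal run of non-'*' characters taken in one step.
    -- Slow path: a single '*'. manyTill tries "*/" before each chunk,
    -- so a '*' reaching this branch is not the start of the terminator.
    chunk = takeWhile1 (/= '*') <|> (T.singleton <$> char '*')
```

Because each run of ordinary characters is still taken by takeWhile1, the character-at-a-time branch is only used for the occasional '*'.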

I did need to call the string function explicitly with manyTill,
because manyTill is too polymorphic for the compiler to be able to
deduce the type of the string literal. Since manyTill is so
common, I often define a type-specialized version of it
for convenience:

manyTillS :: Parser a -> Parser Text -> Parser [a]
manyTillS = manyTill

That is analogous to ".*>", attoparsec-text's type-specialized
version of "*>" from Control.Applicative.

Then you can write:

commentChunks = takeWhile1 (/= '*') `manyTillS` "*/"
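Here is a self-contained, checkable version of manyTillS and the commentChunks line above, assuming current attoparsec's Data.Attoparsec.Text. It parses only the comment body after the opening "/*". Like the original, it still fails on a lone '*' inside the body:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming current attoparsec (Data.Attoparsec.Text).
import Data.Attoparsec.Text (Parser, manyTill, parseOnly, takeWhile1)
import Data.Text (Text)

-- Fixing the terminator's type to Parser Text lets a bare string
-- literal serve as the end parser, with no explicit string call.
manyTillS :: Parser a -> Parser Text -> Parser [a]
manyTillS = manyTill

-- Runs of non-'*' characters, up to and including the closing "*/".
commentChunks :: Parser [Text]
commentChunks = takeWhile1 (/= '*') `manyTillS` "*/"
```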

Regards,
Yitz



------------------------------

Message: 3
Date: Thu, 15 Sep 2011 15:48:39 +0600
From: "Alexander.Vladislav.Popov "
        <[email protected]>
Subject: Re: [Haskell-beginners] Attoparsec and multiline comments
To: [email protected]
Message-ID:
        <CALpbQ9YrKX=3ximpzw+sqrvma4b8py8+psbwqad+mnxfa3j...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Very, very interesting. Thank you, Yitzchak. I'll try.


------------------------------



End of Beginners Digest, Vol 39, Issue 21
*****************************************
