On Mon, Feb 18, 2013 at 11:49:03AM +0100, Iustin Pop wrote:
> On Mon, Feb 18, 2013 at 10:01:29AM +0100, Michele Tartara wrote:
> > On Mon, Feb 18, 2013 at 9:54 AM, Iustin Pop <[email protected]> wrote:
> > 
> > > On Fri, Feb 15, 2013 at 09:24:40PM +0100, Michele Tartara wrote:
> > > > On Fri, Feb 15, 2013 at 10:33 AM, Iustin Pop <[email protected]> wrote:
> > > >
> > > > > On Fri, Feb 15, 2013 at 10:21:46AM +0100, Michele Tartara wrote:
> > > > > > On Thu, Feb 14, 2013 at 2:53 PM, Iustin Pop <[email protected]>
> > > wrote:
> > > > > >
> > > > > > > On Wed, Feb 06, 2013 at 01:09:51PM +0100, Michele Tartara wrote:
> > > > > > > > In order to fetch precise information about the status of the 
> > > > > > > > VMs
> > > > > > > running in
> > > > > > > > Xen, we need to analyze the output of the "xm list --long"
> > > command.
> > > > > > > >
> > > > > > > > This commit adds the parser to do that, and its tests.
> > > > >
> > > > > OK, thanks for all the acks, LGTM on submitting it with them.
> > > > >
> > > > > thanks,
> > > > > iustin
> > > > >
> > > >
> > > >
> > > > While switching from my genNonEmptyString to genName, I found out a bug
> > > in
> > > > the parser.
> > > > If a string like "9a" was received, it would have been parsed as 
> > > > LCDouble
> > > > 9, instead of LCString 9a.
> > > >
> > > > In order to fix the bug, I needed to use a lookahead of 1 character to
> > > > determine if after the number there was an alphabetic caracter or the
> > > > actual end of the number.
> > > >
> > > > This required some non-trivial modifications (especially because
> > > > Data.Attoparsec.Text does not support lookahead, whereas
> > > > Data.Attoparsec.ByteString.Char8 does).
> > >
> > > Hmm. Can't you fix this without lookahead, by requiring the "leftover"
> > > string to be empty? IIRC you use that model already in the DRBD parser…
> > >
> > >
> > Yes, but that is just the most simple case. The exact same problem happens
> > in a configuration like this:
> > (name 9a (100 foo))
> > or in a string like this:
> > (field 40rg)
> > 
> > Not just in the trivial string-only case. A number is actually a number
> > only if it is followed by a whitespace or a ')'.
> > The end of input is just a very specific case.
> 
> Of course, if you mean end of entire string. I was referring only to
> things like (field 40org), where you have a 'virtual' end-of-string
> after 40org. But if you have to support (name 9a (…)), then it won't
> work, indeed.
> 
> > There is probably a way to check this without lookahead, but I haven't
> > found it yet (and anyway I expect it likely to make the parser quite
> > unreadable).
> 
> So something has to do the lookahead in any case, due to the input
> format. I was just thinking if there is a way to push the actual
> lookahead to Attoparsec, instead of you doing it manually. Something
> like:
> 
>   manyTill digit (try (eitherP (string ")") (string " "))))
> 
> Unfortunately this will eat the end of the string, which maybe is not
> what you want.
> 
> But after our offline conversations, I realised something else: the
> problem is, I think, your manyTill, which allows multiple
> listConfigParsers near each other. I think the proper way to fix this
> (without explicit lookahead) would be to require them to be separated by
> whitespace, e.g. something along the lines:
> 
>   where listConfigP =
>     (A.char '(' *> AC.sepBy1 lispConfigParser whiteSpaceP <* A.char ')')
>     >>= pure . LCList
> 
> However, this doesn't work, exactly because of the '123a' being
> misparsed. Grrr :)

OK, I think I have it. The following code does the right thing, as far
as I can see. I'm not sure though if it's very clean (the liftA2 part):

    where listConfigP = LCList <$> (A.char '(' *> liftA2 (\h t -> h ++ [t]) 
(many nwsP) bsP)
          doubleP = LCDouble <$> A.double
          stringP = LCString . unpack <$> A.takeWhile1 (notInClass " ()")
          wspace = AC.many1 $ A.char ' '
          rparen = A.char ')'
          bsP = listConfigP <* rparen <|> doubleP <* rparen <|> stringP <* 
rparen
          nwsP = listConfigP <* wspace <|> doubleP <* wspace <|> stringP <* 
wspace

The trick is to enforce takeWhile1 (used both in stringP and indirectly
in doubleP via A.double) to fail when not followed by the correct char.
Hence the push of <* rparen/<* wspace inside the alternatives, as
opposed to outside. And the explicit ending with bsP which eats the ')'
itself, as opposed to pushing it outside.

λ> parseWith (Just Data.Text.empty) lispConfigParser "(123 12 a bc123 1a 1b)"
Just Done "" LCList [LCDouble 123.0,LCDouble 12.0,LCString "a",
                     LCString "bc123",LCString "1a",LCString "1b"]

Not sure if this is clean enough. What do you think?

thanks,
iustin

Reply via email to