Re: [PATCH master 2/3] Add Haskell parser for "xm list --long"

Iustin Pop Mon, 18 Feb 2013 04:10:35 -0800

On Mon, Feb 18, 2013 at 12:57:36PM +0100, Michele Tartara wrote:
> On Mon, Feb 18, 2013 at 12:29 PM, Iustin Pop <[email protected]> wrote:
> 
> > On Mon, Feb 18, 2013 at 11:49:03AM +0100, Iustin Pop wrote:
> > > On Mon, Feb 18, 2013 at 10:01:29AM +0100, Michele Tartara wrote:
> > > > On Mon, Feb 18, 2013 at 9:54 AM, Iustin Pop <[email protected]> wrote:
> > > >
> > > > > On Fri, Feb 15, 2013 at 09:24:40PM +0100, Michele Tartara wrote:
> > > > > > On Fri, Feb 15, 2013 at 10:33 AM, Iustin Pop <[email protected]>
> > wrote:
> > > > > >
> > > > > > > On Fri, Feb 15, 2013 at 10:21:46AM +0100, Michele Tartara wrote:
> > > > > > > > On Thu, Feb 14, 2013 at 2:53 PM, Iustin Pop <[email protected]
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > On Wed, Feb 06, 2013 at 01:09:51PM +0100, Michele Tartara
> > wrote:
> > > > > > > > > > In order to fetch precise information about the status of
> > the VMs
> > > > > > > > > running in
> > > > > > > > > > Xen, we need to analyze the output of the "xm list --long"
> > > > > command.
> > > > > > > > > >
> > > > > > > > > > This commit adds the parser to do that, and its tests.
> > > > > > >
> > > > > > > OK, thanks for all the acks, LGTM on submitting it with them.
> > > > > > >
> > > > > > > thanks,
> > > > > > > iustin
> > > > > > >
> > > > > >
> > > > > >
> > > > > > While switching from my genNonEmptyString to genName, I found out
> > a bug
> > > > > in
> > > > > > the parser.
> > > > > > If a string like "9a" was received, it would have been parsed as
> > LCDouble
> > > > > > 9, instead of LCString 9a.
> > > > > >
> > > > > > In order to fix the bug, I needed to use a lookahead of 1
> > character to
> > > > > > determine if after the number there was an alphabetic caracter or
> > the
> > > > > > actual end of the number.
> > > > > >
> > > > > > This required some non-trivial modifications (especially because
> > > > > > Data.Attoparsec.Text does not support lookahead, whereas
> > > > > > Data.Attoparsec.ByteString.Char8 does).
> > > > >
> > > > > Hmm. Can't you fix this without lookahead, by requiring the
> > "leftover"
> > > > > string to be empty? IIRC you use that model already in the DRBD
> > parser...
> > > > >
> > > > >
> > > > Yes, but that is just the most simple case. The exact same problem
> > happens
> > > > in a configuration like this:
> > > > (name 9a (100 foo))
> > > > or in a string like this:
> > > > (field 40rg)
> > > >
> > > > Not just in the trivial string-only case. A number is actually a number
> > > > only if it is followed by a whitespace or a ')'.
> > > > The end of input is just a very specific case.
> > >
> > > Of course, if you mean end of entire string. I was referring only to
> > > things like (field 40org), where you have a 'virtual' end-of-string
> > > after 40org. But if you have to support (name 9a (...)), then it won't
> > > work, indeed.
> > >
> > > > There is probably a way to check this without lookahead, but I haven't
> > > > found it yet (and anyway I expect it likely to make the parser quite
> > > > unreadable).
> > >
> > > So something has to do the lookahead in any case, due to the input
> > > format. I was just thinking if there is a way to push the actual
> > > lookahead to Attoparsec, instead of you doing it manually. Something
> > > like:
> > >
> > >   manyTill digit (try (eitherP (string ")") (string " "))))
> > >
> > > Unfortunately this will eat the end of the string, which maybe is not
> > > what you want.
> > >
> > > But after our offline conversations, I realised something else: the
> > > problem is, I think, your manyTill, which allows multiple
> > > listConfigParsers near each other. I think the proper way to fix this
> > > (without explicit lookahead) would be to require them to be separated by
> > > whitespace, e.g. something along the lines:
> > >
> > >   where listConfigP =
> > >     (A.char '(' *> AC.sepBy1 lispConfigParser whiteSpaceP <* A.char ')')
> > >     >>= pure . LCList
> > >
> > > However, this doesn't work, exactly because of the '123a' being
> > > misparsed. Grrr :)
> >
> > OK, I think I have it. The following code does the right thing, as far
> > as I can see. I'm not sure though if it's very clean (the liftA2 part):
> >
> >     where listConfigP = LCList <$> (A.char '(' *> liftA2 (\h t -> h ++
> > [t]) (many nwsP) bsP)
> >           doubleP = LCDouble <$> A.double
> >           stringP = LCString . unpack <$> A.takeWhile1 (notInClass " ()")
> >           wspace = AC.many1 $ A.char ' '
> >           rparen = A.char ')'
> >           bsP = listConfigP <* rparen <|> doubleP <* rparen <|> stringP <*
> > rparen
> >           nwsP = listConfigP <* wspace <|> doubleP <* wspace <|> stringP
> > <* wspace
> >
> > The trick is to enforce takeWhile1 (used both in stringP and indirectly
> > in doubleP via A.double) to fail when not followed by the correct char.
> > Hence the push of <* rparen/<* wspace inside the alternatives, as
> > opposed to outside. And the explicit ending with bsP which eats the ')'
> > itself, as opposed to pushing it outside.
> >
> > λ> parseWith (Just Data.Text.empty) lispConfigParser "(123 12 a bc123 1a
> > 1b)"
> > Just Done "" LCList [LCDouble 123.0,LCDouble 12.0,LCString "a",
> >                      LCString "bc123",LCString "1a",LCString "1b"]
> >
> > Not sure if this is clean enough. What do you think?
> >
> 
> I still have to test it, but it seems to work.
> 
> It should always be possible to implement a solution without lookahead, and
> I had thought about something similar to this one, but I had decided not to
> even try to implement it because, even if it is well implemented and quite
> elegant, it seems much less readable with respect to using the lookAhead.
> I can use this, no problem, but what is the issue with using lookahead?


There's no problem (technical) with using lookahead. My personal issue
with it is that (at least your code) using lookahead is way too
imperative: get char, check if char is space, push back, etc.

Here I was trying to find out a purely declarative solution: this is how
the input should look like.

> And, as a side note, is there a particular reason why you defined wspace
> instead of using skipSpace?

Yes, definitely. If you ask this, I think you didn't "get" this solution
yet :)

'skipSpace' always succeeds. What we want is the opposite: a combinator
that fails if there's no white space following, so as to cause the other
combinations to fail as well.

Probably my wspace definition can be improved (e.g. many1 space), but it
has to fail on non-space input.

In the end, the decision on whether to go via lookahead or via
declarative approach is one of style. It doesn't really matter which, as
I said I find the latter one more readable, especially if we abstract
the liftA2 part:

appendP :: Parser [a] -> Parser a -> Parser [a]
appendP = liftA2 (\h, t -> h ++ [t])

or something like that.

thanks!
iustin

Re: [PATCH master 2/3] Add Haskell parser for "xm list --long"

Reply via email to