Recently I've been playing around with Parsec for a simple parsing project. While I was able to quickly construct my grammar (simplified version attached), getting it working has been a bit tricky. In particular, I am now stuck trying to figure out why Parsec is mis-reporting line numbers. Parsec seems convinced that line 12 of my input (also attached) has a "%" character,
$ runghc Test.hs
Left "(unknown)" (line 12, column 1):
unexpected "%"
expecting space or atom name
while my file clearly disagrees,
10 %FLAG ATOM_NAME
11 %FORMAT(20a4)
12 C1 H1 C2 H2 C3 H3 C4 H4 C5 C6 C7 C8 N1 C9 H9 C10 H10 C11
H11 C12
13 H12 C13 H13 C14 C15 N2 C16 C17 C29 H18 C19 H19 C20 H20 C21 H21 C22
H221H222H223
...
18 %FLAG CHARGE
19 %FORMAT(5E16.8)
The task here is to identify the block of data lines (lines 12-17),
ending at the beginning of the next block (starting with "%"). It seems
likely that my problem stems from the fact that I use "try" to
accomplish this but this is as far as I can reason.
Any ideas what might cause this sort of off-by-one? Does anyone see a
better (i.e. working) way to formulate my grammar? Any and all help
would be greatly appreciated. Thanks.
Cheers,
- Ben
module Main(main) where
import Debug.Trace
import Data.Maybe
import Text.ParserCombinators.Parsec
import Text.Parsec.Prim (ParsecT)
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)
data PrmTopBlock = AtomNames [String]
deriving (Show, Eq)
countBetween m n p = do xs <- count m p
ys <- count (n - m) $ option Nothing $ do
y <- p
return (Just y)
return (xs ++ catMaybes ys)
restLine = do a <- many (noneOf "\n")
eol
return a
eol = do skipMany $ char ' '
char '\n'
natural = P.integer $ P.makeTokenParser emptyDef
float = do sign <- option 1 (do s <- oneOf "+- "
return $ if s == '-' then -1 else 1)
x <- P.float $ P.makeTokenParser emptyDef
return $ sign * x
flagDecl x = do string $ "%FLAG " ++ x
eol
formatDecl = do string "%FORMAT("
count <- many1 digit
format <- letter
length <- many1 digit
char ')'
eol
return (count, format, length)
-- |Multiple lines of lists of a given item
linesOf item = do ls <- many1 $ try (do lookAhead (noneOf "%")
l <- many1 item
eol
return $ trace (show l) l)
return $ concat ls
atomNameBlock = do flagDecl "ATOM_NAME"
formatDecl
atomNames <- linesOf atomName
return $ AtomNames atomNames
where
atomName = do spaces
name <- countBetween 1 4 (alphaNum <|> oneOf "\'+-") <?> "atom name"
return name
ignoredBlock = do string "%FLAG" <?> "ignored block flag"
restLine
formatDecl
skipMany (noneOf "%" >> restLine)
hello = do ignoredBlock
ignoredBlock
atomNameBlock
parsePrmTopFile input = parse hello "(unknown)" input
test = do a <- readFile "test.prmtop"
print $ parsePrmTopFile a
main = test
test.prmtop
Description: Binary data
_______________________________________________ Haskell-Cafe mailing list [email protected] http://www.haskell.org/mailman/listinfo/haskell-cafe
