Recently I've been playing around with Parsec for a simple parsing
project. While I was able to quickly construct my grammar (simplified
version attached), getting it working has been a bit tricky. In
particular, I am now stuck trying to figure out why Parsec is
mis-reporting line numbers. Parsec seems convinced that line 12 of my
input (also attached) has a "%" character,

  $ runghc Test.hs
  Left "(unknown)" (line 12, column 1):
  unexpected "%"
  expecting space or atom name

while my file clearly disagrees,

  10  %FLAG ATOM_NAME                                                           
      
  11  %FORMAT(20a4)                                                             
      
  12  C1  H1  C2  H2  C3  H3  C4  H4  C5  C6  C7  C8  N1  C9  H9  C10 H10 C11 
H11 C12 
  13  H12 C13 H13 C14 C15 N2  C16 C17 C29 H18 C19 H19 C20 H20 C21 H21 C22 
H221H222H223
  ...
  18  %FLAG CHARGE
  19  %FORMAT(5E16.8)                                                           
      

The task here is to identify the block of data lines (lines 12-17),
ending at the beginning of the next block (starting with "%"). It seems
likely that my problem stems from the fact that I use "try" to
accomplish this but this is as far as I can reason.

Any ideas what might cause this sort of off-by-one? Does anyone see a
better (i.e. working) way to formulate my grammar? Any and all help
would be greatly appreciated. Thanks.

Cheers,

- Ben


module Main(main) where

import Debug.Trace
import Data.Maybe
import Text.ParserCombinators.Parsec
import Text.Parsec.Prim (ParsecT)
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)

data PrmTopBlock = AtomNames [String]
                 deriving (Show, Eq)

countBetween m n p = do xs <- count m p
                        ys <- count (n - m) $ option Nothing $ do
                                y <- p
                                return (Just y)
                        return (xs ++ catMaybes ys)

restLine = do a <- many (noneOf "\n")
              eol
              return a

eol = do skipMany $ char ' '
         char '\n'

natural = P.integer $ P.makeTokenParser emptyDef

float = do sign <- option 1 (do s <- oneOf "+- "
                                return $ if s == '-' then -1 else 1)
           x <- P.float $ P.makeTokenParser emptyDef
           return $ sign * x

flagDecl x = do string $ "%FLAG " ++ x
                eol

formatDecl = do string "%FORMAT("
                count <- many1 digit
                format <- letter
                length <- many1 digit
                char ')'
                eol
                return (count, format, length)

-- |Multiple lines of lists of a given item
linesOf item = do ls <- many1 $ try (do lookAhead (noneOf "%")
                                        l <- many1 item
                                        eol
                                        return $ trace (show l) l)
                  return $ concat ls

atomNameBlock = do flagDecl "ATOM_NAME"
                   formatDecl
                   atomNames <- linesOf atomName
                   return $ AtomNames atomNames
                where
                atomName = do spaces
                              name <- countBetween 1 4 (alphaNum <|> oneOf "\'+-") <?> "atom name"
                              return name

ignoredBlock = do string "%FLAG" <?> "ignored block flag"
                  restLine
                  formatDecl
                  skipMany (noneOf "%" >> restLine)

hello = do ignoredBlock
           ignoredBlock
           atomNameBlock

parsePrmTopFile input = parse hello "(unknown)" input

test = do a <- readFile "test.prmtop"
          print $ parsePrmTopFile a

main = test

Attachment: test.prmtop
Description: Binary data

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to