Re: [antlr-dev] Haskell, LISP, other weird stuff

Mark Wright Thu, 27 Jan 2011 05:56:43 -0800

On Sat, 22 Jan 2011 11:13:36 -0800, Terence Parr <pa...@cs.usfca.edu> wrote:
> 
> On Jan 18, 2011, at 4:23 AM, Mark Wright wrote:
> 
> > Hi,
> > 
> > Since ANTLR v4 is in development, I was wondering if there were any thoughts
> > on making it possible to hook into the code generation earlier in order to
> > be able to add support for weird functional languages such as Haskell,
> > Lisp, Scheme, Racket, etc.  There were some earlier thoughts on this which
> > I copied below.
> 
> Hi Mark, sure! The new code generator should be much more flexible.
> 
> > I did have a look earlier at the feasibility of writing an ANTLR runtime
> > target for Haskell.  This seemed difficult for ANTLR v3, I could not
> > see how to easily hook into code generation earlier.  Since ANTLR v3
> > does not seem to have support for weird functional language targets, the
> > string templates ended up being designed for imperative languages.
> 
> Ola Bini was working on code generation for Emacs Lisp; I'm not sure how far 
> he got.
> 
> > 
> > As a workaround for ANTLR v3, I have written a Haskell binding to the
> > ANTLR v3.2 C runtime library:
> > 
> > http://hackage.haskell.org/package/antlrc
> > https://github.com/markwright/antlrc
> 
> I saw that. cool!
> 
> Perhaps we should have a conversation over Skype sometime about what your
> needs would be. of look that ML, Lisp, Haskell, Scala a lot but have not
> done anything significant in any of them.
> 
> Ter


Hi Ter,

Great, thanks!

I'm trying to learn Haskell, it is challenging to figure out some ideas
on what I might need.  I was wondering if the ANTLR v4 could have
have some hooks to be able to obtain the DFA data directly, in order
to have the option for functional language targets to generate the
DFA prediction code something like in the sigmaDfa1 function in the
code sketch below.  The sigmaDfa1 function looks so simple it seems like
cheating.  This code sketch is for the DFA in the diagram on
page 261 of The Definitive ANTLR Reference (P 2.0 printing)

The rest of the Haskell runtime is likely to get really crazy,
I think I will need to use continuations to jump everywhere in the
generated parser code.

Thanks, Mark

-- Copyright (c)2011, Mark Wright.  All rights reserved.
{-# LANGUAGE ScopedTypeVariables #-}

import Prelude (error, fromEnum, fst, otherwise, print, show, snd, toEnum, 
Char, Enum, Eq, Int, IO, Ord, Show, (||), (+), (-), (==), (>), (>=), (<), (<=), 
(/=), ($))
import qualified Data.List as L
import Data.Maybe
import Data.Array.IArray

-- | Given: 
-- | is the Array of tokens from the input file 
-- | p the current parsing position
-- | o the ANTLR offset from the current parsing position, which is different 
to 
-- |   a normal offset, as 0 is undefined and returns Nothing, 1 is the current
-- |   character, 2 is the next character, -1 is the previous character.
-- | return the character at ANTLR offset, or Nothing if the ANTLR offset is 
beyond
-- | the end of the file, or before the beginning of the file, or 0.
lt :: (Eq a, Ord a) => Array Int a -> Int -> Int -> Maybe a
lt is p o
  | o == 0 || p + o - 1 >= rangeSize (bounds is) || p + o < 0 = Nothing
  | o > 0 = Just (is ! (p + o - 1))
  | otherwise = Just (is ! (p + o))

-- | An example token sequence which should predict alt1 for the DFA on page 261
-- | of The Definitive ANTLR Reference.
la1 :: [MethodTokenId] = [INT, ID, LEFT_PARENTHESIS, INT, ID, COMMA, INT, ID,
 COMMA, INT, ID, RIGHT_PARENTHESIS, SEMICOLON]

a1 :: Array Int MethodTokenId
a1 = listArray (0, L.length la1 - 1) la1

la2 :: [MethodTokenId] = [INT, ID, LEFT_PARENTHESIS, INT, ID, COMMA, INT, ID,
 COMMA, INT, ID, RIGHT_PARENTHESIS, LEFT_CURLY_BRACE]

a2 :: Array Int MethodTokenId
a2 = listArray (0, L.length la2 - 1) la2

-- | The token IDs
data MethodTokenId = VOID
                   | INT
                   | LEFT_PARENTHESIS
                   | RIGHT_PARENTHESIS
                   | LEFT_CURLY_BRACE
                   | RIGHT_CURLY_BRACE
                   | COMMA
                   | SEMICOLON
                   | ID
                   | WS
                   deriving (Eq, Ord, Show)

instance Enum MethodTokenId where
  fromEnum VOID = 4
  fromEnum INT = 5
  fromEnum LEFT_PARENTHESIS = 6
  fromEnum RIGHT_PARENTHESIS = 7
  fromEnum LEFT_CURLY_BRACE = 8
  fromEnum RIGHT_CURLY_BRACE = 9
  fromEnum COMMA = 10
  fromEnum SEMICOLON = 11
  fromEnum ID = 12
  fromEnum WS = 13

  toEnum 4 = VOID
  toEnum 5 = INT
  toEnum 6 = LEFT_PARENTHESIS
  toEnum 7 = RIGHT_PARENTHESIS
  toEnum 8 = LEFT_CURLY_BRACE
  toEnum 9 = RIGHT_CURLY_BRACE
  toEnum 10 = COMMA
  toEnum 11 = SEMICOLON
  toEnum 12 = ID
  toEnum 13 = WS
  toEnum unmatched = error ("MethodTokenId.toEnum: Cannot match " L.++ show 
unmatched)

-- | The DFA states as labelled in the DFA diagram on p. 261 of the ANTLR book.
data DfaState1 = S0 
               | S1
               | S2
               | S3
               | S4
               | S5
               | S6
               | S7
               | S8
               | S9
               | S10
               | S11
               deriving (Eq, Show)

-- | Scanning indicates the DFA is still running.  NoMatch means the DFA does
-- | does not match this input.  Alt1 predicts a method forward declaration 
signature.
-- | Alt2 predicticts a concrete method definition.
data DfaAlt1 = Scanning
               | NoMatch
               | Alt1
               | Alt2
               deriving (Eq, Show)

-- | The DFA state transition function.
-- | First parameter is the current state.
-- | Second parameter is the token ID.
-- | Result is the (DfaAlt1, DfaState1) pair, where the
-- | DfaAlt1 is Scanning while the DFA is still scanning ahead,
-- | in which case DfaState1 is the next state.  Or if DfaAlt1 is
-- | NoMatch, then DfaState1 is the last state where the no match
-- | was detected.  Or DfaAlt1 is the predicted alternative, and
-- | DfaState1 is the final state.
sigmaDfa1 :: DfaState1 -> MethodTokenId -> (DfaAlt1, DfaState1)
sigmaDfa1 S0 t@_ | t `L.elem` [VOID, INT] = (Scanning, S1)
sigmaDfa1 S1 ID = (Scanning, S2)
sigmaDfa1 S2 LEFT_PARENTHESIS = (Scanning, S3)
sigmaDfa1 S3 INT = (Scanning, S4)
sigmaDfa1 S4 ID = (Scanning, S5)
sigmaDfa1 S5 COMMA = (Scanning, S6)
sigmaDfa1 S5 RIGHT_PARENTHESIS = (Scanning, S9)
sigmaDfa1 S6 INT = (Scanning, S7)
sigmaDfa1 S7 ID = (Scanning, S8)
sigmaDfa1 S8 COMMA = (Scanning, S6)
sigmaDfa1 S8 RIGHT_PARENTHESIS = (Scanning, S9)
sigmaDfa1 S9 LEFT_CURLY_BRACE = (Alt2, S11)
sigmaDfa1 S9 SEMICOLON = (Alt1, S10)
sigmaDfa1 s@_ _ = (NoMatch, s)

-- | Loop to run the DFA.
-- | alt indicates if we are still Scanning, or finished.
-- | s is the current state.
-- | is is the input stream of tokens.
-- | p is the current zero based offset from the start of the token stream.
-- | o is the lookahead one based token offset, as described in the lt function.
-- | DfaAlt1 is the predicted alternative, or NoMatch if no alternative is 
matched.
scanDfa1 :: DfaAlt1 -> DfaState1 -> Array Int MethodTokenId -> Int -> Int -> 
(DfaAlt1, DfaState1)
scanDfa1 alt s is p o =
  if alt == Scanning
    then let t = lt is p o
         in if t == Nothing 
              then (NoMatch, s)
              else let (alt', s') = (\(Just t') -> sigmaDfa1 s t') t
                   in scanDfa1 alt' s' is p (o + 1) 
    else (alt, s)
    
-- | Run the DFA to find the predicted alternative if the rule matches.
-- | is is the input stream of tokens.
-- | p is the current zero based offset from the start of the token stream.
-- | o is the lookahead one based token offset, as described in the lt function.
-- | DfaAlt1 is the predicted alternative, or NoMatch if no alternative is 
matched.
predictDfa1 :: Array Int MethodTokenId -> Int -> Int -> DfaAlt1
predictDfa1 is p o = fst $ scanDfa1 Scanning S0 is p o

main :: IO () 
main = do
  print $ predictDfa1 a1 0 1
  print $ predictDfa1 a2 0 1
_______________________________________________
antlr-dev mailing list
antlr-dev@antlr.org
http://www.antlr.org/mailman/listinfo/antlr-dev

Re: [antlr-dev] Haskell, LISP, other weird stuff

Reply via email to