On Sat, 22 Jan 2011 11:13:36 -0800, Terence Parr <pa...@cs.usfca.edu> wrote: > > On Jan 18, 2011, at 4:23 AM, Mark Wright wrote: > > > Hi, > > > > Since ANTLR v4 is in development, I was wondering if there were any thoughts > > on making it possible to hook into the code generation earlier in order to > > be able to add support for weird functional languages such as Haskell, > > Lisp, Scheme, Racket, etc. There were some earlier thoughts on this which > > I copied below. > > Hi Mark, sure! The new code generator should be much more flexible. > > > I did have a look earlier at the feasibility of writing an ANTLR runtime > > target for Haskell. This seemed difficult for ANTLR v3, I could not > > see how to easily hook into code generation earlier. Since ANTLR v3 > > does not seem to have support for weird functional language targets, the > > string templates ended up being designed for imperative languages. > > Ola Bini was working on code generation for Emacs Lisp; I'm not sure how far > he got. > > > > > As a workaround for ANTLR v3, I have written a Haskell binding to the > > ANTLR v3.2 C runtime library: > > > > http://hackage.haskell.org/package/antlrc > > https://github.com/markwright/antlrc > > I saw that. cool! > > Perhaps we should have a conversation over Skype sometime about what your > needs would be. of look that ML, Lisp, Haskell, Scala a lot but have not > done anything significant in any of them. > > Ter
Hi Ter, Great, thanks! I'm trying to learn Haskell, it is challenging to figure out some ideas on what I might need. I was wondering if the ANTLR v4 could have have some hooks to be able to obtain the DFA data directly, in order to have the option for functional language targets to generate the DFA prediction code something like in the sigmaDfa1 function in the code sketch below. The sigmaDfa1 function looks so simple it seems like cheating. This code sketch is for the DFA in the diagram on page 261 of The Definitive ANTLR Reference (P 2.0 printing) The rest of the Haskell runtime is likely to get really crazy, I think I will need to use continuations to jump everywhere in the generated parser code. Thanks, Mark -- Copyright (c)2011, Mark Wright. All rights reserved. {-# LANGUAGE ScopedTypeVariables #-} import Prelude (error, fromEnum, fst, otherwise, print, show, snd, toEnum, Char, Enum, Eq, Int, IO, Ord, Show, (||), (+), (-), (==), (>), (>=), (<), (<=), (/=), ($)) import qualified Data.List as L import Data.Maybe import Data.Array.IArray -- | Given: -- | is the Array of tokens from the input file -- | p the current parsing position -- | o the ANTLR offset from the current parsing position, which is different to -- | a normal offset, as 0 is undefined and returns Nothing, 1 is the current -- | character, 2 is the next character, -1 is the previous character. -- | return the character at ANTLR offset, or Nothing if the ANTLR offset is beyond -- | the end of the file, or before the beginning of the file, or 0. lt :: (Eq a, Ord a) => Array Int a -> Int -> Int -> Maybe a lt is p o | o == 0 || p + o - 1 >= rangeSize (bounds is) || p + o < 0 = Nothing | o > 0 = Just (is ! (p + o - 1)) | otherwise = Just (is ! (p + o)) -- | An example token sequence which should predict alt1 for the DFA on page 261 -- | of The Definitive ANTLR Reference. la1 :: [MethodTokenId] = [INT, ID, LEFT_PARENTHESIS, INT, ID, COMMA, INT, ID, COMMA, INT, ID, RIGHT_PARENTHESIS, SEMICOLON] a1 :: Array Int MethodTokenId a1 = listArray (0, L.length la1 - 1) la1 la2 :: [MethodTokenId] = [INT, ID, LEFT_PARENTHESIS, INT, ID, COMMA, INT, ID, COMMA, INT, ID, RIGHT_PARENTHESIS, LEFT_CURLY_BRACE] a2 :: Array Int MethodTokenId a2 = listArray (0, L.length la2 - 1) la2 -- | The token IDs data MethodTokenId = VOID | INT | LEFT_PARENTHESIS | RIGHT_PARENTHESIS | LEFT_CURLY_BRACE | RIGHT_CURLY_BRACE | COMMA | SEMICOLON | ID | WS deriving (Eq, Ord, Show) instance Enum MethodTokenId where fromEnum VOID = 4 fromEnum INT = 5 fromEnum LEFT_PARENTHESIS = 6 fromEnum RIGHT_PARENTHESIS = 7 fromEnum LEFT_CURLY_BRACE = 8 fromEnum RIGHT_CURLY_BRACE = 9 fromEnum COMMA = 10 fromEnum SEMICOLON = 11 fromEnum ID = 12 fromEnum WS = 13 toEnum 4 = VOID toEnum 5 = INT toEnum 6 = LEFT_PARENTHESIS toEnum 7 = RIGHT_PARENTHESIS toEnum 8 = LEFT_CURLY_BRACE toEnum 9 = RIGHT_CURLY_BRACE toEnum 10 = COMMA toEnum 11 = SEMICOLON toEnum 12 = ID toEnum 13 = WS toEnum unmatched = error ("MethodTokenId.toEnum: Cannot match " L.++ show unmatched) -- | The DFA states as labelled in the DFA diagram on p. 261 of the ANTLR book. data DfaState1 = S0 | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 deriving (Eq, Show) -- | Scanning indicates the DFA is still running. NoMatch means the DFA does -- | does not match this input. Alt1 predicts a method forward declaration signature. -- | Alt2 predicticts a concrete method definition. data DfaAlt1 = Scanning | NoMatch | Alt1 | Alt2 deriving (Eq, Show) -- | The DFA state transition function. -- | First parameter is the current state. -- | Second parameter is the token ID. -- | Result is the (DfaAlt1, DfaState1) pair, where the -- | DfaAlt1 is Scanning while the DFA is still scanning ahead, -- | in which case DfaState1 is the next state. Or if DfaAlt1 is -- | NoMatch, then DfaState1 is the last state where the no match -- | was detected. Or DfaAlt1 is the predicted alternative, and -- | DfaState1 is the final state. sigmaDfa1 :: DfaState1 -> MethodTokenId -> (DfaAlt1, DfaState1) sigmaDfa1 S0 t@_ | t `L.elem` [VOID, INT] = (Scanning, S1) sigmaDfa1 S1 ID = (Scanning, S2) sigmaDfa1 S2 LEFT_PARENTHESIS = (Scanning, S3) sigmaDfa1 S3 INT = (Scanning, S4) sigmaDfa1 S4 ID = (Scanning, S5) sigmaDfa1 S5 COMMA = (Scanning, S6) sigmaDfa1 S5 RIGHT_PARENTHESIS = (Scanning, S9) sigmaDfa1 S6 INT = (Scanning, S7) sigmaDfa1 S7 ID = (Scanning, S8) sigmaDfa1 S8 COMMA = (Scanning, S6) sigmaDfa1 S8 RIGHT_PARENTHESIS = (Scanning, S9) sigmaDfa1 S9 LEFT_CURLY_BRACE = (Alt2, S11) sigmaDfa1 S9 SEMICOLON = (Alt1, S10) sigmaDfa1 s@_ _ = (NoMatch, s) -- | Loop to run the DFA. -- | alt indicates if we are still Scanning, or finished. -- | s is the current state. -- | is is the input stream of tokens. -- | p is the current zero based offset from the start of the token stream. -- | o is the lookahead one based token offset, as described in the lt function. -- | DfaAlt1 is the predicted alternative, or NoMatch if no alternative is matched. scanDfa1 :: DfaAlt1 -> DfaState1 -> Array Int MethodTokenId -> Int -> Int -> (DfaAlt1, DfaState1) scanDfa1 alt s is p o = if alt == Scanning then let t = lt is p o in if t == Nothing then (NoMatch, s) else let (alt', s') = (\(Just t') -> sigmaDfa1 s t') t in scanDfa1 alt' s' is p (o + 1) else (alt, s) -- | Run the DFA to find the predicted alternative if the rule matches. -- | is is the input stream of tokens. -- | p is the current zero based offset from the start of the token stream. -- | o is the lookahead one based token offset, as described in the lt function. -- | DfaAlt1 is the predicted alternative, or NoMatch if no alternative is matched. predictDfa1 :: Array Int MethodTokenId -> Int -> Int -> DfaAlt1 predictDfa1 is p o = fst $ scanDfa1 Scanning S0 is p o main :: IO () main = do print $ predictDfa1 a1 0 1 print $ predictDfa1 a2 0 1 _______________________________________________ antlr-dev mailing list antlr-dev@antlr.org http://www.antlr.org/mailman/listinfo/antlr-dev