2008/9/18 Ian Lynagh <[EMAIL PROTECTED]>:
>
> Hi Chaddaï,
>
> On Mon, Sep 15, 2008 at 10:55:56AM +0200, Chaddaï Fouché wrote:
>>
>> Mon Sep 15 10:33:30 CEST 2008  "Chaddai Fouche" <[EMAIL PROTECTED]>
>>   * RichTokenStream support
>
> Can you re-send this as an attachment, please? I can't see how to
> massage it into a form that darcs accepts.
>

Sorry, here it is. It's the output of "darcs send -o ...".

-- 
Jedaï
Mon Sep 15 10:33:30 CEST 2008  "Chaddai Fouche" <[EMAIL PROTECTED]>
  * RichTokenStream support
  
  This patch adds support for raw token streams, which carry more
  information than ordinary token streams (in particular, they include
  comments). The "lexTokenStream" function provides this support in the
  Lexer module. In addition, functions have been added to the GHC module
  to make it easier to retrieve the token stream of a module
  ("getTokenStream").
  
  Building on that, I added what could be called a "rich token stream": a
  token stream in which each token is paired with the source string it
  came from. The function addSourceToTokens takes a StringBuffer, a
  starting SrcLoc and a token stream, and builds this rich token stream.
  getRichTokenStream is a convenience function to get a module's rich
  token stream. "showRichTokenStream" uses the SrcLoc information in such
  a token stream to produce a string similar to the original source
  (except for insignificant whitespace). Thus "putStrLn .
  showRichTokenStream =<< getRichTokenStream s mod" should print a valid
  module source; the interesting part, of course, is to modify the token
  stream between the get and the show.
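
For illustration, here is a minimal usage sketch (not part of the patch;
it assumes a GHC API Session "s" and a target Module "mod" have already
been set up and loaded, and the helper name dumpModuleSource is made up
for this example):

    import GHC

    -- Reconstruct and print a module's source from its rich token stream.
    -- getRichTokenStream returns Nothing if the module's source or token
    -- stream is unavailable.
    dumpModuleSource :: Session -> Module -> IO ()
    dumpModuleSource s mod = do
      mtoks <- getRichTokenStream s mod   -- Maybe [(Located Token, String)]
      case mtoks of
        Nothing   -> putStrLn "no token stream available for this module"
        Just toks -> putStrLn (showRichTokenStream toks)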

New patches:

[RichTokenStream support
"Chaddai Fouche" <[EMAIL PROTECTED]>**20080915083330
 
 This patch adds support for raw token streams, which carry more
 information than ordinary token streams (in particular, they include
 comments). The "lexTokenStream" function provides this support in the
 Lexer module. In addition, functions have been added to the GHC module
 to make it easier to retrieve the token stream of a module
 ("getTokenStream").
 
 Building on that, I added what could be called a "rich token stream": a
 token stream in which each token is paired with the source string it
 came from. The function addSourceToTokens takes a StringBuffer, a
 starting SrcLoc and a token stream, and builds this rich token stream.
 getRichTokenStream is a convenience function to get a module's rich
 token stream. "showRichTokenStream" uses the SrcLoc information in such
 a token stream to produce a string similar to the original source
 (except for insignificant whitespace). Thus "putStrLn .
 showRichTokenStream =<< getRichTokenStream s mod" should print a valid
 module source; the interesting part, of course, is to modify the token
 stream between the get and the show.
] {
hunk ./compiler/main/DynFlags.hs 301
+   | Opt_KeepRawTokenStream
hunk ./compiler/main/GHC.hs 191
+        -- * Token stream manipulations
+        Token,
+        getTokenStream, getRichTokenStream,
+        showRichTokenStream, addSourceToTokens,
+
hunk ./compiler/main/GHC.hs 266
-import StringBuffer    ( StringBuffer, hGetStringBuffer )
+import StringBuffer    ( StringBuffer, hGetStringBuffer, nextChar )
hunk ./compiler/main/GHC.hs 273
+import Lexer
hunk ./compiler/main/GHC.hs 1576
---             The resulting graph has no hi-boot nodes, but can by cyclic
+--             The resulting graph has no hi-boot nodes, but can be cyclic
hunk ./compiler/main/GHC.hs 2285
--- This is for reconstructing refactored source code
--- Calls the lexer repeatedly.
--- ToDo: add comment tokens to token stream
-getTokenStream :: Session -> Module -> IO [Located Token]
hunk ./compiler/main/GHC.hs 2287
+-- Extract the filename, stringbuffer content and dynflags associated with a module
+getModuleSourceAndFlags :: Session -> Module -> IO (Maybe (String, StringBuffer, DynFlags))
+getModuleSourceAndFlags  s mod = do
+  mg <- getModuleGraph s
+  case [ m | m <- mg, ms_mod m == mod ] of
+    [] -> return Nothing
+    m:_ -> do
+      case ml_hs_file $ ms_location m of
+        Nothing -> return Nothing
+        Just sourceFile -> do
+               source <- hGetStringBuffer sourceFile
+               return . Just $ (sourceFile, source, ms_hspp_opts m)
+
+
+-- | Give the list of tokens from the module source, including comments
+-- Calls the lexer repeatedly.
+getTokenStream :: Session -> Module -> IO (Maybe [Located Token])
+getTokenStream s mod = do
+  src <- getModuleSourceAndFlags s mod
+  case src of
+    Nothing -> return Nothing
+    Just (sourceFile, source, flags) -> 
+        do
+          let startLoc = mkSrcLoc (mkFastString sourceFile) 0 0
+          case lexTokenStream source startLoc flags of
+            POk _ ts -> return $ Just ts
+            _ -> return Nothing
+
+-- | Give even more information on the source than 'getTokenStream'
+-- This function allows the source to be reconstructed completely with
+-- 'showRichTokenStream'
+getRichTokenStream :: Session -> Module -> IO (Maybe [(Located Token, String)])
+getRichTokenStream s mod = do
+  src <- getModuleSourceAndFlags s mod
+  case src of
+    Nothing -> return Nothing
+    Just (sourceFile, source, flags) -> 
+        do
+          let startLoc = mkSrcLoc (mkFastString sourceFile) 0 0
+          case lexTokenStream source startLoc flags of
+                 POk _ ts -> return . Just $ addSourceToTokens startLoc source ts
+                 _ -> return Nothing
+
+-- | Given a source location and a StringBuffer corresponding to this
+-- location, return a rich token stream with the source associated with the tokens
+addSourceToTokens :: SrcLoc -> StringBuffer -> [Located Token] -> [(Located Token, String)]
+addSourceToTokens _ _ [] = []
+addSourceToTokens loc buf (t@(L span _) : ts) 
+    | not (isGoodSrcSpan span) = (t,"") : addSourceToTokens loc buf ts
+    | otherwise = (t,str) : addSourceToTokens newLoc newBuf ts
+    where 
+      (newLoc, newBuf, str) = go "" loc buf
+      start = srcSpanStart span
+      end = srcSpanEnd span
+      go acc loc buf | loc < start = go acc nLoc nBuf
+                     | start <= loc && loc < end = go (ch:acc) nLoc nBuf
+                     | otherwise = (loc, buf, reverse acc)
+          where (ch, nBuf) = nextChar buf
+                nLoc = advanceSrcLoc loc ch
+
+
+-- | Take a rich token stream such as produced by 'getRichTokenStream' and
+-- evaluate to a source almost identical to the original code (except for
+-- insignificant whitespace)
+showRichTokenStream :: [(Located Token, String)] -> String
+showRichTokenStream ts = go startLoc ts ""
+    where sourceFile = srcSpanFile (getLoc . fst . head $ ts)
+          startLoc = mkSrcLoc sourceFile 0 0
+          go _ [] = id
+          go loc ((L span _, str):ts) 
+              | not (isGoodSrcSpan span) = go loc ts
+              | locLine == tokLine = ((replicate (tokCol - locCol) ' ') ++)
+                                     . (str ++)
+                                     . go tokEnd ts
+              | otherwise = ((replicate (tokLine - locLine) '\n') ++)
+                            . ((replicate tokCol ' ') ++)
+                            . (str ++)
+                            . go tokEnd ts
+              where (locLine, locCol) = (srcLocLine loc, srcLocCol loc)
+                    (tokLine, tokCol) = (srcSpanStartLine span, srcSpanStartCol span)
+                    tokEnd = srcSpanEnd span
+
hunk ./compiler/parser/Lexer.x 44
-   addWarning
+   addWarning,
+   lexTokenStream
hunk ./compiler/parser/Lexer.x 152
-"-- " ~[$docsym \#] .* ;
-"--" [^$symbol : \ ] .* ;
+"-- " ~[$docsym \#] .* { lineCommentToken }
+"--" [^$symbol : \ ] .* { lineCommentToken }
hunk ./compiler/parser/Lexer.x 157
-"-- " [$docsym \#] .* / { ifExtension (not . haddockEnabled) } ;
+"-- " [$docsym \#] .* / { ifExtension (not . haddockEnabled) } { lineCommentToken }
hunk ./compiler/parser/Lexer.x 165
-"---"\-* [^$symbol :] .* ;
+"---"\-* [^$symbol :] .* { lineCommentToken }
hunk ./compiler/parser/Lexer.x 170
-"--"\-* / { atEOL } ;
+"--"\-* / { atEOL } { lineCommentToken }
hunk ./compiler/parser/Lexer.x 175
-"-- " / { atEOL } ;
+"-- " / { atEOL } { lineCommentToken }
hunk ./compiler/parser/Lexer.x 281
-  "-- #" .* ;
+  "-- #" .* { lineCommentToken }
hunk ./compiler/parser/Lexer.x 579
+  | ITlineComment     String     -- comment starting with "--"
+  | ITblockComment    String     -- comment in {- -}
hunk ./compiler/parser/Lexer.x 808
+lineCommentToken :: Action
+lineCommentToken span buf len = do
+  b <- extension rawTokenStreamEnabled
+  if b then strtoken ITlineComment span buf len else lexToken
+
hunk ./compiler/parser/Lexer.x 820
-  go (1::Int) input
+  go "" (1::Int) input
hunk ./compiler/parser/Lexer.x 822
-    go 0 input = do setInput input; cont
-    go n input = case alexGetChar input of
+    go commentAcc 0 input = do setInput input
+                               b <- extension rawTokenStreamEnabled
+                               if b 
+                                 then docCommentEnd input commentAcc ITblockComment _str span
+                                 else cont
+    go commentAcc n input = case alexGetChar input of
hunk ./compiler/parser/Lexer.x 831
-        Just ('\125',input) -> go (n-1) input
-        Just (_,_)          -> go n input
+        Just ('\125',input) -> go commentAcc (n-1) input
+        Just (_,_)          -> go ('-':commentAcc) n input
hunk ./compiler/parser/Lexer.x 835
-        Just ('-',input) -> go (n+1) input
-        Just (_,_)       -> go n input
-      Just (_,input) -> go n input
+        Just ('-',input) -> go ('-':'\123':commentAcc) (n+1) input
+        Just (_,_)       -> go ('\123':commentAcc) n input
+      Just (c,input) -> go (c:commentAcc) n input
hunk ./compiler/parser/Lexer.x 1611
+rawTokenStreamBit = 20 -- producing a token stream with all comments included
hunk ./compiler/parser/Lexer.x 1634
+rawTokenStreamEnabled flags = testBit flags rawTokenStreamBit
hunk ./compiler/parser/Lexer.x 1696
-           .|. transformComprehensionsBit `setBitIf` dopt Opt_TransformListComp flags
+               .|. transformComprehensionsBit `setBitIf` dopt Opt_TransformListComp flags
+               .|. rawTokenStreamBit `setBitIf` dopt Opt_KeepRawTokenStream flags
hunk ./compiler/parser/Lexer.x 1813
+
+lexTokenStream :: StringBuffer -> SrcLoc -> DynFlags -> ParseResult [Located Token]
+lexTokenStream buf loc dflags = unP go initState
+    where initState = mkPState buf loc (dopt_set (dopt_unset dflags Opt_Haddock) Opt_KeepRawTokenStream)
+          go = do     
+            ltok <- lexer return
+            case ltok of
+              L _ ITeof -> return []
+              _ -> liftM (ltok:) go
}

Context:

[We need to tell ghc-pkg to --force if we've only built a profiling library
Ian Lynagh <[EMAIL PROTECTED]>**20080913153142] 
[If we're profiling GHC, don't bother building the GHC package the vanilla way
Ian Lynagh <[EMAIL PROTECTED]>**20080913144820] 
[Remove the duplicate show rule in libraries/Makefile
Ian Lynagh <[EMAIL PROTECTED]>**20080913144413] 
[Move the "show" target from target.mk to boilerplate.mk
Ian Lynagh <[EMAIL PROTECTED]>**20080913141312
 target.mk isn't included everywhere, but show is always handy
] 
[Change how we detect if we are using the bootstrapping compiler or not
Ian Lynagh <[EMAIL PROTECTED]>**20080913104658
 I think looking for $(GHC_COMPILER_DIR_ABS) was failing on the Windows
 buildbot due to different path separators. Now we just look for
 "inplace".
] 
[wibble the distrib Makefile
Ian Lynagh <[EMAIL PROTECTED]>**20080912135932
 We now need to install driver after ghc
] 
[Reinstate the driver/ghc directory, to create a versioned GHC program
Ian Lynagh <[EMAIL PROTECTED]>**20080912113619
 e.g. $(bindir)/ghc-6.9.20080911
] 
[If USE_NEW_MKDEPEND_FLAGS is YES then don't use the deprecated -optdep flags
Ian Lynagh <[EMAIL PROTECTED]>**20080912110316] 
[Use --force-local when calling tar in bindisttest/
Ian Lynagh <[EMAIL PROTECTED]>**20080912012855
 Otherwise it thinks that c:/foo is a remote file
] 
[Fix #2586, bug in THUNK_SELECTORs (again)
Simon Marlow <[EMAIL PROTECTED]>**20080912130404
 This time, we had forgetten the write barrier in one place.
] 
[TAG 2008-09-12 2
Ian Lynagh <[EMAIL PROTECTED]>**20080912132848] 
Patch bundle hash:
1ccac2a6e45d312b1cef2d71c3831a66dad25592