> Mainly, we need information on every Token that appears in the original source.

Good idea.  Alan Zimmerman’s exact-print stuff has precisely that goal, I 
believe.   So it’d be worth talking to him; perhaps by working together you can 
make much more rapid progress.  Or not – but a conversation would be helpful in 
any case.  I’m very happy to see more attention and effort being devoted to 
this space.  Thank you!

Simon

From: Zubin Duggal <[email protected]>
Sent: 15 May 2018 10:13
To: Simon Peyton Jones <[email protected]>
Cc: Gershom B <[email protected]>; [email protected]; Joachim Breitner 
<[email protected]>; Shayan Najd <[email protected]>; Alan & Kim 
Zimmerman <[email protected]>
Subject: Re: HIE Files

> And that in turn raises the question of WHAT syntax tree.  HsSyn?  Template 
> Haskell?  Haskell-src-exts?  Or something new?   Shayan and Alan are busy 
> parameterising HsSyn to make it non-GHC-specific, and directly usable for this 
> kind of endeavour ("Trees that grow").  It'd be great to build on their work.

Mainly, we need information on every Token that appears in the original source. 
My plan is to further group Tokens into a simple rose tree based on how they 
occur in HsSyn. We intentionally want to avoid capturing too much information, 
so that the format changes little when the GHC AST changes.
I've made a file describing roughly what the data structures involved should 
look like:

https://gist.github.com/wz1000/edf14747bd890b08c01c226d5bc6a1d6
The plan is to group the Tokens together in a tree in a way similar to what 
structured-haskell-mode does. (The GIFs at the following link might give 
some idea.)
https://github.com/chrisdone/structured-haskell-mode/
For example, here is what structured-haskell-mode outputs for a small snippet 
of code: 
https://gist.github.com/wz1000/db42d4f533ba7d2345934906b312f743
We want something similar for the HIE AST, but grouped into a tree, where each 
node (roughly corresponding to HsSyn constructors) points to all the subnodes 
and tokens it spans.
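To make that shape concrete, here is a minimal Haskell sketch of such a tree. It assumes each node carries an informal constructor label (a plain string, so the format is not tied to GHC's own AST types), its source span, its child nodes, and the tokens it owns directly (keywords like `if`/`then`/`else`). All names here (`Span`, `Token`, `Node`, `allTokens`, `tok`, `var`) are hypothetical and are not the data structures in the gist above.

```haskell
import Data.List (sortOn)

-- Hypothetical types for illustration only.

-- | A source position: (line, column), both 1-based.
type Pos = (Int, Int)

-- | A half-open source span [spanStart, spanEnd).
data Span = Span { spanStart :: Pos, spanEnd :: Pos }
  deriving (Eq, Show)

-- | A single lexical token from the original source.
data Token = Token { tokSpan :: Span, tokText :: String }
  deriving (Eq, Show)

-- | A node, roughly one per HsSyn constructor.
data Node = Node
  { nodeLabel    :: String   -- informal label, e.g. "HsIf"
  , nodeSpan     :: Span
  , nodeChildren :: [Node]   -- sub-nodes, nested inside nodeSpan
  , nodeTokens   :: [Token]  -- tokens owned directly by this node
  } deriving (Eq, Show)

-- | Every token under a node, in source order. Tuples compare
-- lexicographically, so sorting on (line, column) is source order.
allTokens :: Node -> [Token]
allTokens n =
  sortOn (spanStart . tokSpan)
    (nodeTokens n ++ concatMap allTokens (nodeChildren n))

-- | Hypothetical helper: a token on line 1 starting at column c.
tok :: Int -> String -> Token
tok c s = Token (Span (1, c) (1, c + length s)) s

-- | Hypothetical helper: a leaf node for a variable occurrence.
var :: Int -> String -> Node
var c s = Node "HsVar" (tokSpan t) [] [t]
  where t = tok c s

-- | The expression `if c then x else y` on line 1.
example :: Node
example =
  Node "HsIf" (Span (1, 1) (1, 19))
    [var 4 "c", var 11 "x", var 18 "y"]
    [tok 1 "if", tok 6 "then", tok 13 "else"]
```

Flattening `example` with `allTokens` recovers the six tokens in source order (`if c then x else y`), so no token is lost by grouping them under nodes. Keeping keywords on the node that introduces them, rather than in artificial leaf nodes, is one possible design choice among several.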
> That's great.  But would it not be good to offer a library, with a well-defined 
> API, that allows a client (including Haddock) to parse those .hie files into 
> syntax trees or whatever?  You'll need to do that to allow the haddock thing 
> you describe -- and it'd be much better to make the parser (and doubtless lots 
> of utility functions like finding things in the tree) available to any client, 
> not just haddock.

Yes, a library to consume these files is definitely something we need, and I 
believe it will grow out naturally as we work out the integration with haddock 
and haskell-ide-engine.
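One utility such a library would likely need is locating the node under a given source position, which is where a hover query starts. The following is a minimal sketch, assuming nodes carry half-open spans and that a node's children have disjoint spans nested inside the parent's span. `Node`, `contains`, `enclosing`, and `innermost` are hypothetical names, not an existing API.

```haskell
-- Hypothetical types for illustration only.

-- | (line, column), both 1-based.
type Pos = (Int, Int)

-- | Half-open span [start, end).
data Span = Span Pos Pos
  deriving (Eq, Show)

data Node = Node
  { nodeLabel    :: String
  , nodeSpan     :: Span
  , nodeChildren :: [Node]
  } deriving (Eq, Show)

-- | Does the span contain the position? Tuples compare
-- lexicographically, which matches source order.
contains :: Span -> Pos -> Bool
contains (Span s e) p = s <= p && p < e

-- | Path from the outermost to the innermost node enclosing a position.
-- Because sibling spans are assumed disjoint, at most one child
-- contributes, so the result is a single root-to-leaf path.
enclosing :: Pos -> Node -> [Node]
enclosing p n
  | contains (nodeSpan n) p = n : concatMap (enclosing p) (nodeChildren n)
  | otherwise               = []

-- | The innermost node enclosing the position, if any.
innermost :: Pos -> Node -> Maybe Node
innermost p n = case enclosing p n of
  [] -> Nothing
  ns -> Just (last ns)

-- | The expression `f (g x)` on line 1, with informal labels.
example :: Node
example =
  Node "HsApp" (Span (1, 1) (1, 8))
    [ Node "HsVar f" (Span (1, 1) (1, 2)) []
    , Node "HsPar" (Span (1, 3) (1, 8))
        [ Node "HsApp" (Span (1, 4) (1, 7))
            [ Node "HsVar g" (Span (1, 4) (1, 5)) []
            , Node "HsVar x" (Span (1, 6) (1, 7)) []
            ]
        ]
    ]
```

For instance, `innermost (1, 6) example` returns the `HsVar x` node, while position `(1, 2)`, the space after `f`, resolves to the outer application. Returning the whole path from `enclosing` also lets a client choose a coarser node, for example to select a whole expression.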

On 15 May 2018 at 14:12, Simon Peyton Jones 
<[email protected]> wrote:
|  > Why not put the .hie-file info into the .hi file?  (Optionally, of
|  > course.)
|  >
|
|  Simon, I'm curious what benefits you think we might get from this?
|  (I'm one of the mentors on this GSoC project btw).

Well, I've always thought that we should really put the .hi file into the .o 
file!  Having two files risks getting things out of sync, and three makes that 
worse.  The file is just a place to keep a blob of info.  What's the motivation 
for having a separate .hie file as well as the .hi file?

|
|  > What tools/libraries do you plan to produce to allow clients to read
|  a .hie file and make sense of the contents?
|
|  For GSoC as a proof of concept the idea is to teach haddock's
|  hyperlinked-source backend to use this information to add type-
|  annotation-on-hover to the colorized, hyperlinked, html source.
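The hover part can be sketched independently of how types are extracted: once each token is paired with an optional type string, the HTML backend only has to attach that string where the browser shows it on hover. This minimal sketch assumes the plain HTML `title` attribute is used for the tooltip; `escapeHtml`, `renderToken`, and `renderTokens` are hypothetical helpers, not Haddock's hyperlinked-source code.

```haskell
-- Hypothetical helpers for illustration only.

-- | Escape characters that are special in HTML text and in
-- double-quoted attribute values.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- | Render one source token; if a type is known for it, wrap the token
-- in a span whose title attribute holds the type, so the browser
-- displays it as a tooltip on hover.
renderToken :: String -> Maybe String -> String
renderToken txt Nothing   = escapeHtml txt
renderToken txt (Just ty) =
  "<span title=\"" ++ escapeHtml ty ++ "\">" ++ escapeHtml txt ++ "</span>"

-- | Render a sequence of (token text, optional type) pairs.
renderTokens :: [(String, Maybe String)] -> String
renderTokens = concatMap (uncurry renderToken)
```

For example, `renderToken "map" (Just "(a -> b) -> [a] -> [b]")` produces a `<span>` whose `title` is the escaped type. Escaping matters here because Haskell types routinely contain `->`, and source tokens can contain `<`, `&`, or `"`.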

That's great.  But would it not be good to offer a library, with a well-defined 
API, that allows a client (including Haddock) to parse those .hie files into 
syntax trees or whatever?  You'll need to do that to allow the haddock thing 
you describe -- and it'd be much better to make the parser (and doubtless lots 
of utility functions like finding things in the tree) available to any client, 
not just haddock.

And that in turn raises the question of WHAT syntax tree.  HsSyn?  Template 
Haskell?  Haskell-src-exts?  Or something new?   Shayan and Alan are busy 
parameterising HsSyn to make it non-GHC-specific, and directly usable for this 
kind of endeavour ("Trees that grow").  It'd be great to build on their work.

|  with the GHC API. (This by the way is one of the key benefits of
|  keeping the file separate from standard hi files -- it should be
|  parseable and consumable without needing to link in GHC).

Yes, not linking in GHC is a reasonable goal; but having two files and file 
formats is not a necessary consequence of that goal.  Nothing stops us making a 
library to parse .hi files -- indeed the entire iface/ directory in GHC is 
quite well separated for that precise purpose.

None of this is to criticise the plan.  I think it's a great idea to make more 
info more readily available to more tools.   I'm just poking at it a bit 😊.

Simon

_______________________________________________
ghc-devs mailing list
[email protected]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
