Re: TTG: Handling Source Locations

Richard Eisenberg Tue, 12 Feb 2019 06:20:00 -0800


> On Feb 12, 2019, at 5:19 AM, Shayan Najd <sh.n...@gmail.com> wrote:
> 
> About the new code, the convention is straightforward: anytime you
> destruct an AST node, assume a wrapper node inside (add an extra
> case), or use the smart constructors/pattern synonyms.

Aha! This, I did not know. So, you're saying that all the consumers of the GHC 
AST need to remember to use dL every time they pattern-match. With the new 
design, using dL when it's unnecessary doesn't hurt, but forgetting it is 
problematic. So: just use it every time. My problem, though, is that this is 
just a convention -- no one checks it. It would be easy to forget.

> On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs 
> <ghc-devs@haskell.org> wrote:
> 
> One way to think of it is this: we can now put SrcSpans where they make 
> sense, rather than everywhere.

This has some logic to it, but I'm not quite sold. Another way of saying this 
is that the new design favors flexibility for the producer, at the cost of 
requiring consumers to be aware of and consistently apply the convention Shayan 
describes above. The problem is, though, that if the producer is stingy in 
adding source locations, the consumer won't know which locations are expected 
to be informative. Is the consumer expected to collect locations from a variety 
of places and try to combine them somehow? I doubt it. So this means that the 
flexibility for the producer isn't really there -- the type system will accept 
arbitrary choices of where to put locations, but consumers won't get the 
locations where they expect them.

>   We can still say (Located t) in places where we want to guarantee a SrcSpan.

This seems to go against the TTG philosophy. We can do this in, say, the return 
type of a function, but we can't in the AST proper, because that's shared among 
a number of clients, some of whom don't want the source locations.

> 
> Yes, this lets us add more than one; that's redundant but not harmful.

I disagree here. If we add locations to a node twice, then we'll have to use dL 
twice to find the underlying constructor. This is another case there the type 
system offers the producer flexibility but hamstrings the consumer.

> On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov <vladis...@serokell.io> wrote:
> 
> I claim an SrcSpan makes sense everywhere, so this is not a useful
> distinction. Think about it as code provenance, an AST node always
> comes from somewhere

I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance 
is better. We could even imagine using the new HasCallStack feature to track 
where generated code comes from (perhaps only in DEBUG compilers). Do we need 
to do this today? I'm not sure there's a crying need. But philosophically, we 
are able to attach a provenance to every slice of AST, so there's really no 
reason for uninformative locations.

> My concrete proposal: let's just put SrcSpan in the extension fields
> of each node

I support this counter-proposal. Perhaps if it required writing loads of extra 
type instances, I wouldn't be as much in favor. But we already have to write 
those instances -- they just change from NoExt to SrcSpan. This seems to solve 
all the problems nicely, at relatively low cost. And, I'm sure it's more 
efficient at runtime than either the previous ping-pong style or the current 
scenario, as we can pattern-match on constructors directly, requiring one less 
pointer-chase or function call.

One downside of this proposal is that it means that more care will have to be 
taken when setting the extension field of AST nodes after a pass, making sure 
to preserve the location. (This isn't really all that different from 
location-shuffling today.) A quick mock-up shows that record-updates can make 
this easier:

> data Phase = Parsed | Renamed
> 
> data Exp p = Node (XNode p) Int
> 
> type family XNode (p :: Phase)
> type instance XNode p = NodeExt p
> 
> data NodeExt p where
>   NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
> 
> type family RenamedOnly p t where
>   RenamedOnly Parsed _ = ()
>   RenamedOnly Renamed t = t
> 
> example :: Exp Parsed
> example = Node (NodeExt { flag = True, fvs = () }) 5
> 
> rename :: Exp Parsed -> Exp Renamed
> rename (Node ext n) = Node (ext { fvs = "xyz" }) n

Note that the extension point is a record type that has a field available only 
after renaming. We can then do a type-changing record update when producing the 
renamed node, preserving the flag in the code above. What's sad is that, if 
there were no renamer-only field, we couldn't do a type-changing empty record 
update as the default case. (Haskell doesn't have empty record updates. Perhaps 
it should. They would be useful in doing a type-change on a datatype with a 
phantom index. A clever compiler could even somehow ensure that such a record 
update is completely compiled away.) In any case, this little example is 
essentially orthogonal to my points above, and the choice of whether to use 
records or other structures are completely local to the extension point. I just 
thought it might make for a nice style.

Thanks,
Richard
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: TTG: Handling Source Locations

Reply via email to