#5255: String literals cause runtime crashes when OverloadedStrings is in effect
---------------------------------+------------------------------------------
    Reporter:  YitzGale          |        Owner:              
        Type:  bug               |       Status:  new         
    Priority:  normal            |    Milestone:              
   Component:  Compiler          |      Version:  7.0.3       
    Keywords:                    |     Testcase:              
   Blockedby:                    |   Difficulty:              
          Os:  Unknown/Multiple  |     Blocking:              
Architecture:  Unknown/Multiple  |      Failure:  None/Unknown
---------------------------------+------------------------------------------

New description:

 There has been a discussion[1] on the web-devel list about
 the fate of the `IsString` instance for Name in the xml-types
 library[2]. A Name is the name of an XML element or attribute.

 That instance calls error when the string contains a certain
 kind of invalid domain-specific syntax. Some are even advocating
 expanding this behavior to any string that is syntactically
 invalid for XML names.

 So GHC is now the only major compiler in which the characters
 used in a string literal can cause a *runtime* crash.
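
 To make the failure mode concrete, here is a minimal sketch. The
 `Name` type, `validName`, and `probe` below are hypothetical
 stand-ins written for illustration; they are not the actual
 xml-types definitions, and the validity check is far looser than
 the XML specification.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Exception (ErrorCall, evaluate, try)
import Data.Char (isAlpha, isAlphaNum)
import Data.String (IsString (..))

-- Hypothetical stand-in for an XML name type (not the xml-types 'Name').
newtype Name = Name String
  deriving (Eq, Show)

-- Simplified validity check (an assumption, not the XML grammar).
validName :: String -> Bool
validName []       = False
validName (c : cs) = (isAlpha c || c == '_') && all ok cs
  where
    ok x = isAlphaNum x || x `elem` ['_', '-', '.']

-- A partial instance: invalid input calls 'error' instead of failing to compile.
instance IsString Name where
  fromString s
    | validName s = Name s
    | otherwise   = error ("invalid XML name: " ++ show s)

good :: Name
good = "title"

-- This type checks; the problem surfaces only when 'bad' is evaluated.
bad :: Name
bad = "not a name"

-- Force a Name and report whether converting the literal threw an error.
probe :: Name -> IO (Either String Name)
probe n = do
  r <- try (evaluate n)
  pure (either (\e -> Left (show (e :: ErrorCall))) Right r)
```

 Calling `probe good` should yield a `Right` value, while
 `probe bad` should yield a `Left` carrying the error message, even
 though both definitions were accepted by the compiler.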

 `OverloadedStrings` as a more general mechanism is very
 convenient in many settings. One of them is XML names;
 another is attoparsec-text[3] parsers. I must admit that I
 have succumbed to the temptation of this deal with the devil
 and have benefited from such instances myself.

 But when used this way `OverloadedStrings`
 is really just another syntax for quasi-quotation, and that
 is what should have been used explicitly instead of these
 unsafe domain-specific `IsString` instances.

 I propose fixing the problem in one of the following ways:

 A. Make string-literal syntax a specialized form of
 quasi-quotation when `OverloadedStrings` is turned on. That way,
 invalid literals are rejected at compile time, as they should be.

 B. Bless `Text`, and possibly `ByteString`, as the only types that
 get the magical behavior of string literals.

 C. Remove `OverloadedStrings` altogether.

 Option A is by far the nicest. But it requires GHC
 to know the type of the string literal before
 the conversion (`fromString`) is applied. We might also need some
 way to help GHC find the conversion function at the right time,
 beyond just having an `IsString` instance somewhere in scope.
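
 The difficulty can be illustrated with a small sketch. With
 `OverloadedStrings`, a literal has the polymorphic type
 `IsString a => a`, so the target type may not be fixed at the place
 where the literal is written. The `Upper` type and the names below
 are hypothetical, chosen only for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (toUpper)
import Data.String (IsString (..))

-- Hypothetical type whose instance transforms the literal's contents.
newtype Upper = Upper String
  deriving (Eq, Show)

instance IsString Upper where
  fromString = Upper . map toUpper

-- The literal stays polymorphic here; no instance has been chosen yet.
greeting :: IsString a => a
greeting = "hello"

-- The instance is selected only at each use site.
asString :: String
asString = greeting

asUpper :: Upper
asUpper = greeting
```

 Here `greeting` is checked once but converted by two different
 instances, so a compile-time validity check for the literal would
 have to run wherever the type is finally resolved, not where the
 literal appears.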

 By submitting this bug, I am making it clear that I am opposed
 to Option D, leaving things the way they are and wishing
 everyone the best of luck. The `OverloadedStrings` pragma
 is not really optional anymore now that Text is becoming
 the default string type in practice for Haskell. It is not
 acceptable to have to wrap every string awkwardly with
 `(T.pack "")` and give up the chance of it being CAFfed.
 In fact, the blaze-html[4] library relies on
 `OverloadedStrings` for its performance[5].

 I am also opposed, though less so, to providing a
 deprecation route by using a new language pragma for
 Option A or B. The current behavior is dangerous and
 should be summarily removed.

  * [1] http://www.haskell.org/pipermail/web-devel/2011/001630.html
  * [2] http://hackage.haskell.org/package/xml-types
  * [3] http://hackage.haskell.org/package/attoparsec-text
  * [4] http://hackage.haskell.org/package/blaze-html
  * [5] http://www.haskell.org/pipermail/web-devel/2011/001717.html

--

Comment(by simonpj):

 I'm having trouble following the details of this discussion; some examples
 would help.

 As I understand it, what you want is to be able to write the Haskell
 expression
 {{{
  "<foo>mumble</blah>" :: XML
 }}}
 and have a ''compile-time'' error saying that the string is ill-formed.
 Is that right?

 If so, the solution is to hand, in the form of quasi-quotation
 ([http://www.haskell.org/ghc/docs/7.0-latest/html/users_guide/template-haskell.html#th-quasiquotation]),
 Geoff Mainland's enhancement to Template Haskell.  You say
 {{{
   [xml| <foo>mumble</blah> |]
 }}}
 GHC runs the `xml` quasi-quoter (which you write, or provide in your
 library) and it can check the string.  Moreover, the quasi-quoter monad
 hooks into GHC's error reporting machinery, so you can report errors just
 as if they came from GHC.  Moreover, you can put these quasi-quotes in
 patterns, or types, or declarations.
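
 As a rough sketch of what such a quasi-quoter might look like, here
 is one for a simple XML-name type rather than full XML. The `Name`
 type, `checkName`, and the `name` quoter are hypothetical and the
 validity rule is a simplification. Note that GHC requires a
 quasi-quoter to be defined in a different module from the one that
 uses it.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Data.Char (isAlpha, isAlphaNum, isSpace)
import Language.Haskell.TH (Exp, Q)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- Hypothetical XML name type, for illustration only.
newtype Name = Name String
  deriving (Eq, Show)

-- Total validation function: reports failure as a value, never via 'error'.
checkName :: String -> Either String Name
checkName s = case s of
  (c : cs) | (isAlpha c || c == '_') && all ok cs -> Right (Name s)
  _ -> Left ("invalid XML name: " ++ show s)
  where
    ok x = isAlphaNum x || x `elem` "_-."

-- Expression quoter: validates the quoted text while the module compiles.
nameExp :: String -> Q Exp
nameExp s = case checkName s of
  Right (Name valid) -> [| Name valid |]
  Left err           -> fail err   -- reported as a compile-time error

-- The quoter only supports expression contexts in this sketch.
name :: QuasiQuoter
name = QuasiQuoter
  { quoteExp  = nameExp . trim
  , quotePat  = unsupported "pattern"
  , quoteType = unsupported "type"
  , quoteDec  = unsupported "declaration"
  }
  where
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse
    unsupported ctx _ =
      fail ("name quasi-quoter: not usable in a " ++ ctx ++ " context")
```

 In a separate module with `QuasiQuotes` enabled, `[name|title|]`
 would then be accepted, while `[name|not a name|]` should be
 rejected when that module is compiled.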

 This seems so easy and so natural that I'm not sure what else you might
 want. Maybe I am missing the point. I suppose you might not like the
 concrete syntax; improvements welcome.  But somewhere you have to say what
 the parser is.

 Simon

 PS: Is Template Haskell ''really'' a "horrific mess"?

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/5255#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
