Hi everyone,

tl;dr: Too often HTML is created with ad-hoc string manipulation. This is about an attempt to promote better practices, particularly in the Ruby world.

I've been speaking at a couple of Ruby conferences in the last few months about using a more structured approach to HTML generation. I wanted to bring this to your attention because the idea has been greatly influenced by Langsec. Basically it takes some of the principles for secure parsers and applies them to language generators. The main selling point is increased security, especially preventing XSS.

The current practice is to use a combination of template languages and "helpers", functions that return strings representing HTML fragments. Because of this the semantics of a string are unclear: it can be mere textual data, or a serialized fragment of an HTML document. The programmer needs to constantly indicate this difference by manually adding calls to escape the HTML, and this is error-prone. What's worse, it's possible to generate all kinds of badly structured, invalid documents, and you might not find out until your app is running. Or never at all, because browsers are so forgiving about these things. All of this adds up to make your app more vulnerable to HTML injection attacks.

The suggested alternative is to first create a syntax tree of the document you want to transmit, and only generate serialized HTML right before it gets sent over the wire. What for parsers is called "Full Recognition Before Processing" we can rephrase as "Full Generation After Processing". That is, decouple your program logic from the language generator. Because a dedicated component now handles the language generation, its output can be made strictly context-free HTML at all times, thus sticking to the good part of Postel's principle, "Be conservative in what you send", and making it easier to reason about how browsers will handle this input.

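
To make the contrast concrete, here is a minimal sketch in plain Ruby. It is not Hexp's API: Element, el, greeting_string and greeting_tree are names made up for this illustration, and the serializer is deliberately naive (it does not validate tag or attribute names and ignores void elements such as <br>). The only point is that text nodes and attribute values are escaped in one place, at serialization time.

```ruby
require 'cgi'

# Hypothetical node type for this sketch only (not Hexp's API).
# A child is either another Element or plain text.
Element = Struct.new(:tag, :attributes, :children) do
  # Serialization is the single place where escaping happens.
  def to_html
    attrs = attributes.map do |name, value|
      %( #{name}="#{CGI.escapeHTML(value.to_s)}")
    end.join
    body = children.map do |child|
      child.is_a?(Element) ? child.to_html : CGI.escapeHTML(child.to_s)
    end.join
    "<#{tag}#{attrs}>#{body}</#{tag}>"
  end
end

# Hypothetical convenience constructor.
def el(tag, attributes = {}, *children)
  Element.new(tag, attributes, children)
end

# String-based helper: the caller has to remember to escape `name`.
def greeting_string(name)
  "<p class=\"greeting\">Hello, #{name}</p>"
end

# Tree-based helper: `name` is just a text node, no escaping decision needed.
def greeting_tree(name)
  el(:p, { class: 'greeting' }, 'Hello, ', name)
end

user_input = '<script>alert(1)</script>'

unsafe_html = greeting_string(user_input)        # markup is injected verbatim
safe_html   = greeting_tree(user_input).to_html  # markup is escaped as text

puts unsafe_html
puts safe_html
```

With the string helper, the script tag ends up in the output as markup. With the tree helper, the same input can only ever become text, because the program logic never touches serialized HTML.
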
I'm focusing on Ruby because it's the language I use and love, and because I want the Ruby web community to do better. But the syntax tree approach is far removed from established practices. And while I'm convinced that using a data structure/syntax tree approach is inherently more expressive and more productive for a developer, as well as more secure, the fact of the matter is that you lose the huge ecosystem of string-based tools and libraries for Ruby that are out there.

So I'm trying to do some foundational work to bridge this gap. I started a project called Hexp [1], which is an API for easily and efficiently creating and manipulating HTML syntax trees. It's already quite usable, and I'm using it on several smaller projects. Hopefully this can form the groundwork for a new collection of tools.

I'm still waiting for the videos of my talks to come online. The slides of my latest talk at Eurucamp, Berlin are available at [2]. The slides of my previous talk at Rulu, Lyon are less concise but cover more theory [3]. (Hit space to cycle through the slides.)

I know Ruby isn't the only language where this pattern exists. Hopefully we can get some of these ideas across to more application developers out there.

Thanks for reading this far :),
Arne

[1] http://github.com/plexus
[2] http://arnebrasseur.net/talks/eurucamp2013/presentation.html
[3] http://arnebrasseur.net/talks/rulu2013/index.html

----
Arne Brasseur
Twitter/Github : @plexus
