Here's something I've been mulling for probably about eight years without doing anything about it.

Particularly in web applications - but in other areas too - people regularly make a complete mess of escaping / unescaping strings. At different times a string may need to be:

* unescaped plain text
* SQL quoted
* entity encoded
* url encoded
* json encoded
* etc

It may even need to be encoded using some combination of those encodings - in which case the order in which they are applied to the string matters.
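To make the ordering point concrete, here is a minimal sketch. The two encoders are hypothetical hand-rolled stand-ins (real code would more likely reach for HTML::Entities and URI::Escape); they exist only to show that the same pair of encodings gives different results depending on which is applied first.

```perl
use strict;
use warnings;

# Hypothetical minimal encoders, written by hand so the example has no
# CPAN dependencies. They are illustrative, not complete.
sub html_encode {
    my ($s) = @_;
    my %ent = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $s =~ s/([&<>"])/$ent{$1}/g;
    return $s;
}

sub query_encode {
    my ($s) = @_;
    # Percent-encode everything except the URL "unreserved" characters.
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf '%%%02X', ord $1/ge;
    return $s;
}

my $raw = 'a<b';

my $html_then_query = query_encode( html_encode($raw) );
my $query_then_html = html_encode( query_encode($raw) );

print "$html_then_query\n";    # a%26lt%3Bb
print "$query_then_html\n";    # a%3Cb
```

Decoding has to undo the encodings in the reverse of the order in which they were applied, which is why the order needs to be remembered somewhere.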

The continued popularity of XSS exploits, SQL injection bugs etc confirms that people still aren't getting it right. So it must be hard.

This evening I started playing with String::Smart (the name is as provisional as everything else). It lets you do

my $email = 'Andy Armstrong <[EMAIL PROTECTED]>';

my $enc = as html => $email;
print "$enc\n";
# Prints "Andy Armstrong &lt;[EMAIL PROTECTED]&gt;"

Multiple encodings may be applied:

# Apply HTML entity encoding and then query encoding
# Currently multiple encodings are separated by '_'
my $for_query = as html_query => $email;

You can also assert that a string is already escaped in some way - although I don't have a clean syntax for that. So you could say

my $this = is_already html => '<p>A paragraph</p>';
my $that = '<Just a string>';

my $html_this = as html => $this;
my $html_that = as html => $that;

which would give you

$html_this = '<p>A paragraph</p>'; # No transformation - was already HTML
$html_that = '&lt;Just a string&gt;'; # Applied entity encoding

In general a String::Smart string knows which transformations are currently applied to it. When you ask for a particular representation of that string, it computes the path from the current encoding to the desired encoding and applies the transformations in the appropriate order.
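One way the path computation could work, sketched under assumptions: treat the current and desired encodings as ordered lists, keep the prefix they share, undo whatever remains of the current list (last applied first), then apply whatever remains of the desired list. The %codec table and the convert function are hypothetical names for illustration and are not taken from String::Smart itself.

```perl
use strict;
use warnings;

my %ent   = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
my %unent = ( amp => '&', lt => '<', gt => '>', quot => '"' );

# Hypothetical codec table: each encoding has an encoder and its inverse.
my %codec = (
    html => {
        enc => sub { my $s = shift; $s =~ s/([&<>"])/$ent{$1}/g; $s },
        dec => sub { my $s = shift; $s =~ s/&(amp|lt|gt|quot);/$unent{$1}/g; $s },
    },
    query => {
        enc => sub {
            my $s = shift;
            $s =~ s/([^A-Za-z0-9\-_.~])/sprintf '%%%02X', ord $1/ge;
            $s;
        },
        dec => sub { my $s = shift; $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge; $s },
    },
);

# Convert $str from the encodings in @$from (listed in the order they
# were applied) to the encodings in @$to.
sub convert {
    my ( $str, $from, $to ) = @_;
    my @from = @$from;
    my @to   = @$to;

    # Count the leading encodings both lists share; those stay in place.
    my $common = 0;
    $common++
        while $common < @from
        && $common < @to
        && $from[$common] eq $to[$common];

    # Undo the rest of the current encodings, last applied first.
    for my $name ( reverse @from[ $common .. $#from ] ) {
        $str = $codec{$name}{dec}->($str);
    }

    # Apply the rest of the desired encodings in order.
    for my $name ( @to[ $common .. $#to ] ) {
        $str = $codec{$name}{enc}->($str);
    }
    return $str;
}

print convert( 'a<b',    [],       ['html'] ),          "\n";  # a&lt;b
print convert( 'a&lt;b', ['html'], ['html'] ),          "\n";  # a&lt;b
print convert( 'a&lt;b', ['html'], ['query'] ),         "\n";  # a%3Cb
print convert( 'a&lt;b', ['html'], [ 'html', 'query' ] ), "\n";  # a%26lt%3Bb
```

This only works for encodings that have an exact inverse; a lossy transformation would need the original string to be kept alongside the encoded one.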

So when you're generating a SQL query or a chunk of HTML you can just ask for the 'as sql' or 'as html' version of each string you use without worrying about how it's currently encoded.

Currently strings with a non-empty set of encodings turn into a blessed hashref that overloads stringification. I can't think of any other sensible way to associate the 'how is this currently encoded' metadata with a string. I'm open to suggestions.
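For anyone who hasn't used the technique, a blessed hashref that overloads stringification looks roughly like this. My::Tagged and its fields are hypothetical names chosen for the sketch, not the actual String::Smart internals.

```perl
package My::Tagged;    # hypothetical class name, for illustration only
use strict;
use warnings;

# Stringify to the stored value; fallback => 1 lets eq, length, etc.
# work through that string conversion.
use overload
    '""'     => sub { $_[0]{value} },
    fallback => 1;

sub new {
    my ( $class, $value, @encodings ) = @_;
    return bless { value => $value, encodings => [@encodings] }, $class;
}

# The "how is this currently encoded" metadata.
sub encodings { @{ $_[0]{encodings} } }

package main;

my $s = My::Tagged->new( '&lt;b&gt;', 'html' );

print "$s\n";                              # &lt;b&gt;
print join( ',', $s->encodings ), "\n";    # html
print length($s), "\n";                    # 9

# Concatenation goes through stringification, so the result is a plain
# string and the metadata is gone.
my $joined = $s . '!';
print ref($joined) ? "object\n" : "plain string\n";    # plain string
```

The last few lines show the main weakness of the approach: any ordinary string operation hands back a plain scalar, so the encoding information survives only as long as the object is passed around untouched.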

What do people think? Useful?

http://imgs.xkcd.com/comics/exploits_of_a_mom.png

--
Andy Armstrong, Hexten
