Here's something I've been mulling for probably about eight years
without doing anything about it.
Particularly in web applications - but in other areas too - people
regularly make a complete mess of escaping / unescaping strings. At
different times a string may need to be
* unescaped plain text
* SQL quoted
* entity encoded
* url encoded
* json encoded
* etc
It may even need to be encoded using some combination of those
encodings - in which case the order in which they are applied to the
string matters.
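
To make the ordering point concrete, here is a minimal sketch with two hand-rolled encoders. The helper names html_encode and url_encode are made up for this example and are not part of String::Smart. Real code would more likely use HTML::Entities and URI::Escape from CPAN.

```perl
use strict;
use warnings;

# Illustrative stand-ins for real encoders (assumption: not String::Smart's own)
sub html_encode {
    my ($s) = @_;
    my %ent = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $s =~ s/([&<>"])/$ent{$1}/g;
    return $s;
}

sub url_encode {
    my ($s) = @_;
    # Percent-encode everything outside the RFC 3986 unreserved set
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $s;
}

my $text = 'a<b';

# Same two encodings, applied in opposite orders
my $html_then_url = url_encode( html_encode($text) );
my $url_then_html = html_encode( url_encode($text) );

print "$html_then_url\n";    # a%26lt%3Bb
print "$url_then_html\n";    # a%3Cb
```

The two results differ, so a consumer that decodes in the wrong order gets back something other than the original text.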
The continued popularity of XSS exploits, SQL injection bugs etc
confirms that people still aren't getting it right. So it must be hard.
This evening I started playing with String::Smart (the name is as
provisional as everything else). It lets you do
    my $email = 'Andy Armstrong <[EMAIL PROTECTED]>';
    my $enc = as html => $email;
    print "$enc\n";
    # Prints "Andy Armstrong &lt;[EMAIL PROTECTED]&gt;"
Multiple encodings may be applied:
    # Apply HTML entity encoding and then query encoding
    # Currently multiple encodings are separated by '_'
    my $for_query = as html_query => $email;
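
One way a combined spec such as 'html_query' might be handled is to split it on '_' and apply each named encoder in turn. The sketch below is only a guess at the idea, not String::Smart's implementation: apply_encodings, %ENCODER and both encoders are hypothetical, and 'query' is assumed to mean percent-encoding.

```perl
use strict;
use warnings;

# Stand-in encoders, for illustration only
sub html_encode {
    my ($s) = @_;
    my %ent = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $s =~ s/([&<>"])/$ent{$1}/g;
    return $s;
}

sub query_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $s;
}

# Hypothetical dispatch table mapping encoding names to encoders
my %ENCODER = ( html => \&html_encode, query => \&query_encode );

# Apply each encoding named in the spec, left to right
sub apply_encodings {
    my ( $spec, $s ) = @_;
    for my $name ( split /_/, $spec ) {
        my $encode = $ENCODER{$name} or die "Unknown encoding: $name\n";
        $s = $encode->($s);
    }
    return $s;
}

my $encoded = apply_encodings( html_query => '<x>' );
print "$encoded\n";    # %26lt%3Bx%26gt%3B
```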
You can also assert that a string is already escaped in some way -
although I don't have a clean syntax for that. So you could say
    my $this = is_already html => '<p>A paragraph</p>';
    my $that = '<Just a string>';
    my $html_this = as html => $this;
    my $html_that = as html => $that;
which would give you
    $html_this = '<p>A paragraph</p>';       # No transformation - was already HTML
    $html_that = '&lt;Just a string&gt;';    # Applied entity encoding
In general a String::Smart string knows which transformations have
already been applied to it. When you ask for a particular
representation it computes the path from the current encoding to the
desired encoding and applies the necessary transformations in the
appropriate order.
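
The path computation could be sketched roughly as follows, assuming each string carries an ordered list of the encodings applied so far (innermost first). The function name plan_path and the "decode x" / "encode x" step labels are invented for this example. The idea is to keep the shared prefix of the two lists, undo whatever remains of the current list in reverse, then apply whatever remains of the wanted list.

```perl
use strict;
use warnings;

# Given the encodings currently applied and the encodings wanted
# (both array refs, innermost encoding first), return the list of
# steps needed to get from one to the other.
sub plan_path {
    my ( $current, $wanted ) = @_;

    # Length of the common prefix: these encodings can stay in place
    my $common = 0;
    $common++
      while $common < @$current
      && $common < @$wanted
      && $current->[$common] eq $wanted->[$common];

    # Undo the leftover current encodings, outermost first
    my @undo = map { "decode $_" } reverse @{$current}[ $common .. $#$current ];

    # Then apply the leftover wanted encodings, innermost first
    my @apply = map { "encode $_" } @{$wanted}[ $common .. $#$wanted ];

    return ( @undo, @apply );
}

print join( ', ', plan_path( [],                  ['html'] ) ),          "\n";
print join( ', ', plan_path( ['html'],            [ 'html', 'query' ] ) ), "\n";
print join( ', ', plan_path( [ 'html', 'query' ], ['sql'] ) ),           "\n";
```

A string that is already in the requested form yields an empty plan, which matches the 'is_already html' case above: nothing is applied twice.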
So when you're generating a SQL query or a chunk of HTML you can just
ask for the 'as sql' or 'as html' version of each string you use
without worrying about how it's currently encoded.
Currently strings with a non-empty set of encodings turn into a
blessed hashref that overloads stringification. I can't think of any
other sensible way to associate the 'how is this currently encoded'
metadata with a string. I'm open to suggestions.
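
For readers unfamiliar with the technique, a blessed hashref with overloaded stringification looks roughly like this. It is a minimal sketch, not String::Smart's code: the package name My::TaggedString and the 'value' / 'encodings' keys are assumptions made for the example.

```perl
use strict;
use warnings;

package My::TaggedString;    # hypothetical name, not String::Smart itself

use overload
  '""'     => sub { $_[0]{value} },    # stringify to the encoded text
  fallback => 1;                       # let eq, length, etc. use the string form

sub new {
    my ( $class, $value, @encodings ) = @_;
    return bless { value => $value, encodings => [@encodings] }, $class;
}

# Which encodings have been applied, innermost first
sub encodings { @{ $_[0]{encodings} } }

package main;

my $s = My::TaggedString->new( '&lt;p&gt;', 'html' );

print "$s\n";                              # behaves like an ordinary string
print join( ',', $s->encodings ), "\n";    # but still remembers its encodings
```

One trade-off with this approach: with fallback enabled, concatenating such an object with another string produces a plain Perl string, so the encoding metadata is lost unless the '.' operator is overloaded as well.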
What do people think? Useful?
http://imgs.xkcd.com/comics/exploits_of_a_mom.png
--
Andy Armstrong, Hexten