G'day David,

David Golden wrote:

> Except "use re q/taint/" is lexical.  So if some module isn't itself
> reading data from a potentially tainted source, then it really doesn't
> need to bother with this.  That's not the same as strict and warnings,
> which always apply to my code.

Ah! But this is not about reading data! Let's pretend we have the following (naive) code in a module that I use:

        sub extract_sentence {
                my ($str) = @_;

                # Sentences end with a dot, followed by a space;
                # capture up to and including the first such dot.
                my ($sentence) = $str =~ /(.*?\.)\s/;

                return $sentence;
        }

However, if I pass in a tainted string like:

        "It was dark; very dark.  The moon was hidden by clouds"

The subroutine returns an *UNTAINTED*:

        "It was dark; very dark."

Having just extracted the first sentence doesn't mean that it's been checked for safety in any way whatsoever. And here's the problem: as the developer USING this module, I can't easily stop extract_sentence from untainting my data.

Note that this has nothing to do with *reading* data, and everything to do with regular expressions. The example code above was not written with untainting data in mind, and yet it untaints data by default.
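To make that concrete, here's a small sketch (run under perl -T, and assuming a STORY environment variable has been set to our example string; Scalar::Util's tainted() is in core) showing the capture silently losing its taint flag:

        #!/usr/bin/perl -T
        use strict;
        use warnings;
        use Scalar::Util qw(tainted);

        # Under -T, data from the environment arrives tainted.
        my $str = $ENV{STORY};    # e.g. "It was dark; very dark.  ..."

        print "input:   ", tainted($str)      ? "tainted" : "clean", "\n";

        my ($sentence) = $str =~ /(.*?\.)\s/;

        print "capture: ", tainted($sentence) ? "tainted" : "clean", "\n";
        # The input reports tainted, but the capture reports clean --
        # the regexp untainted it as a side-effect.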

> I don't see why some subroutine N levels down the call stack in some
> utility module should be expected to preserve taint on data you didn't
> check when you received it.

For the same reasons I don't want some subroutine N levels down to overwrite $_, screw around with my $/, or make $@ mysteriously disappear or change.

If I had wanted my data untainted, I would have done it explicitly. In fact, I want to *keep* most of my data tainted, so I don't accidentally do something foolish with it.

> I think I disagree with this. (Though perhaps could be argued out of
> it.)  It seems to me that data should be validated at the time it is
> collected and untainted once validated.

I agree with your first sentiment, but not your second. Data should be validated when you first collect it, but untainting does not necessarily follow from that. Just because I've read a valid hunk of HTML from a webpage doesn't mean that I'd ever intend to use that HTML as a filename, put it anywhere near a shell, or use it in any other taint-aware method I may use[1].

This is a hard problem, which can be summarised as:

        * Perl provides a very good mechanism for tracking "untrusted"
          data.

        * Not many people use that mechanism, or even think about it.

        * One of Perl's most commonly used language features (regexps)
          marks data as trusted by default.

        * Therefore, most code that uses regexps will untaint by accident.
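For code that *does* care, the lexical fix is the pragma this thread started with. A sketch of the extract_sentence example written with it:

        sub extract_sentence_carefully {
                my ($str) = @_;

                # Lexically scoped: captures taken from a tainted
                # string now stay tainted, rather than being blessed
                # as safe by the match.
                use re 'taint';

                my ($sentence) = $str =~ /(.*?\.)\s/;

                return $sentence;    # still tainted if $str was
        }

The catch, of course, is that this only helps inside the scopes whose authors remembered to ask for it.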

Unfortunately, adding a metric to CPANTS isn't going to solve it, although it would certainly increase awareness of the issue. Changing how Perl untaints data isn't possible, since it would break old code, and it wouldn't help older Perls anyway.

Having a module that changes the default behaviour and can be loaded into an application that cares probably will work[2], and is much more dependable than relying upon every CPAN author to have been doing the right thing for something they may never use.

David, chromatic, thank you both for letting me bounce ideas off you; I really do appreciate it a great deal.

Many thanks!

        Paul

[1] DBI, for example, provides a lovely interface for checking data is untainted before being used in a statement.
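(As I understand it, the DBI behaviour in question is its Taint/TaintIn attribute; a sketch, with an assumed SQLite connection:

        use DBI;

        # With TaintIn set, DBI croaks if any statement or bind value
        # handed to it is still tainted -- exactly the check we want.
        my $dbh = DBI->connect("dbi:SQLite:dbname=app.db", "", "",
                { RaiseError => 1, TaintIn => 1 });

        # Dies under -T if $html is tainted, rather than quietly
        # storing untrusted data:
        $dbh->do("INSERT INTO pages (body) VALUES (?)", undef, $html);

)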

[2] For some definition of "work".

--
Paul Fenwick <[EMAIL PROTECTED]> | http://perltraining.com.au/
Director of Training                   | Ph:  +61 3 9354 6001
Perl Training Australia                | Fax: +61 3 9354 2681
