G'day David,

David Golden wrote:

> Except "use re q/taint/" is lexical.  So if some module isn't itself
> reading data from a potentially tainted source, then it really doesn't
> need to bother with this.  That's not the same as strict and warnings,
> which always apply to my code.

Ah! But this is not about reading data! Let's pretend we have the following (naive) code in a module that I use:

        sub extract_sentence {
                my ($str) = @_;

                # Sentences end with a dot, followed by a space;
                # capture up to and including the first such dot.
                my ($sentence) = $str =~ /(.*?\.)\s/;

                return $sentence;
        }

However, if I pass in a tainted string like:

        "It was dark; very dark.  The moon was hidden by clouds"

The subroutine returns an *UNTAINTED*:

        "It was dark; very dark."

Having just extracted the first sentence doesn't mean that it's been checked for safety in any way whatsoever. And here's the problem: as the developer USING this module, I can't easily stop extract_sentence from untainting my data.

Note that this has nothing to do with *reading* data, and everything to do with regular expressions. The example code above was not written with untainting data in mind, and yet it untaints data by default.
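To make that concrete, here's a small sketch (run under perl -T, and assuming a STORY environment variable has been set to our example string; Scalar::Util's tainted() is in core) showing the capture silently losing its taint flag:

        #!/usr/bin/perl -T
        use strict;
        use warnings;
        use Scalar::Util qw(tainted);

        # Under -T, data from the environment arrives tainted.
        my $str = $ENV{STORY};    # e.g. "It was dark; very dark.  ..."

        print "input:   ", tainted($str)      ? "tainted" : "clean", "\n";

        my ($sentence) = $str =~ /(.*?\.)\s/;

        print "capture: ", tainted($sentence) ? "tainted" : "clean", "\n";
        # The input reports tainted, but the capture reports clean --
        # the regexp untainted it as a side-effect.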

> I don't see why some subroutine N levels down the call stack in some
> utility module should be expected to preserve taint on data you didn't
> check when you received it.

For the same reasons I don't want some subroutine N levels down to overwrite $_, screw around with my $/, or make $@ mysteriously disappear or change.

If I had wanted my data untainted, I would have done it explicitly. In fact, I want to *keep* most of my data tainted, so I don't accidentally do something foolish with it.

> I think I disagree with this. (Though perhaps could be argued out of
> it.)  It seems to me that data should be validated at the time it is
> collected and untainted once validated.

I agree with your first sentiment, but not your second. Data should be validated when you first collect it, but untainting does not necessarily follow from that. Just because I've read a valid hunk of HTML from a webpage doesn't mean that I'd ever intend to use that HTML as a filename, put it anywhere near a shell, or use it in any other taint-aware method I may use[1].

This is a hard problem, which can be summarised as:

        * Perl provides a very good mechanism for tracking "untrusted"
          data.

        * Not many people use that mechanism, or even think about it.

        * One of Perl's most commonly used language features (regexps)
          marks data as trusted by default.

        * Therefore, most code that uses regexps will untaint by accident.
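For code that *does* care, the lexical fix is the pragma this thread started with. A sketch of the extract_sentence example written with it:

        sub extract_sentence_carefully {
                my ($str) = @_;

                # Lexically scoped: captures taken from a tainted
                # string now stay tainted, rather than being blessed
                # as safe by the match.
                use re 'taint';

                my ($sentence) = $str =~ /(.*?\.)\s/;

                return $sentence;    # still tainted if $str was
        }

The catch, of course, is that this only helps inside the scopes whose authors remembered to ask for it.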

Unfortunately, adding a metric to CPANTS isn't going to solve it, although it would certainly increase awareness of the issue. Changing how Perl untaints data isn't possible, since it would break old code, and it wouldn't help older Perls anyway.

Having a module that changes the default behaviour and can be loaded into an application that cares probably will work[2], and is much more dependable than relying upon every CPAN author to have been doing the right thing for something they may never use.

David, chromatic, thank you both for letting me bounce ideas off you; I really do appreciate it a great deal.

Many thanks!

        Paul

[1] DBI, for example, provides a lovely interface for checking data is untainted before being used in a statement.
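(As I understand it, the DBI behaviour in question is its Taint/TaintIn attribute; a sketch, with an assumed SQLite connection:

        use DBI;

        # With TaintIn set, DBI croaks if any statement or bind value
        # handed to it is still tainted -- exactly the check we want.
        my $dbh = DBI->connect("dbi:SQLite:dbname=app.db", "", "",
                { RaiseError => 1, TaintIn => 1 });

        # Dies under -T if $html is tainted, rather than quietly
        # storing untrusted data:
        $dbh->do("INSERT INTO pages (body) VALUES (?)", undef, $html);

)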

[2] For some definition of "work".

--
Paul Fenwick <[EMAIL PROTECTED]> | http://perltraining.com.au/
Director of Training                   | Ph:  +61 3 9354 6001
Perl Training Australia                | Fax: +61 3 9354 2681
