On 10/05/2006 09:48 AM, Chad Perrin wrote:
On Thu, Oct 05, 2006 at 09:06:11AM -0500, Mumia W. wrote:
On 10/05/2006 08:47 AM, Kevin Old wrote:
Hello everyone,

I have a set of web based admin tools that users in my company use to
update various pieces of a website.  I've never been able to write
enough regexes, "clean routines", etc. to clean out all of the "bad
characters" that users put in.  The big culprit is, of course, good ole
cut and paste.

Like I said, I have several "sanitize" routines that clean control
characters, etc. out of the input fields.  Just wondering if others
have found "the solution" for stuff like this.
Perhaps you could look at the problem in reverse: instead of listing the bad characters, delete every character that is not in an allowed set. For example, you might delete anything that is not an alphanumeric character, space, tab, period, or comma.
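
A minimal sketch of that allow-list idea, assuming plain ASCII input. The helper name strip_disallowed and the exact character set are only illustrations, not something from an existing module; the set should be adjusted to whatever each field is meant to accept.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every character that is NOT in the allowed set.
# Allowed here (an assumption): ASCII letters, digits, space, tab, period, comma.
sub strip_disallowed {
    my ($text) = @_;
    (my $clean = $text) =~ s/[^A-Za-z0-9 \t.,]//g;
    return $clean;
}

# The bell (\x07) and NUL (\x00) control characters are removed.
print strip_disallowed("Hello,\x07 world.\x00"), "\n";   # prints: Hello, world.
```

Note that this also deletes quotes, hyphens, and any non-ASCII letters, so the allowed set usually needs to grow before it is usable on real text.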

That won't work so well for characters that are garbage versions of good
characters that are actually needed.  Generally, quotes are there for a
reason, for instance -- so just throwing away "smart quotes" rather than
replacing them with standard vertical ASCII quotes might not be
desirable.
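
One way to replace rather than discard could look like the sketch below. It assumes the input has already been decoded into Perl character strings (for example from UTF-8), and it covers only the four curly quote characters; the name normalize_quotes and the mapping table are illustrative, not taken from any module.

```perl
use strict;
use warnings;

# Illustrative mapping from Unicode "smart quotes" to plain ASCII quotes.
my %ascii_for = (
    "\x{2018}" => "'",    # left single quotation mark
    "\x{2019}" => "'",    # right single quotation mark (also used as apostrophe)
    "\x{201C}" => '"',    # left double quotation mark
    "\x{201D}" => '"',    # right double quotation mark
);

# Hypothetical helper: replace each smart quote with its ASCII counterpart.
sub normalize_quotes {
    my ($text) = @_;
    $text =~ s/([\x{2018}\x{2019}\x{201C}\x{201D}])/$ascii_for{$1}/g;
    return $text;
}

print normalize_quotes("\x{201C}It\x{2019}s fine,\x{201D} she said."), "\n";
# prints: "It's fine," she said.
```

Running a replacement pass like this before any allow-list filter keeps the quotes that are there for a reason. Input that arrives as raw Windows-1252 bytes would need to be decoded first, or mapped with a different table.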


You're right, and figuring out what is truly garbage and which garbled bytes need to be converted is not trivial. Maybe there's a module on CPAN...



