On Wed, Jan 7, 2009 at 07:41, Anže Vidmar <anz...@gmail.com> wrote:
> hello!
>
> I have some nasty, non-ascii character in some files that contain php code
> (actually somewhere in my SVN branch). What I want to do here is to
> recursively find all the files that contain a specific non-ascii character
> in the file. And most importantly - I need to know the name of the files
> containing it.
>
> So far, I found a script that looks into a file for non-ascii characters and
> prints these characters in hex:
>
> while (<>) {
>     s/([\x80-\xff])/sprintf "\\x{%02x}",ord($1)/eg;
>     print;
> }
>
> Ok, this is good, the non-ascii character (in hex) that I'm looking for is:
>
> x{ef}\\x{bb}\\x{bf}
>
> The problem here is that I can't get this script to run recursively and I
> don't get the name of the file that actually contains these characters.
>
> I've tried with bash, but since it's standard output, I can't get any
> result on this. Here is what I've tried:
>
> find |xargs /usr/local/bin/check_for_non-ascii_characters.sh |grep -l 'x{ef}\\x{bb}\\x{bf}'
>
> So, I need a way to recursively find non-ascii characters (a specific
> pattern, mentioned before) in all files and I need the name of the files
> containing it.
>
> It would be enough if I were only able to see which file contains this
> character set.
>
> Thanks

#!/usr/bin/perl

use strict;
use warnings;

use File::Find;

File::Find::find(
    sub {
        return unless -f;
        #refine further with a return unless /\.php$/ if desired
        open my $fh, "<", $_
            or die "could not open $File::Find::name: $!";
        # read into a lexical so File::Find's $_ (the file name) is left alone
        while (my $line = <$fh>) {
            my $offset = 0;
            for my $char (split //, $line) {
                if (ord($char) > 127) {
                    printf "non-ascii char (%04x) in file %s on line %d position %d:\n%s\n",
                        ord($char), $File::Find::name, $., $offset, $line;
                }
                $offset++;
            }
        }
        close $fh;
    },
    @ARGV
);

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
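
If all you need is the names of the files that contain that one sequence, the script above can be narrowed down. The bytes ef bb bf are the UTF-8 encoding of the byte order mark (U+FEFF), which editors sometimes write at the start of a file. What follows is only a rough sketch, not a tested tool: the helper names file_contains and files_with_bytes are made up for this example, and it assumes each file is small enough to read into memory in one go.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Find;

# Byte sequence to look for; EF BB BF is the UTF-8 byte order mark.
my $needle = "\xEF\xBB\xBF";

# Hypothetical helper: returns 1 if the file at $path contains $bytes.
sub file_contains {
    my ($path, $bytes) = @_;
    open my $fh, "<:raw", $path or return 0;    # silently skip unreadable files
    local $/;                                   # slurp mode: read the whole file
    my $data = <$fh>;
    close $fh;
    return (defined($data) && index($data, $bytes) >= 0) ? 1 : 0;
}

# Hypothetical helper: names of all plain files under @dirs containing $bytes.
sub files_with_bytes {
    my ($bytes, @dirs) = @_;
    my @hits;
    find(
        {
            no_chdir => 1,    # with no_chdir, $_ is the full path
            wanted   => sub {
                return unless -f $_;
                push @hits, $_ if file_contains($_, $bytes);
            },
        },
        @dirs
    );
    return @hits;
}

# Print one matching file name per line for the directories given in @ARGV.
print "$_\n" for files_with_bytes($needle, @ARGV);
```

Run it with the top of the branch as the argument, the same way as the script above. It prints one file name per line and nothing else, so the output can be fed to other tools. To restrict it to PHP files, add a return unless /\.php$/ next to the -f test.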