On Fri, Sep 16, 2011 at 12:06:37AM +1000, Webmaster wrote:
> I have 31,102 HTML documents. Each one contains one verse from the KJV Bible.
>
> Currently, these files are named consecutively like this:
>
> KJV.00001.html
> KJV.00002.html
> KJV.00003.html
> KJV.00004.html
> KJV.00005.html
>
> The title tag in each HTML document contains the actual Bible reference,
> followed by a short description. For example, "KJV.00001.html" has
> "Genesis 1:1 - KJV ( King James Version ) Bible Verse" in the title tag.
>
> I need a way to take just the reference part of the title tag, and make
> it the actual file name, so that "KJV.00001.html" becomes
> "Genesis_1-1.html". In other words, I want to drop the " - KJV ( King
> James Version ) Bible Verse" part of the title tag when I create the
> actual file names.
>
> I want the book name to be followed by an underscore, instead of a space,
> and the colon in each verse reference will have to be replaced with a
> hyphen, since we cannot use the colon in file names on the Macintosh.

Here's a script to do this in Perl. You can save it to a file, perhaps
named rename_bible_files.pl, and then run it on a directory, perhaps named
my_bible_files, like this:

    perl rename_bible_files.pl my_bible_files

To preview without actually renaming any files:

    perl rename_bible_files.pl -p my_bible_files

To verbosely list the renaming as it's done:

    perl rename_bible_files.pl -v my_bible_files

As written, it only processes files in the top level of the directory. It
could be modified to descend into sub-directories if needed.

If it encounters an error opening the directory, opening a file, or
renaming a file, it will abort. If it can't find a title in the file, or
finds the title but can't parse it, it will output a warning and continue
processing the remaining files.
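
On descending into sub-directories: one way to do it, as a rough sketch
only, is to collect the file paths with the core File::Find module and
then loop over that list instead of using readdir. The helper name
html_files_under is made up for illustration and is not part of the
script.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every *.html file under $top,
# including files in sub-directories.
sub html_files_under {
    my ($top) = @_;
    my @found;
    find(sub {
        # Inside the callback, $_ is the bare file name and
        # $File::Find::name is the path including the directory.
        push @found, $File::Find::name if -f $_ && /\.html$/;
    }, $top);
    return sort @found;
}

# List the files when a directory is given on the command line.
if (@ARGV) {
    print "$_\n" for html_files_under($ARGV[0]);
}
```

Each path returned this way already includes its directory, so the new
name would have to be built from the directory part of that path rather
than from a single $dir.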

#!perl
use strict;
use warnings;
use Getopt::Long;

# Slurp mode: read each file in one go instead of line by line.
local $/;

GetOptions("verbose" => \ my $verbose,
           "preview" => \ my $preview,
          )
    or die "Invalid options\n";

my $dir = shift
    or die "Must specify directory.\n";

opendir my $dh, $dir
    or die "Can't open directory $dir: $!\n";

while (defined(my $file = readdir $dh)) {
    next unless -f "$dir/$file" && $file =~ /\.html$/;

    open my $fh, '<', "$dir/$file"
        or die "Can't open $dir/$file: $!\n";
    my $contents = <$fh>;
    close $fh;

    # /i also accepts <TITLE>; /s lets the title span more than one line.
    my ($title) = $contents =~ m,<title[^>]*>\s*(.*?)\s*</title>,is
        or do {
            warn "Could not find title in $dir/$file\n";
            next;
        };

    # Book names may contain digits and spaces, e.g. "1 Samuel".
    my ($book, $chapter, $verse) = $title =~ /^([\w ]*?)\s+(\d+):(\d+)/
        or do {
            warn "Could not parse title '$title' in $dir/$file\n";
            next;
        };

    $book =~ tr/ /_/;
    my $new = "${book}_$chapter-$verse.html";

    # Already has the right name; nothing to do.
    if ($new eq $file) {
        next;
    }

    if ($verbose || $preview) {
        print "$file => $new\n";
    }

    if (!$preview) {
        # rename() silently replaces an existing file, so check first.
        die "Can't rename $dir/$file to $dir/$new: target already exists\n"
            if -e "$dir/$new";
        rename("$dir/$file", "$dir/$new")
            or die "Can't rename $dir/$file to $dir/$new: $!\n";
    }
}

closedir $dh;
__END__
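
If you'd like to check the title parsing on its own before touching any
files, here is a small sketch that applies the same regex and
substitutions to a few sample titles. The function name title_to_filename
is made up for illustration, and the "1 Samuel" and "Song of Solomon"
titles are assumed to follow the same pattern as the Genesis example you
quoted; I haven't seen the actual files.

```perl
use strict;
use warnings;

# Hypothetical helper: same regex and substitutions as the script's loop.
# Returns the new file name, or undef if the title can't be parsed.
sub title_to_filename {
    my ($title) = @_;
    my ($book, $chapter, $verse) = $title =~ /^([\w ]*?)\s+(\d+):(\d+)/
        or return;
    $book =~ tr/ /_/;    # "Song of Solomon" becomes "Song_of_Solomon"
    return "${book}_$chapter-$verse.html";
}

# The second and third titles are assumed, not taken from the real files.
for my $title ("Genesis 1:1 - KJV ( King James Version ) Bible Verse",
               "1 Samuel 17:4 - KJV ( King James Version ) Bible Verse",
               "Song of Solomon 2:1 - KJV ( King James Version ) Bible Verse") {
    my $name = title_to_filename($title);
    print defined $name ? "$name\n" : "unparsed: $title\n";
}
```

The point of trying a numbered book and a multi-word book is that the
book part of the pattern has to accept digits and spaces, not just a
single word.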
Ronald