On Fri, Sep 16, 2011 at 12:06:37AM +1000, Webmaster wrote:
> I have 31,102 HTML documents. Each one contains one verse from the KJV Bible.
> 
> Currently, these files are named consecutively like this:
> 
> KJV.00001.html
> KJV.00002.html
> KJV.00003.html
> KJV.00004.html
> KJV.00005.html
> 
> The title tag in each HTML document contains the actual Bible reference,
> followed by a short description. For example, "KJV.00001.html" has
> "Genesis 1:1 - KJV ( King James Version ) Bible Verse" in the title tag.
> 
> I need a way to take just the reference part of the title tag, and make
> it the actual file name, so that "KJV.00001.html" becomes
> "Genesis_1-1.html". In other words, I want to drop the " - KJV ( King
> James Version ) Bible Verse" part of the title tag when I create the
> actual file names.
> 
> I want the book name to be followed by an underscore, instead of a space,
> and the colon in each verse reference will have to be replaced with a
> hyphen, since we cannot use the colon in file names on the Macintosh.

Here's a script to do this in Perl.  You can save it to a file, perhaps
named rename_bible_files.pl, and then run it on a directory, perhaps named
my_bible_files, like this:

  perl rename_bible_files.pl my_bible_files

To preview without actually renaming any files:

  perl rename_bible_files.pl -p my_bible_files

To verbosely list the renaming as it's done:

  perl rename_bible_files.pl -v my_bible_files


As written, it only processes files in the top level of the directory.  It
could be modified to descend into sub-directories if needed.

If it encounters an error opening the directory, opening a file, or
renaming a file, it will abort.  If it can't find a title in the file, or
finds the title but can't parse it, it will output a warning and continue
processing the remaining files.
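For the sub-directory variant mentioned above, here is a minimal sketch (not part of the script below) that uses the core File::Find module to collect .html files at any depth. The helper name html_files_under is hypothetical. Because each file may live in its own sub-directory, the rename would then need that file's own directory rather than the top-level $dir, which File::Basename's fileparse provides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Find;                      # core module: recursive directory walk
use File::Basename qw(fileparse);    # split a path into name and directory

# Hypothetical helper: return the full paths of all .html files found
# anywhere below $dir, sorted so the output order is predictable.
sub html_files_under {
  my ($dir) = @_;
  my @found;
  find(
    {
      no_chdir => 1,    # don't chdir, so $File::Find::name is usable as-is
      wanted   => sub {
        push @found, $File::Find::name
          if -f $File::Find::name && $File::Find::name =~ /\.html$/;
      },
    },
    $dir,
  );
  my @sorted = sort @found;
  return @sorted;
}

# Example: perl list_html.pl my_bible_files
# In the renaming script, the rename would become
#   rename("$subdir$name", "$subdir$new")
# with $new built from the title exactly as before.
if (@ARGV) {
  for my $path (html_files_under($ARGV[0])) {
    my ($name, $subdir) = fileparse($path);
    print "$subdir: $name\n";
  }
}
```

Collecting the complete list before renaming anything also keeps find() from walking into names that were just changed.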


#!perl

use strict;
use warnings;

use Getopt::Long;

# Slurp mode: with $/ undefined, <$fh> below returns a whole file at once.
local $/;

# -v / --verbose lists each rename; -p / --preview lists them without renaming.
GetOptions("verbose" => \my $verbose,
           "preview" => \my $preview,
          )
  or die "Usage: $0 [-v] [-p] directory\n";

my $dir = shift
  or die "Must specify directory.\n";

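opendir my $dh, $dir
  or die "Can't open directory $dir: $!\n";

# Read the whole listing before renaming anything, so the renames below
# can't disturb the readdir iteration or get picked up a second time.
my @files = readdir $dh;
closedir $dh;

for my $file (@files) {
  next unless -f "$dir/$file" && $file =~ /\.html$/;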

  open my $fh, '<', "$dir/$file"
    or die "Can't open $dir/$file: $!\n";

  my $contents = <$fh>;

  close $fh;

  # /i accepts <TITLE> as well as <title>; /s lets the title span lines.
  my ($title) = $contents =~ m,<title[^>]*>\s*(.*?)\s*</title>,is
    or do {
      warn "Could not find title in $dir/$file\n";
      next;
    };

  my ($book, $chapter, $verse) = $title =~ /^([\w ]*?)\s+(\d+):(\d+)/
    or do {
      warn "Could not parse title '$title' in $dir/$file\n";
      next;
    };

  $book =~ tr/ /_/;

  my $new = "$book\_$chapter-$verse.html";

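  # Nothing to do if the file already has the right name.
  if ($new eq $file) {
    next;
  }

  # Don't silently overwrite another file that has the same target name.
  if (-e "$dir/$new") {
    warn "Not renaming $dir/$file: $dir/$new already exists\n";
    next;
  }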

  if ($verbose || $preview) {
    print "$file => $new\n";
  }

  if (!$preview) {
    rename("$dir/$file", "$dir/$new")
      or die "Can't rename $dir/$file to $dir/$new: $!\n";
  }
}

__END__
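To check the title-to-filename mapping without touching any files, the same regular expression and substitutions can be pulled into a small function and tried on a few sample titles. This is a sketch: title_to_filename is a hypothetical name, and the "1 John" and "Song of Solomon" titles are assumed examples of multi-word book names following the same title format as the Genesis 1:1 example, not taken from the actual files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a title such as
#   "Genesis 1:1 - KJV ( King James Version ) Bible Verse"
# into "Genesis_1-1.html", using the same pattern as the script.
# Returns undef (or an empty list) if the title doesn't start with
# "<book> <chapter>:<verse>".
sub title_to_filename {
  my ($title) = @_;
  my ($book, $chapter, $verse) = $title =~ /^([\w ]*?)\s+(\d+):(\d+)/
    or return;
  $book =~ tr/ /_/;                      # spaces in the book name -> underscores
  return "$book\_$chapter-$verse.html";  # \_ stops Perl reading "$book_" as a variable
}

for my $title (
  "Genesis 1:1 - KJV ( King James Version ) Bible Verse",          # Genesis_1-1.html
  "1 John 3:16 - KJV ( King James Version ) Bible Verse",          # 1_John_3-16.html
  "Song of Solomon 2:1 - KJV ( King James Version ) Bible Verse",  # Song_of_Solomon_2-1.html
) {
  print "$title => ", title_to_filename($title), "\n";
}
```

A title that doesn't match the pattern gives undef, which is the case the script reports with its "Could not parse title" warning.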


Ronald

-- 
You received this message because you are subscribed to the 
"BBEdit Talk" discussion group on Google Groups.
For more options, visit this group at
<http://groups.google.com/group/bbedit?hl=en>
