I've spent a few days putting together a little application that generates a database of RFCs. It uses XML::Parser to generate the database, and Class::DBI and DBD::SQLite to store and search the data.
I've been calling it RFC::Index. It possibly falls more under the category of "application" than of "library", though the use I had in mind for it was as a supplemental tool for a few other projects I have in the pipeline. I've attached the POD for the two main modules.

My questions: Is this something that should or could go onto CPAN? And, if so, is the name appropriate? I realize that there is no RFC top-level namespace right now, but that seems to be the most appropriate place for it.

(darren)

-- When it is dark enough, you can see the stars. -- Ralph Waldo Emerson
NAME

RFC::Index - Create and maintain a local searchable RFC database.

SYNOPSIS

    use RFC::Index '/usr/local/share/rfc.db';
    my $rfc = RFC::Index->retrieve("RFC2822");
    my @rfc = RFC::Index->search("HTTP", "MIME");

DESCRIPTION

"RFC::Index" is used to create and maintain a local searchable RFC database. It consists of three elements: a parser for the rfc-index.xml file (found in the root of the RFC directory; see "ftp://ftp.rfc-editor.org/in-notes/rfc-index.xml"); a search frontend ("RFC::Index"); and a set of classes that implement the RFC data elements ("RFC::Entry", "RFC::Author", "RFC::Keyword", and so on).

"RFC::Index" uses "Class::DBI" and "DBD::SQLite" under the hood. It may well work with RDBMSs other than SQLite, but I have not tested it with anything else.

USAGE

There are two main search methods available from the "RFC::Index" class: "retrieve", to retrieve an individual RFC by number, and "search", to find RFCs by keyword. "retrieve" returns an "RFC::Entry" object or undef, while "search" returns a (possibly empty) list of "RFC::Entry" objects.

Import "RFC::Index" with the path to the database to use:

    use RFC::Index '/usr/local/share/rfc.db';

or call "import" explicitly with the path to the database:

    use RFC::Index;
    RFC::Index->import('/usr/local/share/rfc.db');

If you are using a DBD other than SQLite, you can pass a full DBI data source and any further arguments to "import"; they are passed unchanged to the "set_db" call:

    use RFC::Index 'dbi:mysql:RFC:dbhost', 'rfcuser', 'rfcpass', { RaiseError => 0 };

retrieve

Call "retrieve" with the number of an RFC:

    my $rfc = RFC::Index->retrieve(2822);

The number can optionally be prefixed with "RFC", which is how RFCs are referred to in the RFC indexes:

    my $rfc = RFC::Index->retrieve("RFC2822");

search

The RFC index includes a number of keywords, and the "search" method retrieves RFCs based on these keywords.
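The two argument conventions described above (a bare path meaning an SQLite file versus a full DBI data source for other drivers, and the optional "RFC" prefix accepted by "retrieve") can be sketched in plain Perl. Both helpers, connect_args and normalize_rfc_id, are hypothetical and not taken from RFC::Index; the zero-padded doc-id form ("RFC0822") is an assumption about how rfc-index.xml spells its identifiers.

```perl
use strict;
use warnings;

# Hypothetical helper: a value starting with "dbi:" is treated as a full DBI
# data source and returned with any extra arguments (user, password,
# attribute hash) unchanged; anything else is treated as an SQLite file path.
sub connect_args {
    my ($target, @rest) = @_;
    return ($target, @rest) if $target =~ /^dbi:/i;
    return ("dbi:SQLite:dbname=$target", @rest);
}

# Hypothetical helper: accept 2822, "2822", "RFC2822" or "rfc822" and return
# the doc-id form assumed to be used in rfc-index.xml ("RFC2822", "RFC0822").
# Returns undef in scalar context (empty list in list context) for input
# that is not an RFC number.
sub normalize_rfc_id {
    my ($id) = @_;
    return unless defined $id;
    $id =~ s/^\s+|\s+$//g;                      # trim surrounding whitespace
    return unless $id =~ /^(?:rfc)?\s*(\d+)$/i; # optional "RFC" prefix
    return sprintf 'RFC%04d', $1;               # zero-pad to four digits
}

my $id = normalize_rfc_id("rfc822");
print "$id\n";                                  # RFC0822
print join(' ', connect_args('/usr/local/share/rfc.db')), "\n";
                                                # dbi:SQLite:dbname=/usr/local/share/rfc.db
```

Normalizing on the way in means the lookup itself only ever has to match one spelling of each doc-id.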
Note that "search" does not stem its terms, at least not at this time.

To generate a new index, or update an existing one, use the "reindex" method of the "RFC::Index" class:

    use RFC::Index '/usr/local/share/rfc.db';
    use LWP::Simple;

    mirror "ftp://ftp.rfc-editor.org/in-notes/rfc-index.xml" => "rfc-index.xml";
    RFC::Index->reindex("rfc-index.xml");

The database specified to "RFC::Index::import" will be populated. This may take a while, depending on the speed of your machine; on my lightly loaded 1GHz PIII (1G RAM) it takes about 15 minutes. Multiple runs do not create duplicate entries in the database; the parser is designed to be run regularly to keep the index up to date.

TODO / BUGS

Support for BCP and STD types

Best Current Practice (BCP) entries and Internet Standards (STD) are not currently supported. This is a bug of omission.

Test on non-SQLite databases

In theory, "RFC::Index" should run fine on databases other than SQLite, but this has not been tested.

SUPPORT

"RFC::Index" is supported by the author.

VERSION

This is "RFC::Index", revision $Revision: 1.2 $.

AUTHOR

darren chamberlain <[EMAIL PROTECTED]>

COPYRIGHT

(C) 2004 darren chamberlain

This library is free software; you may distribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

Perl, RFC::Entry, Class::DBI, DBD::SQLite, Set::Scalar
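The promise that repeated "reindex" runs never create duplicates amounts to an upsert keyed on each RFC's doc-id. A minimal sketch of that behaviour follows, assuming a plain hash as a stand-in for the SQLite table and a hypothetical load_entries function in place of the parser's database writes; it is not RFC::Index's actual code.

```perl
use strict;
use warnings;

# Stand-in for the RFC entry table: doc-id => hashref of fields.
# (RFC::Index itself stores entries in SQLite through Class::DBI.)
my %entries;

# Hypothetical loader. Each record is a hashref such as a parser might build
# from one <rfc-entry> element. Existing doc-ids are updated in place rather
# than inserted again, so repeated runs never create duplicates.
sub load_entries {
    my @records = @_;
    my ($added, $updated) = (0, 0);
    for my $rec (@records) {
        my $id = $rec->{doc_id};
        if (exists $entries{$id}) {
            # Overwrite every supplied field, e.g. a newly added obsoleted_by.
            @{ $entries{$id} }{ keys %$rec } = values %$rec;
            $updated++;
        }
        else {
            $entries{$id} = { %$rec };    # store a copy of the record
            $added++;
        }
    }
    return ($added, $updated);
}
```

Feeding the same two records through load_entries twice reports two additions on the first run and two updates on the second, while %entries still holds exactly two entries.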
NAME

RFC::Entry - An RFC

SYNOPSIS

    use RFC::Index;
    my $rfc822 = RFC::Index->retrieve("rfc822");
    my @mail_rfcs = RFC::Index->search("mail");

DESCRIPTION

RFC searches using "RFC::Index" return instances of the "RFC::Entry" class. Each instance supports a number of Useful Methods, which provide access to data gleaned from the index. These Useful Methods include:

title

The title of the RFC.

abstract

A short abstract of the RFC, if one exists in the index.

date

The date of the RFC, as a "Time::Piece" instance.

current_status, publication_status

The current and publication status of the RFC.

notes

Any notes attached to the RFC.

uri

The URI of a copy of the RFC. By default, this URI is rooted at "ftp://ftp.rfc-editor.org/in-notes", though a different base URI can be passed as an argument to "uri":

    print $rfc->uri("http://localhost/mirrors/rfc");

page_count

The number of pages in the document.

char_count

The number of characters in the document.

file_format

The format of the document.

authors

Returns a list (or, in scalar context, a reference to an array) of "RFC::Author" objects. These objects have the methods "name" (the author's name, in the first-initial, last-name format used in the index), "title" (the author's title, as listed in the index), and "organization" and "org_abbrev" (the organization and its abbreviation, as listed in the index).

An "RFC::Author" instance stringifies to the value of its "name" method:

    my @authors = $rfc->authors;
    my $last_author = pop @authors;
    my $authors = @authors
        ? join(", ", @authors) . " and $last_author"
        : $last_author;
    print $rfc->doc_id, " was authored by $authors.\n";

keywords

Returns an array (or iterator) of the keywords attached to the RFC.

obsoletes, obsoleted_by

Return lists of the RFCs that the current RFC obsoletes, and of the RFCs that obsolete it, respectively. For example, RFC822 obsoletes RFC733, and is obsoleted by RFC2822.

updates, updated_by

Return lists of the RFCs that the current RFC updates, and of the RFCs that update it, respectively.
as_xml

Call the "as_xml" method to get the entry back as a string of XML. This method reconstructs the original index entry, down to the indentation.

TODO

Elements other than "<rfc-entry>" are not currently supported. This includes the Best Current Practice (BCP) entries and Standards.

SUPPORT

"RFC::Entry" is supported by the author.

VERSION

This is "RFC::Entry", revision $Revision: 1.3 $.

AUTHOR

darren chamberlain <[EMAIL PROTECTED]>

COPYRIGHT

(C) 2004 darren chamberlain

This library is free software; you may distribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

Perl, RFC::Index, URI, Time::Piece
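The "RFC::Author" objects described above stringify to the author's name, which is what lets them be dropped straight into join() or a double-quoted string. In Perl this is usually done with the core overload pragma; the sketch below assumes RFC::Author works along those lines, and its My::Author class and the author names are hypothetical, not taken from the module.

```perl
use strict;
use warnings;

package My::Author;

# Stringify to the name; fallback => 1 lets eq, ., and friends reuse it.
use overload '""' => sub { $_[0]->name }, fallback => 1;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub name { $_[0]{name} }

package main;

# Illustrative names only.
my @authors = map { My::Author->new(name => $_) } 'J. Smith', 'A. Jones';

# Interpolation and join() both go through the overloaded "" operator.
print "Authors: @authors\n";         # Authors: J. Smith A. Jones
print join(", ", @authors), "\n";    # J. Smith, A. Jones
```

Without the overload, the same prints would show reference addresses such as My::Author=HASH(0x...) instead of names.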