I have written a new module, WWW::Patent::Page, and propose to submit it to CPAN. Your comments would be appreciated.
Does the name seem reasonable? I am happy to take suggestions. I think it is reasonable to have a "Patent" namespace in WWW, since much patent information is available on the WWW: for example, searches of the prior art, patent family relationships, patent applications via XML, etc. With a namespace, related modules may be grouped easily. One can imagine future modules like "WWW::Patent::Apply", "WWW::Patent::Family", or "WWW::Patent::Search" for interacting with various web services.

WWW::Patent::Page is alpha software and my first module; my intent is to see if the Perl community has any interest in the idea. It is rough around the edges, but passes what tests it has.

The module provides a consistent way to obtain pages of patent documents from the various patent offices that make them available on the WWW. Typically, doing this by hand, page by page, is relatively easy, but it takes a bit of work if you want to automate it effectively for many pages or documents. The offices typically make it hard to get the whole document, presumably because supplying that is one source of revenue. From this primitive module, users can stitch together TIFF or PDF pages into multipage documents by whatever method they prefer.

The module uses submodules, specific to separate patent offices, and comes with working examples for the USPTO and EPO, which between them supply granted patents in HTML and TIFF (USPTO) and PDF (US, EP, and much of the world...). Hopefully, other interested users will create new or improved submodules and feed them back into the distribution.

For casual users, this module should simplify life. Abusive users will likely find their IP address banned by the patent office being spidered.

Here is the documentation as it now stands:

NAME
    WWW::Patent::Page - retrieve a patent page (e.g. from the United States Patent and Trademark Office (USPTO) website or the European Patent Office (ESPACE_EP))

SYNOPSIS
    Please see the test suite for working examples.
    The following is not guaranteed to be working or up-to-date.

        use WWW::Patent::Page;

        my $patent_document = WWW::Patent::Page->new();  # new object

        my $document1 = $patent_document->provide_doc('6,123,456');
            # defaults:  office  => 'USPTO',
            #            country => 'US',
            #            format  => 'htm',
            #            page    => '1',   # typically htm IS "1" page
            #            modules => qw/ us ep / ,

        my $document2 = $patent_document->provide_doc('US_6_123_456',
            office => 'ESPACE_EP',
            format => 'tif',
            page   => 2,
        );

        my $pages_known = $patent_document->pages_available(  # e.g. TIFF
            document => '6 123 456',
        );

DESCRIPTION
    Intent: Use public sources to retrieve patent documents such as TIFF images of patent pages, HTML of patents, PDF, etc. Expandable for your office of interest by writing new submodules.

    Alpha release by a newbie to find out if there is any interest.

USAGE
    See also SYNOPSIS above.

    Standard process for building & installing modules:

        perl Build.PL
        ./Build
        ./Build test
        ./Build install

    Examples of use:

        $patent_document = WWW::Patent::Page->new(
            doc_id => 'US6,654,321(B2)issued_2_Okada',
            office => 'ESPACE_EP',
            format => 'tif',
            page   => 2,
            agent  => 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6',
        );

        # Other agent strings:
        # 'Windows IE 6'    => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
        # 'Windows Mozilla' => 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6',
        # 'Mac Safari'      => 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/85 (KHTML, like Gecko) Safari/85',
        # 'Mac Mozilla'     => 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4a) Gecko/20030401',
        # 'Linux Mozilla'   => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624',
        # 'Linux Konqueror' => 'Mozilla/5.0 (compatible; Konqueror/3; Linux)',

        my %attributes   = $patent_document->get_patent('all');             # hash of all
        my $document_id  = $patent_document->get_patent('doc_id');          # US6,654,321(B2)issued_2_Okada
        my $office_used  = $patent_document->get_patent('office');          # ep
        my $country_used = $patent_document->get_patent('country');         # US
        my $doc_id_used  = $patent_document->get_patent('doc_id');          # 6654321
        my $page_used    = $patent_document->get_patent('page');            # 2
        my $kind_used    = $patent_document->get_patent('kind');            # B2
        my $comment_used = $patent_document->get_patent('comment');         # issued_2_Okada
        my $format_used  = $patent_document->get_patent('format');          # tif
        my $pages_total  = $patent_document->get_patent('pages_available'); # 101
        my $terms_and_conditions = $patent_document->terms('us');           # and conditions
        my $document     = $patent_document->get_patent('document');        # the loot

BUGS
    Pre-alpha release, to gauge whether the Perl community has any interest.

    Code contributions, suggestions, and critiques are welcome.

    Error handling is undeveloped.

    By definition, a non-trivial program contains bugs.

    For United States Patents (US) via the USPTO (us), the 'kind' is ignored in method provide_doc.

SUPPORT
    Yes, please. Checks are best. Or email me at [EMAIL PROTECTED] to arrange fund transfers.

AUTHOR
    Wanda B. Anon
    [EMAIL PROTECTED]

COPYRIGHT
    This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

    The full text of the license can be found in the LICENSE file included with this module.

ACKNOWLEDGEMENTS
    Andy Lester for WWW::Mechanize, that got me thinking,

    The authors of Finance::Quote, which served as an example of providing submodules,

    Erik Oliver for patentmailer, serving as an example of getting patent documents,

    Howard P. Katseff of AT&T Laboratories for wsp.pl, version 2, a proxy that speaks LWP and understands proxies,

    and of course Larry and Randal and the gang.

SEE ALSO
    perl(1).

Subroutine _countries_known()
    Usage   : internal method only
    Purpose : list all entities that could give a patent
    Returns : ref to a hash with keys of abbreviations and values of
              entities (usually a country)

...
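
To illustrate the identifier handling shown in the examples above (where 'US6,654,321(B2)issued_2_Okada' yields country US, number 6654321, kind B2 and a comment), here is a minimal sketch of how such a string might be split. The helper name parse_doc_id, the regular expressions, and the fallback to 'US' are assumptions made for this example; this is not the code in WWW::Patent::Page, and real kind codes and national number formats vary more than this handles.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of WWW::Patent::Page: split a loosely
# formatted identifier into country, doc_id, kind and comment.
sub parse_doc_id {
    my ($raw) = @_;

    # Assumption: fall back to 'US', mirroring the documented default country.
    my %parsed = ( country => 'US', kind => undef, comment => undef );

    # Optional two-letter country prefix, e.g. "US" in "US6,654,321".
    $parsed{country} = uc $1 if $raw =~ s/^\s*([A-Za-z]{2})//;

    # Optional kind code in parentheses, followed by an optional comment.
    if ( $raw =~ s/\(([A-Za-z]\d?)\)(.*)$// ) {
        $parsed{kind}    = uc $1;
        $parsed{comment} = $2 if length $2;
    }

    # What remains should be the number; drop commas, spaces, underscores.
    ( my $number = $raw ) =~ s/[,\s_]//g;
    return unless $number =~ /^\d+$/;    # give up on anything else
    $parsed{doc_id} = $number;

    return \%parsed;
}

my $parts = parse_doc_id('US6,654,321(B2)issued_2_Okada');
printf "%s %s %s %s\n", @{$parts}{qw(country doc_id kind comment)};
# prints: US 6654321 B2 issued_2_Okada
```

The same sketch reduces '6,123,456', 'US_6_123_456' and '6 123 456' to the number 6123456, which is the kind of normalization a submodule would need before building a request for its office.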