I've been told that copies of my new book /Perl & LWP/ are now madly in transit to bookstores all over.
You can even get it now from Amazon, on sale, and they'll ship it once they realize that those huge crates labelled "O'REILLY | BURKE | PERL AND LWP" don't contain John Grisham novels in Portuguese.

For most folks, you can get it here:
  http://www.amazon.com/exec/obidos/ASIN/0596001789
(USD$25 instead of USD$35).

Or for people in/near Europe, see Amazon UK:
  http://www.amazon.co.uk/exec/obidos/ASIN/0596001789
(on sale for 20 UK pounds instead of 25 UK pounds -- that is, 31 euros instead of 39 euros).

I think that work on translations into other popular languages will start soon; but just now, the book is available only in English.

The book is about 280 pages (the US Amazon data still mistakenly says 400 pages, until they rekey the data), and has lots of code examples. Here's the table of contents:

PERL AND LWP

Foreword -- ix
Preface -- xi

1. Introduction to Web Automation -- 1
    The Web as Data Source -- 1
    History of LWP -- 3
    Installing LWP -- 4
    Words of Caution -- 9
    LWP in Action -- 10

2. Web Basics -- 15
    URLs -- 15
    An HTTP Transaction -- 17
    LWP::Simple -- 19
    Fetching Documents Without LWP::Simple -- 24
    Example: Altavista -- 25
    HTTP POST -- 27
    Example: Babelfish -- 28

3. The LWP Class Model -- 31
    The Basic Classes -- 31
    Programming with LWP Classes -- 32
    Inside the do_GET and do_POST Functions -- 33
    User Agents -- 34
    HTTP::Response Objects -- 42
    LWP Classes: Behind the Scenes -- 47

4. URLs -- 48
    Parsing URLs -- 48
    Relative URLs -- 54
    Converting Absolute URLs to Relative -- 55
    Converting Relative URLs to Absolute -- 56

5. Forms -- 58
    Elements of an HTML Form -- 59
    LWP and GET Requests -- 59
    Automating Form Analysis -- 62
    Idiosyncrasies of HTML Forms -- 64
    POST Example: License Plates -- 70
    POST Example: ABEBooks.com -- 74
    File Uploads -- 81
    Limits on Forms -- 84

6. Simple HTML Processing with Regular Expressions -- 85
    Automating Data Extraction -- 85
    Regular Expression Techniques -- 87
    Troubleshooting -- 91
    When Regular Expressions Aren't Enough -- 93
    Example: Extracting Links from a Bookmark File -- 93
    Example: Extracting Links from Arbitrary HTML -- 96
    Example: Extracting Temperatures from Weather Underground -- 98

7. HTML Processing with Tokens -- 100
    HTML as Tokens -- 100
    Basic HTML::TokeParser Use -- 101
    Individual Tokens -- 105
    Token Sequences -- 107
    More HTML::TokeParser Methods -- 112
    Using Extracted Text -- 117

8. Tokenizing Walkthrough -- 119
    The Problem -- 119
    Getting the Data -- 120
    Inspecting the HTML -- 121
    First Code -- 122
    Narrowing In -- 123
    Rewrite for Features -- 125
    Alternatives -- 131

9. HTML Processing with Trees -- 132
    Introduction to Trees -- 132
    HTML::TreeBuilder -- 133
    Processing -- 137
    Example: BBC News -- 142
    Example: Fresh Air -- 145

10. Modifying HTML with Trees -- 148
    Changing Attributes -- 148
    Deleting Images -- 152
    Detaching and Reattaching -- 153
    Attaching in Another Tree -- 156
    Creating New Elements -- 161

11. Cookies, Authentication, and Advanced Requests -- 165
    Cookies -- 165
    Adding Extra Request Header Lines -- 169
    Authentication -- 172
    An HTTP Authentication Example: The Unicode Mailing Archive -- 175

12. Spiders -- 178
    Types of Web-Querying Programs -- 178
    A User Agent for Robots -- 180
    Example: A Link-Checking Spider -- 181
    Ideas for Further Expansion -- 197

Appendices:
    A. LWP Modules -- 199
    B. HTTP Status Codes -- 203
    C. Common MIME Types -- 205
    D. Language Tags -- 207
    E. Common Content Encodings -- 209
    F. ASCII Table -- 211
    G. User's View of Object-Oriented Modules -- 224

Index -- 235

--
Sean M. Burke  http://www.spinn.net/~sburke/
