On Thu, Dec 30, 2010 at 23:24, g4Ur4v <[email protected]> wrote:
> Is there a way to extract all the HTML pages from a website into a
> single HTML/PDF file? For example, the Dive into HTML5 website
> (www.diveintohtml5.org) contains the book in various HTML pages. Is
> there a way to combine them all into a single file?

You can mirror a site using the wget utility:

    wget --continue --mirror --convert-links --backup-converted \
         --page-requisites --timestamping --verbose --random-wait \
         http://www.diveintohtml5.org/

This will mirror the complete site into your current working directory.
Moreover, you can sync any updates by adding a cron job to your crontab,
if you prefer. The following entry re-runs the mirror every Sunday at
midnight:

    0 0 * * 0 wget --continue --mirror --convert-links --backup-converted --page-requisites --timestamping --verbose --random-wait http://www.diveintohtml5.org/

Arjun S R <[email protected]>
College Of Engineering, Trivandrum <http://www.cet.ac.in/home.php>
Facebook: http://www.facebook.com/Arjun.S.R
Twitter: http://twitter.com/Arjun_S_R
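Since the original question asked for a single HTML/PDF file: once the
site is mirrored, one possible approach is to feed the mirrored pages to
a tool such as wkhtmltopdf, which accepts multiple input HTML files and
writes them into one PDF. This is only a sketch, assuming wkhtmltopdf is
installed; the chapter filenames below are illustrative, since the real
names and their reading order depend on the site's table of contents:

    # Run from the directory wget created for the mirror.
    cd www.diveintohtml5.org

    # Render the listed pages, in order, into a single PDF.
    # Replace these filenames with the site's actual chapter pages;
    # a plain *.html glob would sort alphabetically, not in book order.
    wkhtmltopdf index.html introduction.html semantics.html \
        dive-into-html5.pdf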
