Hi All,
I have some beginner Nutch questions that I am hoping to get some
insight on.
I want to start at Dmoz.org and follow links related to entertainment
(concerts, art gallery events, etc.), examining each link to decide
whether I should collect data about it and from the page it points to.
My questions:
1. Can Nutch start at a given URL and examine every link against my
own criteria? (Obviously I can write the Case, If/Else, or While logic
for the criteria myself.)
2. If I find a link containing keywords I am interested in, can I
follow that link and extract information from the page?
3. How do I get the information about an interesting link, and the
relevant content from its page, into a MySQL database? (I know
ColdFusion, MySQL, and PHP.) I think what I am really asking is: how
do I write back to my database from a crawler?
4. I know Nutch is written in Java, which is fine, so I assume I will
need Tomcat or similar running. Are there other Java app servers
available for OS X as well?
5. Does anyone have deployment instructions for OS X?
Am I making any sense?
-Jason