Hi,

I've just started trying to use Any23 programmatically from Java, and it looks 
great.
The documentation has sample code [1], but it seems out of date: the webpage it
extracts from (http://www.rentalinrome.com/semanticloft/semanticloft.htm)
appears to have changed, and the code has a syntax error (the word 'Apache'
occurs twice on line 1, which doesn't make sense).

My questions are simply:

1. How do I configure the 'Any23' instance in this code? I know the
constructor takes a Properties instance, but where are the currently supported
properties documented? For instance, how do I set the timeout for the
connection attempt?

2. This code sample doesn't seem to crawl from the webpage I provide; it just
scans that one page. Is there a code sample for crawling a website, showing
how to configure MaxPages and MaxDepth?
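
To make question 2 concrete, here is a rough, library-independent sketch of the behaviour I mean. It assumes MaxPages caps the total number of pages visited and MaxDepth caps how many link hops from the start page are followed. The class BoundedCrawler, the crawl method and the linksOf function are hypothetical names of my own, not Any23 API. In a real crawler linksOf would fetch a page and return its outgoing links, and each visited URL would be handed to Any23 for extraction.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch, not Any23 API: a breadth-first crawl bounded by maxPages and maxDepth.
public class BoundedCrawler {

    public static List<String> crawl(String seed,
                                     Function<String, List<String>> linksOf,
                                     int maxPages,
                                     int maxDepth) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();   // URLs already queued, so no page is visited twice
        Deque<Map.Entry<String, Integer>> queue = new ArrayDeque<>();
        queue.add(new AbstractMap.SimpleEntry<>(seed, 0));
        seen.add(seed);

        while (!queue.isEmpty() && visited.size() < maxPages) {
            Map.Entry<String, Integer> next = queue.poll();
            String url = next.getKey();
            int depth = next.getValue();
            visited.add(url);                 // an Any23 extraction of url would go here
            if (depth >= maxDepth) {
                continue;                     // do not follow links beyond maxDepth hops
            }
            for (String link : linksOf.apply(url)) {
                if (seen.add(link)) {
                    queue.add(new AbstractMap.SimpleEntry<>(link, depth + 1));
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A tiny in-memory "site" standing in for real HTTP fetches.
        Map<String, List<String>> site = Map.of(
                "/", List.of("/a", "/b"),
                "/a", List.of("/c"),
                "/b", List.of("/a", "/d"),
                "/c", List.of(),
                "/d", List.of());
        Function<String, List<String>> linksOf = url -> site.getOrDefault(url, List.of());

        System.out.println(crawl("/", linksOf, 10, 1)); // [/, /a, /b]
        System.out.println(crawl("/", linksOf, 4, 5));  // [/, /a, /b, /c]
    }
}
```

Because the crawl is breadth-first, MaxPages cuts it off in order of distance from the start page; a real crawler would also need politeness delays, robots.txt handling and probably a same-host filter.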

Thanks,

Pat.

[1] - http://any23.apache.org/dev-data-extraction.html

Pat McBennett
Architect
The Chase Building, 5th Floor
Carmanhall Road, Sandyford,
Dublin 18, Ireland
Direct +353 1
Mobile +353 8

http://www.dnb.co.uk/