Hi Lewis,

I even saved the XML data in a file and then ran the command: ./any23 rover @filepath, but it still doesn't work. Finally, I created a simple XML data file to test, and again nothing was retrieved, so I think maybe it is not a URL issue but something related to the parser engine.
Is Any23 0.7 coming soon, and will it meet my particular request? If so, I will just get the latest 0.7 and test it again. Thanks for your reply.

All the best!
armon.chen

On Friday, June 22, 2012 at 5:13 PM, Lewis John Mcgibbney wrote:

> So I suppose there are a couple of options here.
>
> On Fri, Jun 22, 2012 at 10:02 AM, armon <[email protected]> wrote:
> >
> > but we know that there is some other data in the page that can't be
> > retrieved, such as the xml data (in the attachment of last email).
>
> Yes there is a good bit more content but the parsing implementations
> within Any23 do not aim to extract content strings... instead the
> project (parsing anyway) gains its strength from extracting triples
> and such like.
>
> You could quickly fire up a Nutch instance to gather content then use
> the basic-crawler from Any23 for triples... this is until we implement
> an Any23 parsing and indexing filter within Nutch which will provide a
> complete solution to your particular request.
>
> You could easily implement the above programmatically which would
> enable you to fetch page content as well as extract the triples from
> it separately.
