Re: about the supported input format of any23

armon Fri, 22 Jun 2012 02:27:09 -0700

Hi Lewis, 

I even saved the XML data in a file and ran the command ./any23 rover @filepath,
but it still doesn't work. Finally, I created a simple XML data file to test,
and again nothing was retrieved, so I think it may not be a URL issue but one
related to the parser engine.


Is Any23 0.7 coming soon, and will it meet my particular request? If so, I will
just get the latest 0.7 and test it again.

Thanks for your reply.

All the best!

armon.chen



On Friday, 22 June 2012 at 5:13 PM, Lewis John Mcgibbney wrote:

> So I suppose there are a couple of options here.
> 
> On Fri, Jun 22, 2012 at 10:02 AM, armon <[email protected]> wrote:
> > 
> > but we know that there is some other data in the page that can't be 
> > retrieved, such as the xml data (in the attachment of last email).
> 
> Yes there is a good bit more content but the parsing implementations
> within Any23 do not aim to extract content strings... instead the
> project (parsing anyway) gains its strength from extracting triples
> and such like.
> 
> You could quickly fire up a Nutch instance to gather content then use
> the basic-crawler from Any23 for triples... this is until we implement
> an Any23 parsing and indexing filter within Nutch which will provide a
> complete solution to your particular request.
> 
> You could easily implement the above programmatically which would
> enable you to fetch page content as well as extract the triples from
> it separately.
