Re: about the supported input format of any23

armon Fri, 22 Jun 2012 02:47:35 -0700

Hi Lewis,

 I do agree with you. Any23 does great work; I had simply assumed that it 
could handle generic XML-structured data, and it does not.

 That is fine; I may need to develop a new module to meet my requirement.
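
 As a rough idea of what such a module might do, here is a minimal sketch 
that flattens plain XML into subject/predicate/object tuples. It is not part 
of Any23 and does not use its API; the function name xml_to_triples, the 
http://example.org/ base URI, and the sample document are all hypothetical, 
and the mapping from elements to triples is just one possible convention.

```python
import xml.etree.ElementTree as ET


def xml_to_triples(xml_text, base="http://example.org/"):
    """Flatten plain XML into (subject, predicate, object) tuples.

    Hypothetical helper: the base URI and naming scheme are assumptions,
    not an Any23 convention.
    """
    root = ET.fromstring(xml_text)
    triples = []

    def walk(elem, subject):
        for index, child in enumerate(elem):
            if len(child) == 0:
                # Leaf element: its text becomes a literal object.
                text = (child.text or "").strip()
                triples.append((subject, base + child.tag, text))
            else:
                # Nested element: mint a node identifier and recurse.
                node = f"{subject}/{child.tag}{index}"
                triples.append((subject, base + child.tag, node))
                walk(child, node)

    walk(root, base + root.tag)
    return triples


# Hypothetical sample input, not the XML from the original attachment.
sample = "<book><title>Any23</title><author><name>armon</name></author></book>"
for triple in xml_to_triples(sample):
    print(triple)
```

 Each leaf element becomes a literal-valued statement and each nested 
element becomes a new node, so the output could later be serialized into one 
of the formats Any23 already accepts.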

 If anything I wrote gave the wrong impression of what I meant, I apologize.

 I am sincerely just asking whether Any23 0.7 will support generic XML as an 
input format or not.

 If not, that is fine; I will look for another solution.

 Thank you very much!

 All the best! 

armon.chen


On Friday, June 22, 2012 at 5:35 PM, Lewis John Mcgibbney wrote:

> Hi Armon,
> 
> I think we need to clarify something here
> 
> Any23 parsers extract structured data... the parsers DO NOT aim to
> extract unstructured text like some kind of 'traditional' parser.
> By structure we are not referring to markup as such but instead relate
> solely to semantic/structural relationships between concepts within
> some given data resource.
> Within the context of this thread, we refer (somewhat ambiguously) to
> resources as one of the following formats
> 
> RDF/XML, Turtle, Notation 3, RDFa with RDFa1.1 prefix mechanism,
> Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview,
> License, XFN and Species, HTML5 Microdata (such as Schema.org),
> CSV: Comma Separated Values with separator autodetection.
> 
> Does this make sense?
> 
> The Any23 parser is doing its job as it should.
> 
> Lewis
> 
> On Fri, Jun 22, 2012 at 10:26 AM, armon <[email protected]> wrote:
> > Hi Lewis,
> > 
> > I even put the XML data in a file and ran the command ./any23 rover @filepath,
> > but it still did not work. Finally, I created a simple XML data file to test 
> > with, and again nothing was retrieved, so I think the problem is not the URL 
> > but the parser engine.
> > 
> > Is Any23 0.7 coming soon, and will it meet my particular request? If so, I 
> > will get the latest 0.7 and test it again.
> > 
> > Thanks for your reply.
> > 
> > All the best!
> > 
> > armon.chen
> > 
> > 
> > 
> > On Friday, June 22, 2012 at 5:13 PM, Lewis John Mcgibbney wrote:
> > 
> > > So I suppose there are a couple of options here.
> > > 
> > > On Fri, Jun 22, 2012 at 10:02 AM, armon <[email protected]> wrote:
> > > > 
> > > > but we know that there is other data in the page that can't be 
> > > > retrieved, such as the XML data (attached to my last email).
> > > 
> > > Yes there is a good bit more content but the parsing implementations
> > > within Any23 do not aim to extract content strings... instead the
> > > project (parsing anyway) gains its strength from extracting triples
> > > and such like.
> > > 
> > > You could quickly fire up a Nutch instance to gather content then use
> > > the basic-crawler from Any23 for triples... this is until we implement
> > > an Any23 parsing and indexing filter within Nutch which will provide a
> > > complete solution to your particular request.
> > > 
> > > You could easily implement the above programmatically which would
> > > enable you to fetch page content as well as extract the triples from
> > > it separately.
> 
> 
> 
> -- 
> Lewis
