Hi Lewis,

On 14 February 2012 22:01, Lewis John Mcgibbney <[email protected]> wrote:

> I think I've found it myself.
>

This is the list of current Extractors:
http://incubator.apache.org/any23/extractors.html

> No! we currently support the following Microformats:
>
> Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License, XFN and
> Species
> Would it be beneficial to try and integrate the rel-tag microformat into
> Any23?
>

If you look at the "readTextField(Node node)" method in HTMLDocument.java [1],
or at "extractRelTag(String hrefAttributeContent)", you can see that rel-tag
is extracted to build complex Microformats.
I'm not sure that rel-tag can be used alone (and the documentation is
unreachable tonight...).
Anyway, the implementation of a dedicated Extractor should be easy.

>
> This way we could drop it from Nutch, integrate it into the Any23 microdata
> parser implementation and just use it within the parse-any23 Nutch
> plugin.
>
> ???
>

Mic

[1]
https://svn.apache.org/repos/asf/incubator/any23/trunk/core/src/main/java/org/apache/any23/extractor/html/HTMLDocument.java

>
> On Tue, Feb 14, 2012 at 8:19 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Hi Guys,
> >
> > Do we maintain a parser for the above rel-tag [1] format?
> >
> > I'm doing some work with Nutch and wonder if when I write this plugin the
> > rel-tag one we maintain over the will be deprecated.
> >
> > Thanks
> >
> > Lewis
> >
> > [1] http://microformats.org/wiki/Rel-Tag
> >
> > --
> > *Lewis*
> >
> >
>
>
> --
> *Lewis*
>

-- 
Michele Mostarda
Senior Software Engineer
skype: michele.mostarda
twitter: micmos
mail: [email protected]
site : http://www.michelemostarda.com
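
For reference, the core of a dedicated rel-tag Extractor could look roughly
like the sketch below. This is not the Any23 code: the class name
RelTagSketch and the handling of trailing slashes are assumptions made for
illustration, and only the method name extractRelTag is borrowed from
HTMLDocument.java. It follows the rel-tag convention that the tag is the last
path segment of the link's href, and it needs Java 10 or later for the
URLDecoder overload used.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Any23: derives a rel-tag value from the
// href of an <a rel="tag" href="..."> element.
class RelTagSketch {

    // Returns the decoded last path segment of href, or null if there is none.
    static String extractRelTag(String href) {
        if (href == null) {
            return null;
        }
        final String rawPath;
        try {
            // getRawPath() excludes the query string and the fragment.
            rawPath = new URI(href.trim()).getRawPath();
        } catch (URISyntaxException e) {
            return null;
        }
        if (rawPath == null) {
            return null; // opaque URI such as mailto:
        }
        // Assumption: trailing slashes are ignored, so ".../tag/tech/" gives "tech".
        int end = rawPath.length();
        while (end > 0 && rawPath.charAt(end - 1) == '/') {
            end--;
        }
        if (end == 0) {
            return null; // empty path or "/" only
        }
        int start = rawPath.lastIndexOf('/', end - 1) + 1;
        // URLDecoder turns both "%20" and "+" into a space.
        return URLDecoder.decode(rawPath.substring(start, end), StandardCharsets.UTF_8);
    }
}
```

A real Extractor would still have to select the a[rel~=tag] elements from the
DOM and emit triples for them; the sketch covers only the href-to-tag step.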
