Hi Lewis,

On 14 February 2012 22:01, Lewis John Mcgibbney
<[email protected]> wrote:

> I think I've found it myself.
>
>
This is the list of current Extractors:

http://incubator.apache.org/any23/extractors.html

> No! we currently support the following Microformats:
>
> Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License, XFN and
> Species


> Would it be beneficial to try and integrate the rel-tag microformat into
> Any23?
>

If you look at the "readTextField(Node node)" method in HTMLDocument.java [1],
or at "extractRelTag(String hrefAttributeContent)", you can see that rel-tag
is already extracted, but only as a building block for more complex
Microformats.
I'm not sure whether rel-tag can be used on its own (and the documentation
is unreachable tonight...).

Anyway, implementing a dedicated Extractor should be easy.
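
To give an idea of the core step, here is a minimal standalone sketch of
pulling a tag out of a rel="tag" href. It is not the Any23 code and does not
use the Any23 Extractor API: the class RelTagSketch and the method extractTag
are hypothetical names invented for this illustration. It assumes the tag is
the last path segment of the href, URL-decoded, and that trailing slashes
can be ignored.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, NOT part of Any23: illustrates the rel-tag idea only.
public class RelTagSketch {

    // Returns the tag encoded in the href of an <a rel="tag"> link, if any.
    // Assumption: the tag is the last non-empty path segment, URL-decoded.
    public static Optional<String> extractTag(String href) {
        if (href == null) {
            return Optional.empty();
        }
        final String rawPath;
        try {
            // getRawPath() keeps percent-escapes intact and drops query/fragment.
            rawPath = new URI(href.trim()).getRawPath();
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
        if (rawPath == null || rawPath.isEmpty()) {
            return Optional.empty();
        }
        // Assumption: trailing slashes do not change the tag.
        int end = rawPath.length();
        while (end > 0 && rawPath.charAt(end - 1) == '/') {
            end--;
        }
        if (end == 0) {
            return Optional.empty();
        }
        int start = rawPath.lastIndexOf('/', end - 1) + 1;
        String segment = rawPath.substring(start, end);
        try {
            // URLDecoder decodes %XX escapes and also maps '+' to a space.
            String tag = URLDecoder.decode(segment, StandardCharsets.UTF_8);
            return tag.isEmpty() ? Optional.empty() : Optional.of(tag);
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```

A dedicated Extractor would then only need to select the anchors whose rel
attribute contains "tag", call something like extractTag on each href, and
emit one statement per tag found.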


>
> This way we could drop it from Nutch, integrate it into the Any23 microdata
> parser implementation and just use it within the parse-any23 Nutch
> plugin.
>
> ???
>

Mic

[1]
https://svn.apache.org/repos/asf/incubator/any23/trunk/core/src/main/java/org/apache/any23/extractor/html/HTMLDocument.java


>
> On Tue, Feb 14, 2012 at 8:19 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Hi Guys,
> >
> > Do we maintain a parser for the above rel-tag [1] format?
> >
> > I'm doing some work with Nutch and wonder if, when I write this plugin, the
> > rel-tag one we maintain over there will be deprecated.
> >
> > Thanks
> >
> > Lewis
> >
> > [1] http://microformats.org/wiki/Rel-Tag
> >
> > --
> > *Lewis*
> >
> >
>
>
> --
> *Lewis*
>



-- 
Michele Mostarda
Senior Software Engineer
skype: michele.mostarda
twitter: micmos
mail: [email protected]
site : http://www.michelemostarda.com
