[ 
https://issues.apache.org/jira/browse/NUTCH-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420214#comment-16420214
 ] 

ASF GitHub Bot commented on NUTCH-2545:
---------------------------------------

HansBrende commented on issue #306: NUTCH-2545 Upgrade to Any23 2.2
URL: https://github.com/apache/nutch/pull/306#issuecomment-377457712
 
 
   @lewismc that's fine, but beware: the SAX pre-processor actually screws up a 
lot of triples! 
   
   In the microdata test, that 39th triple extracted with the SAX pre-processor 
is actually a *bug* (see ANY23-340). 
   
   Also, the SAX pre-processor removes all the namespaces declared on the html 
element. As a result, even though the BBC test file declared the namespaces 
xmlns:og="http://opengraphprotocol.org/schema/" and 
xmlns:rnews="http://iptc.org/std/rNews/2011-10-07#", the extractors ignored 
them, defaulting to "http://ogp.me/ns#" for the "og" namespace and "rnews:" 
for the "rnews" namespace.
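
   To make the effect concrete, here is a rough sketch of the prefix 
resolution described above. It is not Any23's actual code: the `expand` 
helper, the `DEFAULT_PREFIXES` table, and the property names `title` and 
`headline` are all hypothetical, chosen only to illustrate the fallback.

```python
# Hypothetical illustration only; this is NOT Any23's implementation.
# Assumed built-in fallback: only "og" has a default, "rnews" has none.
DEFAULT_PREFIXES = {"og": "http://ogp.me/ns#"}


def expand(curie, declared):
    """Expand 'prefix:name' using declared namespaces first, then defaults."""
    prefix, _, name = curie.partition(":")
    base = declared.get(prefix) or DEFAULT_PREFIXES.get(prefix)
    # With no mapping at all, the prefix is left unresolved (e.g. "rnews:").
    return (base if base is not None else prefix + ":") + name


# Namespaces as declared on the html element of the BBC test file.
declared = {
    "og": "http://opengraphprotocol.org/schema/",
    "rnews": "http://iptc.org/std/rNews/2011-10-07#",
}

# Declarations kept: the file's own namespaces are used.
print(expand("og:title", declared))
print(expand("rnews:headline", declared))

# Declarations stripped (as the SAX pre-processor does): fallbacks apply.
print(expand("og:title", {}))
print(expand("rnews:headline", {}))
```

   With the declarations present, both properties resolve against the 
namespaces from the file; with an empty mapping, "og" falls back to 
"http://ogp.me/ns#" and "rnews" stays an unresolved "rnews:" prefix.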

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Upgrade to Any23 2.2
> --------------------
>
>                 Key: NUTCH-2545
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2545
>             Project: Nutch
>          Issue Type: Improvement
>          Components: any23, plugin
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.15
>
>
> We recently released Any23 2.2. I would like to update the Any23 plugin to 
> this newest version.
> PR coming up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
