[ https://issues.apache.org/jira/browse/NUTCH-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420214#comment-16420214 ]
ASF GitHub Bot commented on NUTCH-2545:
---------------------------------------

HansBrende commented on issue #306: NUTCH-2545 Upgrade to Any23 2.2
URL: https://github.com/apache/nutch/pull/306#issuecomment-377457712

@lewismc that's fine, but beware: the SAX pre-processor actually screws up a lot of triples! In the microdata test, that 39th triple extracted with the SAX pre-processor is actually a *bug* (see ANY23-340).

Also, the SAX pre-processor removes all the namespaces specified in the html element. This means that, even though the BBC test file specified the namespaces xmlns:og="http://opengraphprotocol.org/schema/" and xmlns:rnews="http://iptc.org/std/rNews/2011-10-07#", the extractors ignored those, defaulting to "http://ogp.me/ns#" for the "og" namespace and "rnews:" for the "rnews" namespace.

> Upgrade to Any23 2.2
> --------------------
>
>                 Key: NUTCH-2545
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2545
>             Project: Nutch
>          Issue Type: Improvement
>          Components: any23, plugin
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.15
>
>
> We recently released Any23 2.2. I would like to update the Any23 plugin to
> this newest version.
> PR coming up.