[ https://issues.apache.org/jira/browse/ANY23-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420756#comment-16420756 ]
ASF GitHub Bot commented on ANY23-339:
--------------------------------------

Github user lewismc commented on the issue:

    https://github.com/apache/any23/pull/67

    I haven't read The Reality Dysfunction... but the PR looks good :)

> Microdata extractor can sometime merge two different itemscopes into one
> ------------------------------------------------------------------------
>
>                 Key: ANY23-339
>                 URL: https://issues.apache.org/jira/browse/ANY23-339
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: extractors
>    Affects Versions: 2.2
>            Reporter: Hans Brende
>            Priority: Major
>             Fix For: 2.3
>
>
> The microdata extractor calculates the *subject* of a triple as the
> *hashCode()* of the itemscope.
> Java's hashCode() method (returning a 32-bit integer) is not guaranteed to be
> collision-free. (Especially so in this case, since the ItemScope.hashCode()
> method is not written very well.)
> This means that two microdata items can accidentally be merged into one.
> Here's the line that needs to be changed:
> [https://github.com/apache/any23/blob/316b4ec0d6285a204789792084caf012c000b196/core/src/main/java/org/apache/any23/extractor/microdata/MicrodataExtractor.java#L439]
> I recommend changing
> {code}
> subject = RDFUtils.getBNode(Integer.toString(itemScope.hashCode()));
> {code}
> to
> {code}
> subject = RDFUtils.bnode();
> {code}
> We could also use {{itemScope.getItemId()}} if it's not null, even if it's
> not a URL. One example of such an id is:
> {code}
> urn:isbn:0-330-34032-8
> {code}
> Edit: according to the [microdata
> spec|https://www.w3.org/TR/microdata-rdf/#dfn-absolute-url],
> {{urn:isbn:0-330-34032-8}} *is* an absolute URL. Since their definition of
> URL seems to correspond more closely to our definition of URI, we should be
> checking for absolute URLs with {{URI.isAbsolute()}} rather than with
> {{URL.getProtocol() != null}}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
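
The two points in the issue description can be illustrated with a small standalone sketch. It uses only the JDK, not Any23 itself: the class name and the helpers hashLabel, freshLabel, isAbsoluteUri and parsesAsUrl are hypothetical names made up for this example, and the UUID-based label is only an assumed stand-in for whatever RDFUtils.bnode() does internally. The strings "Aa" and "BB" stand in for two distinct item scopes whose hash codes happen to be equal. The URL check here relies on the java.net.URL constructor rejecting the unknown "urn" scheme, which may differ from how the extractor actually performs its check.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.UUID;

public class BlankNodeSubjectSketch {

    // Hypothetical: label derived from hashCode(), as in the line the issue wants changed.
    static String hashLabel(Object item) {
        return Integer.toString(item.hashCode());
    }

    // Hypothetical: a fresh label per call (assumed stand-in for RDFUtils.bnode()).
    static String freshLabel() {
        return "node" + UUID.randomUUID().toString().replace("-", "");
    }

    // True if the id parses as a URI and has a scheme.
    static boolean isAbsoluteUri(String id) {
        try {
            return new URI(id).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // True if java.net.URL accepts the id; schemes without a registered
    // protocol handler (such as "urn" in a stock JDK) are rejected.
    static boolean parsesAsUrl(String id) {
        try {
            new URL(id);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Two different objects, one hash-derived label: the items would be merged.
        System.out.println(hashLabel("Aa") + " vs " + hashLabel("BB"));

        // Fresh labels keep the two items apart.
        System.out.println(freshLabel() + " vs " + freshLabel());

        String id = "urn:isbn:0-330-34032-8";
        System.out.println("URI.isAbsolute(): " + isAbsoluteUri(id));
        System.out.println("accepted by URL:  " + parsesAsUrl(id));
    }
}
```

A URI-based check accepts the ISBN URN as an absolute identifier, while a check that goes through java.net.URL does not, which is the mismatch the edit at the end of the description points out.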