[
https://issues.apache.org/jira/browse/ANY23-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420720#comment-16420720
]
ASF GitHub Bot commented on ANY23-339:
--------------------------------------
GitHub user HansBrende opened a pull request:
https://github.com/apache/any23/pull/67
ANY23-339 fixes itemscope hashcode collision problem, allows absolute URIs
as subjects
I fixed the itemscope hashcode collision problem documented in ANY23-339
and loosened the restriction on subject resources to allow absolute
URIs as subjects.
mvn clean test --> all tests pass
@lewismc Any comments before I merge this?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HansBrende/any23 ANY23-339
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/any23/pull/67.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #67
----
commit a1b72b720a2cdb2802fd8e82856ee67702d002cd
Author: Hans <firedrake93@...>
Date: 2018-03-30T17:04:25Z
ANY23-339 fixes itemscope hashcode collision problem, allows absolute URIs
as subjects
----
> Microdata extractor can sometimes merge two different itemscopes into one
> ------------------------------------------------------------------------
>
> Key: ANY23-339
> URL: https://issues.apache.org/jira/browse/ANY23-339
> Project: Apache Any23
> Issue Type: Bug
> Components: extractors
> Affects Versions: 2.2
> Reporter: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> The microdata extractor calculates the *subject* of a triple as the
> *hashCode()* of the itemscope.
> Java's hashCode() method returns a 32-bit integer and is not guaranteed to be
> collision-free. Collisions are especially likely in this case, since the
> ItemScope.hashCode() method is not written very well.
> This means that two microdata items can accidentally be merged into one.
> Here's the line that needs to be changed:
> [https://github.com/apache/any23/blob/316b4ec0d6285a204789792084caf012c000b196/core/src/main/java/org/apache/any23/extractor/microdata/MicrodataExtractor.java#L439]
> I recommend changing
> {code}
> subject = RDFUtils.getBNode(Integer.toString(itemScope.hashCode()));
> {code}
> to
> {code}
> subject = RDFUtils.bnode();
> {code}
> We could also use {{itemScope.getItemId()}} if it's not null, even if it's
> not a URL. One example of such an id is:
> {code}
> urn:isbn:0-330-34032-8
> {code}
> Edit: according to the [microdata
> spec|https://www.w3.org/TR/microdata-rdf/#dfn-absolute-url],
> {{urn:isbn:0-330-34032-8}} *is* an absolute URL. Since their definition of
> URL seems to correspond more closely to our definition of URI, we should be
> checking for absolute URLs with {{URI.isAbsolute()}} rather than with
> {{URL.getProtocol() != null}}.
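
To make the two points in the issue description concrete, here is a minimal, standalone Java sketch. It is not the Any23 code: the class SubjectSketch and its helper methods are hypothetical names made up for illustration, and plain strings stand in for itemscopes. It shows (1) that labels derived from hashCode() coincide whenever two distinct objects collide, while a per-call counter does not, and (2) that java.net.URI treats urn:isbn:0-330-34032-8 as absolute, whereas java.net.URL rejects it on a stock JDK with no protocol handler registered for urn.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration class; none of these names come from Any23.
final class SubjectSketch {

    private static final AtomicLong COUNTER = new AtomicLong();

    // Label derived from hashCode(): distinct items share a label
    // whenever their 32-bit hash codes happen to collide.
    static String hashLabel(Object item) {
        return Integer.toString(item.hashCode());
    }

    // Label from a process-wide counter: every call yields a new value,
    // independent of the item's contents.
    static String freshLabel() {
        return "node" + COUNTER.incrementAndGet();
    }

    // True if the id parses as a URI that has a scheme component.
    static boolean isAbsoluteUri(String id) {
        try {
            return new URI(id).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // True if java.net.URL accepts the id; this requires a registered
    // protocol handler, which a stock JDK does not have for "urn".
    static boolean parsesAsUrl(String id) {
        try {
            new URL(id);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are a well-known String.hashCode() collision.
        System.out.println(hashLabel("Aa").equals(hashLabel("BB")));
        System.out.println(freshLabel().equals(freshLabel()));

        String isbn = "urn:isbn:0-330-34032-8";
        System.out.println(isAbsoluteUri(isbn));
        System.out.println(parsesAsUrl(isbn));
    }
}
```

The counter is only one way to get collision-free labels within a single run; the change proposed in the issue relies on the library's own blank-node generator instead.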
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)