[ https://issues.apache.org/jira/browse/ANY23-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525267#comment-16525267 ]
ASF GitHub Bot commented on ANY23-351:
--------------------------------------
GitHub user HansBrende opened a pull request:
https://github.com/apache/any23/pull/86
ANY23-351 fixed NullPointerException in HCardExtractor
This PR:
1. fixes the NullPointerExceptions thrown in HCardExtractor
2. supports the 'srcset' attribute when obtaining URLs from img elements
3. fixes the fallback URL extraction method to align with the spec: URL text
is taken only from 'value'-class elements or, if none are defined, from
non-'type'-class elements.
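
For readers who want a concrete picture of items 2 and 3, here is a minimal
sketch. It is not the code from this PR: the class HCardUrlSketch and its
methods urlsFromSrcset, fallbackText, hasClass and parse are hypothetical
names that exist only for illustration, and it uses only the JDK's built-in
DOM parser. It assumes that 'srcset' candidates can be split on commas (so
URLs containing commas are not handled), that only direct children of the
field element are inspected, and that bare text nodes count as
non-'type' content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only; not part of Any23.
final class HCardUrlSketch {

    // Item 2: candidate URLs from a 'srcset' value such as "a.png 1x, b.png 2x".
    // Simplification: splits on every comma, so URLs containing commas are not handled.
    static List<String> urlsFromSrcset(String srcset) {
        List<String> urls = new ArrayList<>();
        if (srcset == null) {
            return urls; // a missing attribute yields no URLs rather than an exception
        }
        for (String candidate : srcset.split(",")) {
            String trimmed = candidate.trim();
            if (!trimmed.isEmpty()) {
                // Text before the first whitespace is the URL; the rest is a descriptor.
                urls.add(trimmed.split("\\s+")[0]);
            }
        }
        return urls;
    }

    // Item 3: text of 'value'-class children or, if there are none, of everything
    // except 'type'-class children. Assumption: only direct children are inspected.
    static String fallbackText(Element field) {
        StringBuilder values = new StringBuilder();
        StringBuilder others = new StringBuilder();
        boolean sawValue = false;
        NodeList children = field.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                Element e = (Element) child;
                if (hasClass(e, "value")) {
                    sawValue = true;
                    values.append(e.getTextContent());
                } else if (!hasClass(e, "type")) {
                    others.append(e.getTextContent());
                }
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                others.append(child.getNodeValue());
            }
        }
        return (sawValue ? values : others).toString().trim();
    }

    // True if the element's whitespace-separated class attribute contains 'name'.
    static boolean hasClass(Element e, String name) {
        return Arrays.asList(e.getAttribute("class").trim().split("\\s+")).contains(name);
    }

    // Test helper: parses a well-formed XML fragment and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + xml, e);
        }
    }
}
```

Under these assumptions, urlsFromSrcset(null) returns an empty list instead
of throwing, and a field holding both a 'type' child and a 'value' child
yields only the text of the 'value' child.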
mvn clean test -> all tests pass
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HansBrende/any23 ANY23-351
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/any23/pull/86.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #86
----
commit 31e1142d1c43ca06065d6d48dd929f16a60f7c12
Author: Hans <firedrake93@...>
Date: 2018-06-27T16:26:50Z
ANY23-351 fixed NullPointerException in HCardExtractor
----
> NullPointerException in HCardExtractor
> --------------------------------------
>
> Key: ANY23-351
> URL: https://issues.apache.org/jira/browse/ANY23-351
> Project: Apache Any23
> Issue Type: Bug
> Components: microformats
> Affects Versions: 2.3
> Reporter: Hans Brende
> Priority: Major
>
> When extracting from the url:
> https://cambridgewi.com/make-cambridge-home/char/V/
> I get the following NullPointerException, which kills the entire extraction
> process:
> {code}
> java.lang.NullPointerException
> at org.apache.any23.extractor.html.HTMLDocument.readUrlField(HTMLDocument.java:119)
> at org.apache.any23.extractor.html.HTMLDocument.getPluralUrlField(HTMLDocument.java:288)
> at org.apache.any23.extractor.html.HCardExtractor.addLogo(HCardExtractor.java:267)
> at org.apache.any23.extractor.html.HCardExtractor.extractEntity(HCardExtractor.java:130)
> at org.apache.any23.extractor.html.EntityBasedMicroformatExtractor.extract(EntityBasedMicroformatExtractor.java:66)
> at org.apache.any23.extractor.html.MicroformatExtractor.run(MicroformatExtractor.java:102)
> at org.apache.any23.extractor.html.MicroformatExtractor.run(MicroformatExtractor.java:44)
> at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:480)
> at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:259)
> at org.apache.any23.Any23.extract(Any23.java:302)
> at org.apache.any23.Any23.extract(Any23.java:437)
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)