[jira] [Commented] (ANY23-351) NullPointerException in HCardExtractor

ASF GitHub Bot (JIRA) Wed, 27 Jun 2018 09:38:31 -0700


    [ https://issues.apache.org/jira/browse/ANY23-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525267#comment-16525267 ]


ASF GitHub Bot commented on ANY23-351:
--------------------------------------

GitHub user HansBrende opened a pull request:

    https://github.com/apache/any23/pull/86

    ANY23-351 fixed NullPointerException in HCardExtractor

    This PR:
    1. fixes the NullPointerExceptions that have occurred in HCardExtractor
    2. supports the 'srcset' attribute in obtaining urls from img elements
    3. fixes the fallback url extraction method to align with the spec by only
       obtaining url text from 'value'-class elements, or, if none defined, from
       non-'type'-class elements.
    
    mvn clean test -> all tests pass
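
To illustrate item 2 above: a 'srcset' value is a comma-separated list of image candidates, each a URL optionally followed by a descriptor such as "2x" or "480w". The following is a minimal sketch of pulling the URLs out of such a value. It is not the code from the pull request; the class name SrcsetUrls and the method parse are hypothetical, and the whitespace handling is simplified relative to the HTML specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Any23: extracts the URLs from a srcset value.
public class SrcsetUrls {

    /** Returns the candidate URLs in order, dropping descriptors like "2x" or "480w". */
    public static List<String> parse(String srcset) {
        List<String> urls = new ArrayList<>();
        if (srcset == null) {
            return urls; // a missing attribute yields no URLs rather than an exception
        }
        int i = 0;
        int n = srcset.length();
        while (i < n) {
            // Skip whitespace and commas separating candidates.
            while (i < n && (Character.isWhitespace(srcset.charAt(i)) || srcset.charAt(i) == ',')) {
                i++;
            }
            // The URL is the next run of non-whitespace characters.
            int start = i;
            while (i < n && !Character.isWhitespace(srcset.charAt(i))) {
                i++;
            }
            // Trailing commas end the candidate and are not part of the URL.
            int end = i;
            boolean endedWithComma = false;
            while (end > start && srcset.charAt(end - 1) == ',') {
                end--;
                endedWithComma = true;
            }
            if (end > start) {
                urls.add(srcset.substring(start, end));
            }
            // Otherwise a descriptor may follow; skip it up to the next comma.
            if (!endedWithComma) {
                while (i < n && srcset.charAt(i) != ',') {
                    i++;
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(parse("logo.png 1x, logo@2x.png 2x"));
    }
}
```

Scanning character by character, rather than splitting on every comma, keeps a URL that itself contains a comma in one piece, as long as the comma is not followed by whitespace.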

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HansBrende/any23 ANY23-351

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/any23/pull/86.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #86
    
----
commit 31e1142d1c43ca06065d6d48dd929f16a60f7c12
Author: Hans <firedrake93@...>
Date:   2018-06-27T16:26:50Z

    ANY23-351 fixed NullPointerException in HCardExtractor

----


> NullPointerException in HCardExtractor
> --------------------------------------
>
>                 Key: ANY23-351
>                 URL: https://issues.apache.org/jira/browse/ANY23-351
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: microformats
>    Affects Versions: 2.3
>            Reporter: Hans Brende
>            Priority: Major
>
> When extracting from the url: 
> https://cambridgewi.com/make-cambridge-home/char/V/
> I get the following NullPointerException, which kills the entire extraction 
> process:
> {code}
> java.lang.NullPointerException
>       at org.apache.any23.extractor.html.HTMLDocument.readUrlField(HTMLDocument.java:119)
>       at org.apache.any23.extractor.html.HTMLDocument.getPluralUrlField(HTMLDocument.java:288)
>       at org.apache.any23.extractor.html.HCardExtractor.addLogo(HCardExtractor.java:267)
>       at org.apache.any23.extractor.html.HCardExtractor.extractEntity(HCardExtractor.java:130)
>       at org.apache.any23.extractor.html.EntityBasedMicroformatExtractor.extract(EntityBasedMicroformatExtractor.java:66)
>       at org.apache.any23.extractor.html.MicroformatExtractor.run(MicroformatExtractor.java:102)
>       at org.apache.any23.extractor.html.MicroformatExtractor.run(MicroformatExtractor.java:44)
>       at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:480)
>       at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:259)
>       at org.apache.any23.Any23.extract(Any23.java:302)
>       at org.apache.any23.Any23.extract(Any23.java:437)
> {code}
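
The trace shows the exception surfacing in readUrlField while a logo URL is being read, but this message does not say which reference was null. As a generic illustration only, and not the fix made in the pull request, the sketch below shows one common way such a failure is avoided in DOM code: checking the result of an attribute lookup before dereferencing it. The class NullSafeUrlRead and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Any23: reads an attribute without risking an NPE.
public class NullSafeUrlRead {

    /** Returns the trimmed attribute value, or "" when the node or attribute is absent. */
    public static String attributeOrEmpty(Node node, String name) {
        if (node == null || node.getAttributes() == null) {
            return "";
        }
        // getNamedItem returns null when the attribute is missing, so check before use.
        Node attribute = node.getAttributes().getNamedItem(name);
        return attribute == null ? "" : attribute.getNodeValue().trim();
    }

    /** Parses a small well-formed XML snippet; wraps checked exceptions for brevity. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // An img element with no src attribute, as might appear inside an hCard.
        Node img = parse("<img class=\"logo\" alt=\"Logo\"/>").getDocumentElement();
        System.out.println("src=[" + attributeOrEmpty(img, "src") + "]");
    }
}
```

Returning an empty string lets the caller skip the field and continue with the rest of the document, instead of aborting the entire extraction.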



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
