[ 
https://issues.apache.org/jira/browse/ANY23-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481311#comment-16481311
 ] 

ASF GitHub Bot commented on ANY23-348:
--------------------------------------

GitHub user HansBrende opened a pull request:

    https://github.com/apache/any23/pull/85

    ANY23-348 handle malformed microdata types gracefully

    I did two things: 
    
    (1) Treat blank microdata types as if they were null
    
    (2) For other varieties of malformed microdata types, first attempt to fix
    them by trimming leading and trailing whitespace and URL-encoding illegal
    characters. If that fails, throw a fatal error only if the microdata
    parser error mode is set to STOP_AT_FIRST_ERROR; otherwise, add the error to
    the error list, treat the type as if it were null, and continue parsing.
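
    The two steps above can be illustrated with a minimal, self-contained
    sketch. It is not the code from the pull request: the class
    MicrodataTypeSketch, the methods parseType and percentEncodeIllegal, the
    FULL_REPORT mode name, and the exact set of characters treated as illegal
    are all assumptions made for illustration. Only STOP_AT_FIRST_ERROR and
    the wording of the exception message come from this thread.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are taken from the Any23 code base.
class MicrodataTypeSketch {

    // STOP_AT_FIRST_ERROR is named in the PR; FULL_REPORT is an assumed name
    // for "any other mode".
    enum ErrorMode { STOP_AT_FIRST_ERROR, FULL_REPORT }

    private final ErrorMode errorMode;
    private final List<String> errors = new ArrayList<>();

    MicrodataTypeSketch(ErrorMode errorMode) {
        this.errorMode = errorMode;
    }

    List<String> getErrors() {
        return errors;
    }

    /** Returns the repaired type, or null if it is blank or cannot be repaired. */
    URI parseType(String raw) {
        // (1) A blank type is treated exactly like a missing one.
        if (raw == null || raw.trim().isEmpty()) {
            return null;
        }
        // (2) Try to repair: trim, then percent-encode illegal characters.
        String candidate = percentEncodeIllegal(raw.trim());
        try {
            URI uri = new URI(candidate);
            if (!uri.isAbsolute()) {
                throw new URISyntaxException(candidate, "type must be an absolute URL");
            }
            return uri;
        } catch (URISyntaxException e) {
            // Fatal only in STOP_AT_FIRST_ERROR mode.
            if (errorMode == ErrorMode.STOP_AT_FIRST_ERROR) {
                throw new IllegalArgumentException(
                        "Invalid type '" + raw + "', must be a valid URL.", e);
            }
            // Otherwise record the problem and carry on as if no type was given.
            errors.add(e.getMessage());
            return null;
        }
    }

    /** Percent-encodes control, space, non-ASCII and a few unsafe ASCII characters. */
    static String percentEncodeIllegal(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean illegal = c <= 0x20 || c >= 0x7F || "\"<>\\^`{|}".indexOf(c) >= 0;
            if (illegal) {
                sb.append(String.format("%%%02X", c));
            } else {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

    In this sketch a type that still fails to parse after the repair attempt
    only aborts extraction in STOP_AT_FIRST_ERROR mode; in any other mode the
    caller receives null and can inspect getErrors() afterwards.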
    
    mvn clean test -> all tests pass
    
    @lewismc what do you think?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HansBrende/any23 ANY23-348

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/any23/pull/85.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #85
    
----
commit 61d91b55053bb8e3b087216b082594fa1db85a85
Author: Hans <firedrake93@...>
Date:   2018-05-18T22:38:28Z

    ANY23-348 handle malformed microdata types gracefully

----


> IllegalArgumentException in MicrodataExtractor
> ----------------------------------------------
>
>                 Key: ANY23-348
>                 URL: https://issues.apache.org/jira/browse/ANY23-348
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: microdata
>    Affects Versions: 2.3
>            Reporter: Hans Brende
>            Assignee: Hans Brende
>            Priority: Major
>
> I get the following IllegalArgumentException when extracting from 
> http://movies.eventful.com/theaters-showtimes/canyon-meadows-/T0-001-000005891-8
> I also get it when extracting from: http://eventful.com/performers
> This IllegalArgumentException kills the whole extraction process.
> Haven't had time to debug this.
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException: Invalid type '', must be a valid URL.
>       at org.apache.any23.extractor.microdata.ItemScope.<init>(ItemScope.java:81)
>       at org.apache.any23.extractor.microdata.MicrodataParser.getItemScope(MicrodataParser.java:509)
>       at org.apache.any23.extractor.microdata.MicrodataParser.getMicrodata(MicrodataParser.java:196)
>       at org.apache.any23.extractor.microdata.MicrodataParser.getMicrodata(MicrodataParser.java:213)
>       at org.apache.any23.extractor.microdata.MicrodataExtractor.run(MicrodataExtractor.java:89)
>       at org.apache.any23.extractor.microdata.MicrodataExtractor.run(MicrodataExtractor.java:60)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
