[ https://issues.apache.org/jira/browse/ANY23-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481311#comment-16481311 ]
ASF GitHub Bot commented on ANY23-348:
--------------------------------------
GitHub user HansBrende opened a pull request:
https://github.com/apache/any23/pull/85
ANY23-348 handle malformed microdata types gracefully
I did two things:
(1) Treat blank microdata types as if they were null.
(2) For other varieties of malformed microdata types, first attempt to fix
them by trimming leading and trailing whitespace and URL-encoding illegal
characters. If that fails, throw a fatal error only if the microdata
parser error mode is set to STOP_AT_FIRST_ERROR; otherwise, add the error to
the error list, treat the type as if it were null, and continue parsing.
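
The two steps above can be sketched roughly as follows. This is only an illustration of the described behavior, not the code in the pull request: the class ItemTypeSketch, the methods encodeIllegal and parseItemType, and the COLLECT_ERRORS mode are hypothetical names, and the set of characters left unencoded is an assumption based on RFC 3986.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these are not the actual Any23 classes or method names.
final class ItemTypeSketch {

    // Stand-in for the parser's error mode. Only STOP_AT_FIRST_ERROR is named
    // in the description above; COLLECT_ERRORS is an assumed name.
    enum ErrorMode { STOP_AT_FIRST_ERROR, COLLECT_ERRORS }

    // ASCII punctuation left untouched (RFC 3986 reserved/unreserved, plus '%').
    private static final String ALLOWED = "-._~:/?#[]@!$&'()*+,;=%";

    // Percent-encodes every UTF-8 byte that is not an ASCII letter, digit,
    // or one of the ALLOWED characters.
    static String encodeIllegal(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean asciiAlnum = (v >= 'a' && v <= 'z')
                    || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9');
            if (asciiAlnum || (v < 0x80 && ALLOWED.indexOf(v) >= 0)) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    // Returns the item type as a URL, or null if it is blank or unrepairable.
    static URL parseItemType(String raw, ErrorMode mode, List<String> errors) {
        if (raw == null || raw.trim().isEmpty()) {
            return null; // (1) a blank type is treated like a missing type
        }
        // (2) attempt a repair: trim, then encode illegal characters
        String repaired = encodeIllegal(raw.trim());
        try {
            URI uri = new URI(repaired);
            if (!uri.isAbsolute()) {
                throw new URISyntaxException(repaired, "not an absolute URL");
            }
            return uri.toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            String message = "Invalid type '" + raw + "', must be a valid URL.";
            if (mode == ErrorMode.STOP_AT_FIRST_ERROR) {
                throw new IllegalArgumentException(message, e); // fatal
            }
            errors.add(message); // record the problem and keep parsing
            return null;
        }
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        System.out.println(parseItemType(" http://schema.org/My Type ",
                ErrorMode.COLLECT_ERRORS, errors));
        System.out.println(parseItemType("not a url",
                ErrorMode.COLLECT_ERRORS, errors));
        System.out.println(errors);
    }
}
```

In this sketch the caller sees null both for a blank type and for an unrepairable one; the difference is that only the latter leaves an entry in the error list.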
mvn clean test -> all tests pass
@lewismc what do you think?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HansBrende/any23 ANY23-348
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/any23/pull/85.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #85
----
commit 61d91b55053bb8e3b087216b082594fa1db85a85
Author: Hans <firedrake93@...>
Date: 2018-05-18T22:38:28Z
ANY23-348 handle malformed microdata types gracefully
----
> IllegalArgumentException in MicrodataExtractor
> ----------------------------------------------
>
> Key: ANY23-348
> URL: https://issues.apache.org/jira/browse/ANY23-348
> Project: Apache Any23
> Issue Type: Bug
> Components: microdata
> Affects Versions: 2.3
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
>
> I get the following IllegalArgumentException when extracting from
> http://movies.eventful.com/theaters-showtimes/canyon-meadows-/T0-001-000005891-8
> I also get it when extracting from: http://eventful.com/performers
> This IllegalArgumentException kills the whole extraction process.
> Haven't had time to debug this.
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException: Invalid type '', must be a valid URL.
>     at org.apache.any23.extractor.microdata.ItemScope.<init>(ItemScope.java:81)
>     at org.apache.any23.extractor.microdata.MicrodataParser.getItemScope(MicrodataParser.java:509)
>     at org.apache.any23.extractor.microdata.MicrodataParser.getMicrodata(MicrodataParser.java:196)
>     at org.apache.any23.extractor.microdata.MicrodataParser.getMicrodata(MicrodataParser.java:213)
>     at org.apache.any23.extractor.microdata.MicrodataExtractor.run(MicrodataExtractor.java:89)
>     at org.apache.any23.extractor.microdata.MicrodataExtractor.run(MicrodataExtractor.java:60)
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)