[ https://issues.apache.org/jira/browse/ANY23-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated ANY23-457:
---------------------------------------
Description:
This problem occurs when we attempt to parse the HTML at the following URLs:
https://www-robotics.jpl.nasa.gov/links/index.cfm
https://www-robotics.jpl.nasa.gov/patents/index.cfm
The parser reports:
ERROR rdf.BaseRDFExtractor - Error while parsing RDF document.
White spaces are required between publicId and systemId
Looking at the HTML source, we see the following DOCTYPE, which has a public identifier but no system identifier:
{code:html}
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
{code}
According to [this article|https://stackoverflow.com/a/9225499], it looks like we
may be able to create a rule and 'fix' that would produce the following:
{code:html}
<!-- Notice the addition of "" -->
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<html>
<head>
{code}
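The rewrite above can be sketched as a plain string preprocessing step. This is only a minimal illustration, not Any23's rule/fix API: the class name DoctypeSystemIdFix and the method addEmptySystemId are hypothetical, and it assumes the DOCTYPE can be patched with a regular expression before the document reaches the XML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: appends an empty system identifier to a PUBLIC-only DOCTYPE. */
public class DoctypeSystemIdFix {

    // Matches a DOCTYPE that has a quoted public identifier followed directly by '>',
    // i.e. no system identifier. Group 1 captures everything up to the closing '>'.
    private static final Pattern PUBLIC_ONLY_DOCTYPE = Pattern.compile(
            "(<!DOCTYPE\\s+[^\\s>]+\\s+PUBLIC\\s+(?:\"[^\"]*\"|'[^']*'))\\s*>",
            Pattern.CASE_INSENSITIVE);

    /** Returns the input with "" added as system identifier, or unchanged if none is missing. */
    public static String addEmptySystemId(String html) {
        Matcher m = PUBLIC_ONLY_DOCTYPE.matcher(html);
        // "$1" re-emits the captured prefix; the rest adds the empty system identifier.
        return m.replaceFirst("$1 \"\">");
    }

    public static void main(String[] args) {
        String broken = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n<html>\n<head>";
        System.out.println(addEmptySystemId(broken));
    }
}
```

A DOCTYPE that already carries a system identifier does not match the pattern and is returned unchanged, so the step should be safe to apply to every input.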
was: White spaces are required between publicId and systemId
> Fix error: White spaces are required between publicId and systemId
> ------------------------------------------------------------------
>
> Key: ANY23-457
> URL: https://issues.apache.org/jira/browse/ANY23-457
> Project: Apache Any23
> Issue Type: Bug
> Components: fix, rule
> Affects Versions: 2.4
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 2.5
--
This message was sent by Atlassian Jira
(v8.3.4#803005)