[ https://issues.apache.org/jira/browse/ANY23-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated ANY23-457:
---------------------------------------
    Description: 
This problem is encountered when we attempt to parse the following HTML pages:

https://www-robotics.jpl.nasa.gov/links/index.cfm
https://www-robotics.jpl.nasa.gov/patents/index.cfm

ERROR rdf.BaseRDFExtractor - Error while parsing RDF document.
White spaces are required between publicId and systemId

The HTML source contains the following DOCTYPE, which declares a public identifier but no system identifier:

{code:html}
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
{code}
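To confirm the diagnosis outside Any23, here is a minimal Java sketch (the class {{DoctypeRepro}} and method {{parseError}} are hypothetical, not part of Any23). It feeds the DOCTYPE above, plus closing tags so that the rest of the document is well-formed, to the JDK's built-in non-validating DOM parser. XML only allows {{PUBLIC}} when it is followed by whitespace and a system literal, so the parser is expected to reject the document at the DOCTYPE. Because the JDK bundles a Xerces-derived parser, we would expect the same message as in the log above, although this sketch does not rely on the exact wording.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reproduction harness; not Any23 code.
public class DoctypeRepro {
    // DOCTYPE as quoted in this issue, followed by a minimal well-formed body.
    static final String HTML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n"
        + "<html>\n<head>\n</head>\n<body></body>\n</html>\n";

    /** Returns the parser's error message, or null if the document parsed. */
    static String parseError(String doc) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores recoverable ones,
            // so the failure still surfaces as a SAXParseException below.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8)));
            return null; // document is well-formed
        } catch (SAXParseException e) {
            return e.getMessage(); // well-formedness error reported by the parser
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseError(HTML));
    }
}
{code}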

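An XML parser rejects this because, in XML, a {{PUBLIC}} identifier must be followed by whitespace and a system literal. Based on [this answer|https://stackoverflow.com/a/9225499], we may be able to create a rule and 'fix' that appends an empty system identifier, producing the following: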

{code:html}
<!-- Notice the addition of "" -->
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<html>
<head>
{code}
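A rough sketch of what such a fix could do, written as a plain text rewrite applied before the document reaches the XML parser. The class {{DoctypeSystemIdFix}}, its methods, and the regular expression are all hypothetical illustrations, not Any23's rule/fix API. The pattern only matches a DOCTYPE whose public identifier is followed directly by optional whitespace and {{>}}, so declarations that already carry a system identifier are left untouched.

{code:java}
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed fix; not Any23's rule/fix API.
public class DoctypeSystemIdFix {
    // <!DOCTYPE name PUBLIC "pubid" (or 'pubid'), then optional whitespace and '>'.
    // Group 1: everything up to and including the public id; group 3: the closing '>'.
    private static final Pattern PUBLIC_ONLY_DOCTYPE = Pattern.compile(
        "(<!DOCTYPE\\s+\\S+\\s+PUBLIC\\s+(\"[^\"]*\"|'[^']*'))(\\s*>)");

    /** True if the DOCTYPE declares a public id but no system id. */
    static boolean needsFix(String doc) {
        return PUBLIC_ONLY_DOCTYPE.matcher(doc).find();
    }

    /** Inserts an empty system id ("") into the first such DOCTYPE; otherwise returns doc unchanged. */
    static String apply(String doc) {
        return PUBLIC_ONLY_DOCTYPE.matcher(doc).replaceFirst("$1 \"\"$3");
    }

    public static void main(String[] args) {
        String in = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n<html>\n<head>";
        System.out.println(apply(in));
    }
}
{code}

Note that an empty system identifier is a syntactically valid system literal, but a parser that loads external DTDs may still try to resolve it, so the extractor may also need external DTD loading disabled.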


  was:White spaces are required between publicId and systemId


> Fix error: White spaces are required between publicId and systemId
> ------------------------------------------------------------------
>
>                 Key: ANY23-457
>                 URL: https://issues.apache.org/jira/browse/ANY23-457
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: fix, rule
>    Affects Versions: 2.4
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 2.5
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
