[
https://issues.apache.org/jira/browse/ANY23-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrey Kutuzov updated ANY23-165:
---------------------------------
Description:
When any23 is asked to extract semantics from a web document that is not
encoded in UTF-8 and whose TITLE element precedes the encoding declaration,
any23 fails with the error "Invalid content ''".
Example of such a URL:
http://www.kinopoisk.ru/film/565993/
Compressed dump of this page is attached.
any23 http://www.kinopoisk.ru/film/565993/
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
details.
------------------------------------------------------------------------
Apache Any23 :: rover
------------------------------------------------------------------------
@prefix dcterms: <http://purl.org/dc/terms/> .
<http://www.kinopoisk.ru/film/565993/> dcterms:title "Ïèðàíüè 3DD" .
------------------------------------------------------------------------
Apache Any23 FAILURE
Execution terminated with errors: Invalid content ''
Total time: 1s
Finished at: Mon Jul 15 20:31:14 MSK 2013
Final Memory: 67M/479M
------------------------------------------------------------------------
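
The garbled title in the output above is consistent with single-byte Cyrillic
text being decoded as ISO-8859-1 before the encoding declaration was read. The
Python sketch below illustrates that reading; it is not any23's code. It
assumes the page is windows-1251 (the report does not name the charset), and
the helpers undo_latin1_misdecoding, sniff_declared_charset and extract_title
are hypothetical names introduced only for this illustration.

```python
import re

# Assumption: the page is encoded as windows-1251 (cp1251).
# The report only says "not in UTF-8"; it does not name the charset.
ASSUMED_PAGE_CHARSET = "cp1251"


def undo_latin1_misdecoding(text: str, real_charset: str = ASSUMED_PAGE_CHARSET) -> str:
    """Recover the original bytes of text wrongly decoded as ISO-8859-1,
    then decode them with the charset the page is assumed to use."""
    return text.encode("latin-1").decode(real_charset)


# Looks for "charset=..." in the raw bytes, wherever it appears in the head.
_META_CHARSET = re.compile(
    rb"""charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def sniff_declared_charset(raw: bytes, limit: int = 4096, default: str = "utf-8") -> str:
    """Hypothetical pre-scan: find the declared charset in the first `limit`
    bytes before decoding any text, so element order does not matter."""
    match = _META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii").lower() if match else default


def extract_title(raw: bytes) -> str:
    """Decode the whole document with the sniffed charset, then read TITLE."""
    html = raw.decode(sniff_declared_charset(raw), errors="replace")
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


if __name__ == "__main__":
    # The title exactly as it appears in the any23 output quoted above.
    print(undo_latin1_misdecoding("Ïèðàíüè 3DD"))
```

Scanning the raw bytes for the declaration first means a TITLE that comes
before the META element is still decoded with the declared charset. Whether
any23 0.8.0 actually takes the ISO-8859-1 path is not confirmed by this report.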
was:
When any23 is asked to extract semantics from a web document that is not
encoded in UTF-8 and whose TITLE element precedes the encoding declaration,
any23 fails with the error "Invalid content ''".
Example of such a URL:
http://www.kinopoisk.ru/film/565993/
any23 http://www.kinopoisk.ru/film/565993/
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
details.
------------------------------------------------------------------------
Apache Any23 :: rover
------------------------------------------------------------------------
@prefix dcterms: <http://purl.org/dc/terms/> .
<http://www.kinopoisk.ru/film/565993/> dcterms:title "Ïèðàíüè 3DD" .
------------------------------------------------------------------------
Apache Any23 FAILURE
Execution terminated with errors: Invalid content ''
Total time: 1s
Finished at: Mon Jul 15 20:31:14 MSK 2013
Final Memory: 67M/479M
------------------------------------------------------------------------
> "Invalid content" error if TITLE precedes encoding declaration in the document
> ------------------------------------------------------------------------------
>
> Key: ANY23-165
> URL: https://issues.apache.org/jira/browse/ANY23-165
> Project: Apache Any23
> Issue Type: Bug
> Components: encoding
> Affects Versions: 0.8.0
> Environment: Linux 2.6.18-308.11.1.el5 #1 SMP Tue Jul 10 08:48:43 EDT
> 2012 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Andrey Kutuzov
> Labels: encoding
> Attachments: kinopoisk.html.gz