[jira] [Updated] (HTTPCLIENT-1590) Charset detection problem if Content-Type header is text/html

Tarik Yilmaz (JIRA) Tue, 23 Dec 2014 07:15:30 -0800

     [ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tarik Yilmaz updated HTTPCLIENT-1590:
-------------------------------------
    Description: 
{code}
HttpClient client = HttpClients.createDefault();
HttpEntity entity = client.execute(new HttpGet(url)).getEntity();
String charset = ContentType.get(entity).getCharset().displayName();
{code}

The third line throws a NullPointerException.

Response headers :
{code}
Cache-Control:private
Content-Encoding:gzip
Content-Length:16636
Content-Type:text/html
Date:Tue, 23 Dec 2014 14:06:13 GMT
Server:Microsoft-IIS/7.0
X-Powered-By:ASP.NET
{code}

Response meta tag :
{code}
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<html xmlns:fb="http://ogp.me/ns/fb#">
<HEAD>



<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-9" />
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1254" />
<link rel="SHORTCUT ICON" href="/favicon.ico" />
{code}

How can I obtain the real charset from the DOM object? I am using Jsoup to parse
the document with the Jsoup.parse(InputStream, String, String) method.

  was:
{code}
HttpClient client = HttpClients.createDefault();
HttpEntity entity = client.execute(new HttpGet(url)).getEntity();
String charset = ContentType.get(entity).getCharset().displayName();
{code}

third line throw an NullPointerException.

Response headers :
{code}
Cache-Control:private
Content-Encoding:gzip
Content-Length:16636
Content-Type:text/html
Date:Tue, 23 Dec 2014 14:06:13 GMT
Server:Microsoft-IIS/7.0
X-Powered-By:ASP.NET
{code}

Response meta tag :
{code}
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<html xmlns:fb="http://ogp.me/ns/fb#">
<HEAD>



<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-9" />
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1254" />
<link rel="SHORTCUT ICON" href="/favicon.ico" />
{code}

How can I obtain the real charset from the DOM object? I am using Jsoup to parse
the document with the Jsoup.parse(InputStream, String, String) method.


> Charset detection problem if Content-Type header is text/html
> -------------------------------------------------------------
>
>                 Key: HTTPCLIENT-1590
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1590
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>    Affects Versions: 4.3.6
>            Reporter: Tarik Yilmaz
>            Priority: Critical
>
> {code}
> HttpClient client = HttpClients.createDefault();
> HttpEntity entity = client.execute(new HttpGet(url)).getEntity();
> String charset = ContentType.get(entity).getCharset().displayName();
> {code}
> The third line throws a NullPointerException.
> Response headers :
> {code}
> Cache-Control:private
> Content-Encoding:gzip
> Content-Length:16636
> Content-Type:text/html
> Date:Tue, 23 Dec 2014 14:06:13 GMT
> Server:Microsoft-IIS/7.0
> X-Powered-By:ASP.NET
> {code}
> Response meta tag :
> {code}
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> "http://www.w3.org/TR/html4/loose.dtd">
> <html xmlns:fb="http://ogp.me/ns/fb#">
> <HEAD>
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-9" />
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1254" />
> <link rel="SHORTCUT ICON" href="/favicon.ico" />
> {code}
> How can I obtain the real charset from the DOM object? I am using Jsoup to parse
> the document with the Jsoup.parse(InputStream, String, String) method.
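
A possible workaround, sketched with the JDK only. As far as I know, in HttpClient 4.x ContentType.getCharset() returns null when the Content-Type header carries no charset parameter (as with the plain text/html above), so calling displayName() on the result fails. The sketch below checks the header value first, then looks for a charset in the <meta> tags within the first bytes of the body, and finally falls back to a caller-supplied default. The class CharsetSniffer and its method names are hypothetical, not part of HttpClient or Jsoup, and the regular expression is a rough approximation, not a full HTML parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not an HttpClient or Jsoup class.
final class CharsetSniffer {

    // Rough match for charset=... inside a <meta ...> tag (case-insensitive).
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([^\\s\"'/>;]+)",
            Pattern.CASE_INSENSITIVE);

    private CharsetSniffer() {
    }

    // Returns the named charset, or null if the name is illegal or unsupported.
    private static Charset lookup(String name) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : null;
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Charset parameter of a Content-Type header value, or null if absent.
    static Charset fromHeader(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                return lookup(p.substring(8).trim().replace("\"", ""));
            }
        }
        return null;
    }

    // First supported charset declared in a <meta> tag, or null if none.
    static Charset fromMeta(byte[] head) {
        // ISO-8859-1 maps every byte to a char, so this decode cannot fail;
        // the tag text we are looking for is plain ASCII anyway.
        String html = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(html);
        while (m.find()) {
            Charset cs = lookup(m.group(1));
            if (cs != null) {
                return cs;
            }
        }
        return null;
    }

    // Header first, then <meta>, then the supplied fallback.
    static Charset detect(String contentType, byte[] head, Charset fallback) {
        Charset cs = fromHeader(contentType);
        if (cs == null) {
            cs = fromMeta(head);
        }
        return cs != null ? cs : fallback;
    }
}
```

With the response quoted above, this would presumably pick the first declared meta charset (ISO-8859-9), provided the JVM supports it. Alternatively, passing null as the charsetName argument of Jsoup.parse(InputStream, String, String) should let Jsoup detect the charset from the http-equiv meta tag itself, falling back to UTF-8.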



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
