Hi Khosro,

As Jon said, the Content-Type header is set (correctly) to "text/html". No header in the response says anything about the character set, but you can obtain this information from the entity itself. The HTML declares the character set in a meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1256">
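
As a rough sketch of that fallback: look for a charset parameter in the Content-Type header value first, and only if there is none, scan the start of the HTML for a meta declaration. This is plain JDK code, not HttpClient API; the class CharsetSniffer and its method names are made up for illustration. It assumes you have already decoded the first part of the body with a single-byte encoding such as ISO-8859-1, which is enough to read an ASCII meta tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of HttpClient. */
final class CharsetSniffer {

    // Matches "charset=value" in a Content-Type header value or inside a <meta> tag.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([^\\s\"';>]+)", Pattern.CASE_INSENSITIVE);

    // Matches any <meta ...> tag.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns the charset parameter of a Content-Type value, or null if there is none. */
    static String fromContentType(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(headerValue);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the charset declared in the first <meta> tag that names one, or null. */
    static String fromMetaTag(String htmlPrefix) {
        if (htmlPrefix == null) {
            return null;
        }
        Matcher tags = META_TAG.matcher(htmlPrefix);
        while (tags.find()) {
            String charset = fromContentType(tags.group());
            if (charset != null) {
                return charset;
            }
        }
        return null;
    }

    /** Prefers the header; falls back to the meta tag when the header has no charset. */
    static String detect(String headerValue, String htmlPrefix) {
        String charset = fromContentType(headerValue);
        return charset != null ? charset : fromMetaTag(htmlPrefix);
    }

    public static void main(String[] args) {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=windows-1256\"></head></html>";
        System.out.println(detect("text/html", html));                // prints windows-1256
        System.out.println(detect("text/html; charset=UTF-8", html)); // prints UTF-8
    }
}
```

In your code you would pass allHeaders[0].getValue() as the header value. A regular expression is only an approximation here; a real HTML parser would be more robust against unusual markup.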

See also http://www.w3.org/International/O-charset for more information about the different ways to declare character encodings.

Kind regards,
Stijn Deknudt.

On 8/16/11, Jon Moore <[email protected]> wrote:
> Hi,
>
> This is because the resource at www.annahar.com that you link to
> returns a Content-Type header that just reads "text/html":
>
> $ curl -v "http://www.annahar.com/content.php?priority=1&table=main&type=main&day=Mon" >/dev/null
> * About to connect() to www.annahar.com port 80 (#0)
> *   Trying 66.242.155.235... connected
> * Connected to www.annahar.com (66.242.155.235) port 80 (#0)
> > GET /content.php?priority=1&table=main&type=main&day=Mon HTTP/1.1
> > User-Agent: curl/7.16.4 (i386-apple-darwin9.0) libcurl/7.16.4 OpenSSL/0.9.7l zlib/1.2.3
> > Host: www.annahar.com
> > Accept: */*
> >
> < HTTP/1.1 200 OK
> < Connection: close
> < Date: Tue, 16 Aug 2011 11:50:50 GMT
> < Server: Microsoft-IIS/6.0
> < X-Powered-By: ASP.NET
> < X-Powered-By: PHP/5.2.0
> < Content-type: text/html
> <
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0{ [data not shown]
> 100 91340    0 91340    0     0   187k      0 --:--:-- --:--:-- --:--:--  237k* Closing connection #0
>
> So httpclient is doing the right thing -- it's giving you access to
> exactly what's in the header that's returned.
>
> Jon
>
>
> On Tue, Aug 16, 2011 at 7:42 AM, Khosro Asgharifard Sharabiani
> <[email protected]> wrote:
>> Hello,
>> I use the following code to find the charset of a page, but it does not
>> work for the page
>> "http://www.annahar.com/content.php?priority=1&table=main&type=main&day=Mon"
>>
>> Code:
>> [code]
>>
>> try {
>>     HttpClient httpclient = new DefaultHttpClient();
>>     String url = "http://www.annahar.com/content.php?priority=1&table=main&type=main&day=Mon";
>>     HttpGet httpget = new HttpGet(url);
>>     HttpResponse response;
>>     response = httpclient.execute(httpget);
>>     HttpEntity entity = response.getEntity();
>>     if (entity != null) {
>>         Header[] allHeaders = response.getHeaders("Content-Type");
>>         System.out.println(allHeaders[0].getValue());
>>     }
>> } catch (ClientProtocolException e) {
>>     e.printStackTrace();
>> } catch (IOException e) {
>>     e.printStackTrace();
>> }
>> [/code]
>>
>>
>> The output of the above code is: text/html.
>> But I think the output should be "text/html; charset=windows-1256". Am I
>> right?
>>
>> But when I use
>> "http://bigbrowser.blog.lemonde.fr/2011/08/03/iran-le-mossad-derriere-le-meurtre-dun-scientifique-spiegel"
>> as the URL in the code, it returns "text/html; charset=UTF-8", which I
>> think is OK.
>> It seems to work for some pages but not all of them. Why does this happen?
>>
>>
>> Khosro.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

--
Stijn
[email protected]
