Hi,

This is because the resource at www.annahar.com that you link to returns a Content-Type header that reads just "text/html", with no charset parameter:

$ curl -v "http://www.annahar.com/content.php?priority=1&table=main&type=main&day=Mon" >/dev/null
* About to connect() to www.annahar.com port 80 (#0)
*   Trying 66.242.155.235... connected
* Connected to www.annahar.com (66.242.155.235) port 80 (#0)
> GET /content.php?priority=1&table=main&type=main&day=Mon HTTP/1.1
> User-Agent: curl/7.16.4 (i386-apple-darwin9.0) libcurl/7.16.4 OpenSSL/0.9.7l zlib/1.2.3
> Host: www.annahar.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Connection: close
< Date: Tue, 16 Aug 2011 11:50:50 GMT
< Server: Microsoft-IIS/6.0
< X-Powered-By: ASP.NET
< X-Powered-By: PHP/5.2.0
< Content-type: text/html
<
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
{ [data not shown]
100 91340    0 91340    0     0   187k      0 --:--:-- --:--:-- --:--:--  237k
* Closing connection #0

So httpclient is doing the right thing: it's giving you access to exactly what's in the header that the server returns. The server does not declare a charset in that header, so if the page declares one at all, it has to be somewhere else (for example in a <meta> tag inside the HTML itself), and httpclient does not parse the body for that.

Jon

On Tue, Aug 16, 2011 at 7:42 AM, Khosro Asgharifard Sharabiani <[email protected]> wrote:
> Hello,
> I use the following code to find the charset of a page, but it does not work
> for the page
> "http://www.annahar.com/content.php?priority=1&table=main&type=main&day=Mon"
>
> Code:
> [code]
>
> try {
>     HttpClient httpclient = new DefaultHttpClient();
>     String url = "http://www.annahar.com/content.php?priority=1&table=main&type=main&day=Mon";
>     HttpGet httpget = new HttpGet(url);
>     HttpResponse response;
>     response = httpclient.execute(httpget);
>     HttpEntity entity = response.getEntity();
>     if (entity != null) {
>         Header[] allHeaders = response.getHeaders("Content-Type");
>         System.out.println(allHeaders[0].getValue());
>     }
> } catch (ClientProtocolException e) {
>     e.printStackTrace();
> } catch (IOException e) {
>     e.printStackTrace();
> }
> [/code]
>
> And the output of the above code is: text/html.
> But I think the output must be "text/html; charset=windows-1256". Am I right?
>
> But when I use
> "http://bigbrowser.blog.lemonde.fr/2011/08/03/iran-le-mossad-derriere-le-meurtre-dun-scientifique-spiegel"
> as the URL in the code, it returns "text/html; charset=UTF-8", which I think
> is OK.
> It seems it works for some pages but not all of them. Why does this happen?
>
> Khosro.
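
P.S. Since the header may or may not carry a charset parameter, the calling code probably wants a fallback. Below is a minimal sketch of that idea using only the JDK, not httpclient's own API. The class name ContentTypeCharset, the method charsetOf, and the choice of "ISO-8859-1" as the fallback are my assumptions for illustration; for the annahar.com page you might pass "windows-1256" instead if you know that is what the site uses. The split on ';' is deliberately simple and would mis-handle a quoted parameter value that itself contains a semicolon.

```java
import java.util.Locale;

public class ContentTypeCharset {

    /**
     * Returns the charset parameter of a Content-Type header value, or the
     * given fallback when the value is null or has no charset parameter.
     * (Hypothetical helper for illustration; not part of httpclient.)
     */
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type and are separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value pair
            }
            // Parameter names are case-insensitive.
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Header as returned by www.annahar.com: no charset, so the fallback is used.
        System.out.println(charsetOf("text/html", "ISO-8859-1"));
        // Header as returned by the lemonde.fr page: the charset is present.
        System.out.println(charsetOf("text/html; charset=UTF-8", "ISO-8859-1"));
    }
}
```

In the code from the original message, the string to pass in would be allHeaders[0].getValue().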
