I'm trying to translate the front page of www.aljazeera.net (windows-1256, but
the other links from the main page are UTF-8)
 
or
 
any link from the main page of www.moheet.com (the linked pages are
windows-1256, but the main page is UTF-8)
 
> Date: Wed, 6 Aug 2008 09:36:12 -0700
> From: [EMAIL PROTECTED]
> Subject: Re: [Moses-support] web-based coding problem?
> To: [EMAIL PROTECTED]
>
> Hi,
>
> what page are you trying to translate?
>
> Herve
>
> --- On Wed, 6/8/08, musa ghurab <[EMAIL PROTECTED]> wrote:
>
> > From: musa ghurab <[EMAIL PROTECTED]>
> > Subject: Re: [Moses-support] web-based coding problem?
> > To: [email protected]
> > Date: Wednesday, 6 August, 2008, 6:33 PM
> >
> >     my $html = decode ('CP-1256', $res->content);
> >
> > this function does not help, problem still there.
> >
> > > Date: Wed, 6 Aug 2008 08:24:20 -0700
> > > From: [EMAIL PROTECTED]
> > > Subject: Re: FW: [Moses-support] web-based coding problem?
> > > To: [EMAIL PROTECTED]
> > > CC: [email protected]
> > >
> > > Hi all,
> > >
> > > decode_entities has nothing to do with character encodings. It
> > > replaces HTML entities such as &nbsp; or &eacute; with the character
> > > they stand for. Decoding is a separate process.
> > >
> > > The part that does the decoding is line 220:
> > >
> > >     my $html = $res->decoded_content;
> > >
> > > The decoded_content method (from the HTTP::Message class) uses the
> > > character set declared in the HTTP response or in the HTML file
> > > itself to convert bytes to characters. If neither are present, I
> > > think it will assume ISO-8859-1 as a default.
> > >
> > > I think translate.cgi as it is works with any encoding, as long as
> > > they are declared somewhere, i.e., it does not do character set
> > > detection.
> > >
> > > Now if you know that the system will always be used for pages in a
> > > certain encoding, you could override this decoding by doing it
> > > yourself, e.g., by replacing line 220 with this:
> > >
> > >     my $html = decode ('CP-1256', $res->content);
> > >
> > > But obviously this only works if every page you serve is CP-1256, or
> > > if you have any other means of recognizing it, which is probably not
> > > the case.
> > >
> > > Otherwise you'll have to look into character set detection. As a
> > > start you could look into the CharsetDetector package, I've never
> > > used it myself but it looks promising:
> > >
> > > http://search.cpan.org/perldoc?CharsetDetector
> > >
> > > Good luck,
> > > Herve
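
For sites that mix UTF-8 and windows-1256 pages, one possible way around the
"every page must be CP-1256" limitation described above is to try a strict
UTF-8 decode first and fall back to CP-1256 only when that fails. This is just
a minimal sketch using the core Encode module: decode_page is a hypothetical
helper, not part of translate.cgi, and the fallback rests on the assumption
that every page is either UTF-8 or windows-1256. It is a heuristic, not real
character set detection, and a windows-1256 page whose bytes happen to form
valid UTF-8 would be misread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of translate.cgi): turn raw page bytes
# into a Perl character string.
sub decode_page {
    my ($bytes) = @_;

    # Strict UTF-8 decoding dies on malformed input (FB_CROAK);
    # LEAVE_SRC keeps $bytes unmodified so it can be reused below.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text if defined $text;

    # Assumption: anything that is not valid UTF-8 is windows-1256.
    return decode('cp1256', $bytes);
}
```

In translate.cgi this would presumably stand in for the decoding on line 220,
along the lines of: my $html = decode_page($res->content);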
> > > > -----Original Message-----
> > > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > > > On Behalf Of Philipp Koehn
> > > > Sent: 06 August 2008 15:20
> > > > To: musa ghurab
> > > > Cc: [email protected]
> > > > Subject: Re: [Moses-support] web-based coding problem?
> > > >
> > > > Hi,
> > > >
> > > > you probably have to extend the code yourself to (a) detect the
> > > > HTML page's encoding and (b) convert it into UTF8 (which should
> > > > be very straight-forward in Perl).
> > > >
> > > > -phi
> > > >
> > > > On Sat, Aug 2, 2008 at 5:48 PM, musa ghurab
> > > > <[EMAIL PROTECTED]> wrote:
> > > > > Hi all
> > > > >
> > > > > I'm facing problem with the moses web-based, problem related
> > > > > to encoding.
> > > > > In web-root file: translate.cgi line: 234
> > > > >
> > > > >     $html=decode_entities($html);
> > > > >
> > > > > decode_entities(page coding: windows-1256)?wrong coding (not
> > > > > utf8)
> > > > >
> > > > > This is converting the fetched text from iso coding to utf8
> > > > > coding.
> > > > > But what I got is when fetch page other than utf8 such as
> > > > > Arabic (windows-1256 (cp-1256)) or any page not declaring the
> > > > > coding in the charset of head tag of html, then it goes to
> > > > > wrong encoding and moses cannot understand this coding.
> > > > > i think this is bug with perl or must use another function
> > > > > for this.
> > > > >
> > > > > Please any suggestion to solve this problem.
> > > > >
> > > > > musa ghurab
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
