my $html = decode ('CP-1256', $res->content);
This function does not help; the problem is still there.

> Date: Wed, 6 Aug 2008 08:24:20 -0700
> From: [EMAIL PROTECTED]
> Subject: Re: FW: [Moses-support] web-based coding problem?
> To: [EMAIL PROTECTED]
> CC: [email protected]
>
> Hi all,
>
> decode_entities has nothing to do with character encodings. It
> replaces HTML entities such as &nbsp; or &eacute; with the characters
> they stand for. Decoding is a separate process.
>
> The part that does the decoding is line 220:
>
>     my $html = $res->decoded_content;
>
> The decoded_content method (from the HTTP::Message class) uses the
> character set declared in the HTTP response or in the HTML file itself
> to convert bytes to characters. If neither is present, I think it will
> assume ISO-8859-1 as a default.
>
> I think translate.cgi as it is works with any encoding, as long as it
> is declared somewhere, i.e., it does not do character set detection.
>
> Now if you know that the system will always be used for pages in a
> certain encoding, you could override this decoding by doing it
> yourself, e.g., by replacing line 220 with this:
>
>     my $html = decode ('CP-1256', $res->content);
>
> But obviously this only works if every page you serve is CP-1256, or
> if you have some other means of recognizing it, which is probably not
> the case.
>
> Otherwise you'll have to look into character set detection. As a start
> you could look into the CharsetDetector package. I've never used it
> myself, but it looks promising:
>
> http://search.cpan.org/perldoc?CharsetDetector
>
> Good luck,
> Herve
>
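A middle ground between trusting only the declared charset and hard-coding CP-1256 might look like the sketch below. It is not part of translate.cgi: bytes_to_chars is a hypothetical helper, and both the cp1256 fallback and the "try strict UTF-8 first" heuristic are assumptions made for illustration. It uses only the core Encode module, so the declared charset has to be passed in by the caller.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding FB_CROAK LEAVE_SRC);

# Hypothetical helper: turn raw response bytes into Perl characters.
# $declared is the charset named by the server or the page, if any.
# $fallback is an assumed legacy encoding for pages that declare nothing.
sub bytes_to_chars {
    my ($bytes, $declared, $fallback) = @_;
    $fallback = 'cp1256' unless defined $fallback;

    # 1. Trust a declared charset when Encode recognises the name.
    if (defined $declared && find_encoding($declared)) {
        return decode($declared, $bytes);
    }

    # 2. Otherwise try strict UTF-8; text in a legacy 8-bit encoding
    #    is rarely valid UTF-8, so this attempt usually dies for it.
    my $chars = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return $chars if defined $chars;

    # 3. Last resort: the assumed site-wide encoding.
    return decode($fallback, $bytes);
}
```

With a helper like this, line 220 could in principle become something like my $html = bytes_to_chars($res->content, $charset), where $charset is whatever the caller extracted from the response. The heuristic can still guess wrong for pages in other undeclared legacy encodings.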
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> > On Behalf Of Philipp Koehn
> > Sent: 06 August 2008 15:20
> > To: musa ghurab
> > Cc: [email protected]
> > Subject: Re: [Moses-support] web-based coding problem?
> >
> > Hi,
> >
> > you probably have to extend the code yourself to (a) detect the HTML
> > page's encoding and (b) convert it into UTF8 (which should be very
> > straightforward in Perl).
> >
> > -phi
> >
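Steps (a) and (b) could be sketched roughly as follows. Both sniff_meta_charset and page_to_utf8 are hypothetical names, the 1024-byte window and the regular expression are simplifying assumptions, and the ISO-8859-1 default is only a guess at a sensible fallback. This only reads a charset that the page declares in a meta tag; it does not do statistical detection.

```perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);

# Hypothetical helper for step (a): look for charset=... inside a
# <meta> tag within the first 1024 bytes of the page.
sub sniff_meta_charset {
    my ($bytes) = @_;
    my $head = substr($bytes, 0, 1024);
    if ($head =~ /<meta\b[^>]*\bcharset\s*=\s*["']?([A-Za-z0-9._:-]+)/i) {
        return $1;
    }
    return;    # nothing declared
}

# Hypothetical helper for step (b): re-encode the page as UTF-8 bytes.
# $default (an assumption) is used when no usable charset is declared.
sub page_to_utf8 {
    my ($bytes, $default) = @_;
    $default = 'iso-8859-1' unless defined $default;

    my $charset = sniff_meta_charset($bytes);
    $charset = $default
        unless defined $charset && find_encoding($charset);

    # Bytes -> characters in the source charset, then characters -> UTF-8.
    return encode('UTF-8', decode($charset, $bytes));
}
```

A page that declares nothing at all still falls through to the default, which is where a real detector would have to take over.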
> > On Sat, Aug 2, 2008 at 5:48 PM, musa ghurab <[EMAIL PROTECTED]> wrote:
> > > Hi all
> > >
> > > I'm facing a problem with the web-based Moses; the problem is
> > > related to encoding.
> > > In the web-root file translate.cgi, line 234:
> > >
> > > $html=decode_entities($html);
> > >
> > > decode_entities (page coding: windows-1256) -> wrong coding (not utf8)
> > >
> > > This converts the fetched text from ISO coding to UTF-8 coding.
> > > But when I fetch a page in an encoding other than UTF-8, such as
> > > Arabic (windows-1256, i.e. cp-1256), or any page that does not
> > > declare its coding in the charset of the HTML head tag, the text
> > > ends up in the wrong encoding and Moses cannot understand it.
> > > I think this is a bug in Perl, or I must use another function for
> > > this.
> > >
> > > Please give any suggestions for solving this problem.
> > >
> > > musa ghurab
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support