my $html = decode ('CP-1256', $res->content);
This function does not help; the problem is still there.

> Date: Wed, 6 Aug 2008 08:24:20 -0700
> From: [EMAIL PROTECTED]
> Subject: Re: FW: [Moses-support] web-based coding problem?
> To: [EMAIL PROTECTED]
> CC: [email protected]
>
> Hi all,
>
> decode_entities has nothing to do with character encodings. It
> replaces HTML entities such as &nbsp; or &eacute; with the characters
> they stand for. Decoding is a separate process.
>
> The part that does the decoding is line 220:
>
>     my $html = $res->decoded_content;
>
> The decoded_content method (from the HTTP::Message class) uses the
> character set declared in the HTTP response or in the HTML file itself
> to convert bytes to characters. If neither is present, I think it will
> assume ISO-8859-1 as a default.
>
> I think translate.cgi as it is works with any encoding, as long as the
> encoding is declared somewhere, i.e., it does not do character set
> detection.
>
> Now if you know that the system will always be used for pages in a
> certain encoding, you could override this decoding by doing it
> yourself, e.g., by replacing line 220 with this:
>
>     my $html = decode ('CP-1256', $res->content);
>
> But obviously this only works if every page you serve is CP-1256, or
> if you have some other means of recognizing it, which is probably not
> the case.
>
> Otherwise you'll have to look into character set detection. As a start
> you could look into the CharsetDetector package. I've never used it
> myself but it looks promising:
>
> http://search.cpan.org/perldoc?CharsetDetector
>
> Good luck,
> Herve
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > On Behalf Of Philipp Koehn
> > Sent: 06 August 2008 15:20
> > To: musa ghurab
> > Cc: [email protected]
> > Subject: Re: [Moses-support] web-based coding problem?
> >
> > Hi,
> >
> > you probably have to extend the code yourself to (a) detect the
> > HTML page's encoding and (b) convert it into UTF8 (which should be
> > very straight-forward in Perl).
> >
> > -phi
> >
> > On Sat, Aug 2, 2008 at 5:48 PM, musa ghurab <[EMAIL PROTECTED]> wrote:
> > > Hi all
> > >
> > > I'm facing a problem with the Moses web-based demo, related to
> > > encoding. In the web-root file translate.cgi, line 234:
> > >
> > >     $html=decode_entities($html);
> > >
> > > decode_entities(page coding: windows-1256) -> wrong coding (not utf8)
> > >
> > > This is converting the fetched text from iso coding to utf8 coding.
> > > But when I fetch a page that is not utf8, such as an Arabic page
> > > (windows-1256 (cp-1256)), or any page that does not declare its
> > > coding in the charset of the html head tag, it ends up in the wrong
> > > encoding and moses cannot understand it.
> > > I think this is a bug in perl, or another function must be used
> > > for this.
> > >
> > > Any suggestion to solve this problem, please.
> > >
> > > musa ghurab
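
The two steps suggested in this thread (work out the page's encoding, then convert to UTF-8) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in translate.cgi: decode_page is a hypothetical helper, the cp1256 fallback is an assumption that only makes sense if undeclared pages are known to be Arabic Windows pages, and the <meta> lookup is a simple regex rather than real character set detection. In translate.cgi the raw bytes would presumably come from $res->content.

```perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);

# Hypothetical helper (not part of translate.cgi): turn raw page bytes into
# Perl characters. It tries the charset from the HTTP header first, then a
# charset found in a <meta> tag, and finally a fixed fallback.
sub decode_page {
    my ($bytes, $http_charset, $fallback) = @_;
    $fallback = 'cp1256' unless defined $fallback;    # assumption, see above

    # Look for charset=... inside a <meta> tag near the top of the page.
    my ($meta_charset) =
        substr($bytes, 0, 1024) =~ /<meta[^>]+charset\s*=\s*["']?([\w.:-]+)/i;

    for my $name ($http_charset, $meta_charset, $fallback) {
        next unless defined $name && length $name;
        my $enc = find_encoding($name) or next;       # skip names Encode does not know
        return decode($enc->name, $bytes);
    }
    die "no usable encoding, not even the fallback '$fallback'\n";
}

# Usage: these four bytes are the Arabic word "salam" in windows-1256.
my $raw  = "<html><body>\xD3\xE1\xC7\xE3</body></html>";
my $html = decode_page($raw, undef);     # nothing declared, so the fallback is used
my $utf8 = encode('UTF-8', $html);       # UTF-8 bytes to pass on downstream
```

Note that decoding only gives Perl's internal characters; whatever is handed on afterwards still has to be encoded as UTF-8 explicitly, as in the last line, or through an output layer such as binmode(STDOUT, ':encoding(UTF-8)').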
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support