I'm trying to translate the first page of www.aljazeera.net (windows-1256, but
other links from the main page are utf8)
or
any link related to the main page of www.moheet.com (windows-1256, but the main
page is utf8)

> Date: Wed, 6 Aug 2008 09:36:12 -0700
> From: [EMAIL PROTECTED]
> Subject: Re: [Moses-support] web-based coding problem?
> To: [EMAIL PROTECTED]
>
> Hi,
>
> what page are you trying to translate?
>
> Herve
>
> --- On Wed, 6/8/08, musa ghurab <[EMAIL PROTECTED]> wrote:
>
> > From: musa ghurab <[EMAIL PROTECTED]>
> > Subject: Re: [Moses-support] web-based coding problem?
> > To: [email protected]
> > Date: Wednesday, 6 August, 2008, 6:33 PM
> >
> > my $html = decode ('CP-1256', $res->content);
> >
> > this function does not help, problem still there.
> >
> > > Date: Wed, 6 Aug 2008 08:24:20 -0700
> > > From: [EMAIL PROTECTED]
> > > Subject: Re: FW: [Moses-support] web-based coding problem?
> > > To: [EMAIL PROTECTED]
> > > CC: [email protected]
> > >
> > > Hi all,
> > >
> > > decode_entities has nothing to do with character encodings. It
> > > replaces HTML entities such as &nbsp; or &eacute; with the character
> > > they stand for. Decoding is a separate process.
> > >
> > > The part that does the decoding is line 220:
> > >
> > > my $html = $res->decoded_content;
> > >
> > > The decoded_content method (from the HTTP::Message class) uses the
> > > character set declared in the HTTP response or in the HTML file itself
> > > to convert bytes to characters. If neither are present, I think it
> > > will assume ISO-8859-1 as a default.
> > >
> > > I think translate.cgi as it is works with any encoding, as long as
> > > they are declared somewhere, i.e., it does not do character set
> > > detection.
> > >
> > > Now if you know that the system will always be used for pages in a
> > > certain encoding, you could override this decoding by doing it
> > > yourself, e.g., by replacing line 220 with this:
> > >
> > > my $html = decode ('CP-1256', $res->content);
> > >
> > > But obviously this only works if every page you serve is CP-1256, or
> > > if you have any other means of recognizing it, which is probably not
> > > the case.
> > >
> > > Otherwise you'll have to look into character set detection. As a
> > > start you could look into the CharsetDetector package, I've never used
> > > it myself but it looks promising:
> > >
> > > http://search.cpan.org/perldoc?CharsetDetector
> > >
> > > Good luck,
> > > Herve
> > >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> > > Philipp Koehn
> > > Sent: 06 August 2008 15:20
> > > To: musa ghurab
> > > Cc: [email protected]
> > > Subject: Re: [Moses-support] web-based coding problem?
> > >
> > > Hi,
> > >
> > > you probably have to extend the code yourself to (a) detect the HTML
> > > page's encoding and (b) convert it into UTF8 (which should be very
> > > straight-forward in Perl).
> > >
> > > -phi
> > >
> > > On Sat, Aug 2, 2008 at 5:48 PM, musa ghurab <[EMAIL PROTECTED]> wrote:
> > > > Hi all
> > > >
> > > > I'm facing problem with the moses web-based, problem related to
> > > > encoding.
> > > > In web-root file: translate.cgi
> > > > line: 234
> > > >
> > > > $html=decode_entities($html);
> > > >
> > > > decode_entities(page coding: windows-1256) -> wrong coding (not utf8)
> > > >
> > > > This is converting the fetched text from iso coding to utf8 coding.
> > > > But what I got is when fetch page other than utf8 such as Arabic
> > > > (windows-1256 (cp-1256)) or any page not declaring the coding in the
> > > > charset of head tag of html, then it goes to wrong encoding and moses
> > > > cannot understand this coding.
> > > > i think this is bug with perl or must use another function for this.
> > > >
> > > > Please any suggestion to solve this problem.
> > > >
> > > > musa ghurab
> > >
> > > _______________________________________________
> > > Moses-support mailing list
> > > [email protected]
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
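
For reference, the approach suggested in the quoted replies (work out each
page's charset, then decode the raw bytes yourself) could be sketched roughly
as below. This is only a minimal sketch under assumptions, not code from
translate.cgi: the helper names guess_charset and decode_page are made up for
illustration, it only reads a charset that is declared in the Content-Type
header or in a <meta> tag, and it falls back to a caller-supplied default
instead of doing real statistical charset detection.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper (not part of translate.cgi): choose a charset name
# from the HTTP Content-Type header, then from a <meta> tag in the first
# few KB of the page, then from a caller-supplied fallback.
sub guess_charset {
    my ($content_type, $bytes, $fallback) = @_;
    my $name;
    if (defined $content_type
        && $content_type =~ /charset\s*=\s*["']?([\w.:-]+)/i) {
        $name = $1;
    }
    elsif (substr($bytes, 0, 4096)
        =~ /<meta[^>]+charset\s*=\s*["']?([\w.:-]+)/i) {
        $name = $1;
    }
    # Trust the declared name only if Encode recognises it.
    return $name if defined $name && find_encoding($name);
    return $fallback;
}

# Hypothetical helper: turn raw response bytes into Perl characters.
sub decode_page {
    my ($content_type, $bytes, $fallback) = @_;
    my $charset = guess_charset($content_type, $bytes, $fallback);
    return decode($charset, $bytes);
}
```

With LWP this might be called as
decode_page(scalar $res->header('Content-Type'), $res->content, 'cp1256'),
where 'cp1256' is just an assumed default for sites known to be Arabic. A page
that declares the wrong charset would still be decoded wrongly, which is where
a real detector would be needed.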
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support