Hi, you will probably have to extend the code yourself to (a) detect the HTML page's encoding and (b) convert it to UTF-8 (which should be very straightforward in Perl).
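
Something along these lines might work as a starting point. It is only a rough sketch using the core Encode module, not the actual translate.cgi code: detect_charset and html_to_utf8 are hypothetical helper names, and I am assuming the raw page bytes and the HTTP Content-Type header value are both available at that point in the script.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);

# Hypothetical helper: guess the page's charset from the HTTP Content-Type
# header value or, failing that, from a <meta> tag in the raw bytes.
sub detect_charset {
    my ($raw_html, $content_type) = @_;
    if (defined $content_type
        && $content_type =~ /charset\s*=\s*["']?([\w.:-]+)/i) {
        return $1;
    }
    # Covers <meta charset="..."> as well as
    # <meta http-equiv="Content-Type" content="...; charset=...">
    if ($raw_html =~ /<meta\b[^>]*?charset\s*=\s*["']?([\w.:-]+)/i) {
        return $1;
    }
    return;    # nothing declared
}

# Hypothetical helper: decode the raw bytes using the detected charset
# (or a caller-supplied fallback) and return UTF-8 encoded bytes.
sub html_to_utf8 {
    my ($raw_html, $content_type, $fallback) = @_;
    $fallback ||= 'iso-8859-1';    # assumed default when nothing is declared
    my $charset = detect_charset($raw_html, $content_type) || $fallback;
    # Use the fallback if Encode does not recognise the declared name.
    my $enc = find_encoding($charset) || find_encoding($fallback);
    my $text = $enc->decode($raw_html);    # bytes -> Perl character string
    return encode('UTF-8', $text);         # character string -> UTF-8 bytes
}
```

In translate.cgi, decode_entities would presumably be applied to the decoded character string (the $text above) rather than to the raw bytes. Pages that declare no charset at all still need a guess, so the fallback would have to be chosen per source language (for example cp1256 for Arabic sites).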

-phi

On Sat, Aug 2, 2008 at 5:48 PM, musa ghurab <[EMAIL PROTECTED]> wrote:
> Hi all
>
> I'm facing problem with the moses web-based, problem related to encoding.
> In web-root file: translate.cgi line: 234
>
> $html=decode_entities($html);
>
> decode_entities(page coding: windows-1256)wrong coding (not utf8)
>
> This is converting the fetched text from iso coding to utf8 coding. But what
> I got is when fetch page other than utf8 such as Arabic (windows-1256
> (cp-1256)) or any page not declaring the coding in the charset of head tag
> of html, then it goes to wrong encoding and moses cannot understand this
> coding.
> i think this is bug with perl or must use another function for this.
>
> Please any suggestion to solve this problem.
>
> musa ghurab
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
