2009/9/14 Shachar Shemesh <shac...@shemesh.biz>
> Hi all,
>
> One of my clients is having a weird problem, and I'm pretty much at my
> wit's end as to what to do about it.
>
> The site is called "Tzofit" (at tzofit.co.il), and is an index and
> publisher for Zimmers. When you search Google for "צימרים" (Hebrew for
> "zimmers") the site appears on the second page, and when you search Google
> for "צופית" ("Tzofit") it is the first result. In both cases, you cannot
> miss it: Google displays the site's title and summary in Japanese!
>
> Now here's where it gets really strange. While the main site is proclaimed
> to be in Japanese, all the deep links are in Hebrew. If you ask to see the
> Google cache, the site appears in Hebrew. If you search for its address
> directly (tzofit.co.il), the site appears with the correct title and summary.
> The only explanation I have is that this is a Google index bug.
>
> The problem is that even if that is the case, I cannot see what I can do
> about it. I tried to ask about it on the Google forums
> (http://www.google.com/support/forum/p/Web+Search/thread?tid=08c423ea40d5c1ab&hl=en),
> but, as expected, got no replies. On the other hand, I did not manage to
> find anything wrong with the actual page.
>
> Translating the Japanese text back to English with Google Translate shows
> that the text does translate, but not into coherent sentences. Then again,
> looking at the raw encoding, this does not appear to be Hebrew interpreted
> with the wrong encoding (or am I missing something?)
>
> If anyone has any clue, it would be much appreciated.
>

I would try the following:
- Remove the extra newlines from the beginning of the document. An XML
  document should begin with an XML declaration. Maybe newlines are valid
  there, I never checked, but documents usually don't begin that way, so why
  do it... :)
- In an HTML document, you define the language inside the html opening tag,
  with lang="he". The meta tag that does this is redundant, and I would
  assume Google prefers the attribute on the html tag.
- The newlines in the file appear to be DOS-style. Maybe you want to try
  running the file through dos2unix.
- It could be this windows-1255 thing. Maybe try putting iso-8859-8-i there
  instead, or even better, switch to UTF-8 altogether. "Everybody loves
  UTF-8" :)

These are my ideas...

HTH,
-- 
Shimi
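A minimal Python sketch of the clean-up steps suggested in the list above, assuming the page is a single file currently saved as windows-1255 and declaring that charset in a meta tag and/or an XML declaration's encoding attribute. `normalize_page` is a hypothetical helper, not part of any existing tool, and the regular expressions are simple assumptions about the page's markup that would need checking against the real file. Note that per the XML 1.0 spec, an XML declaration, if present, must be the very first thing in the file, which is why leading whitespace is stripped.

```python
import re


def normalize_page(raw: bytes) -> bytes:
    """Hypothetical helper applying the clean-up ideas to one page.

    Assumes the page is stored as windows-1255 and declares it via a
    charset= attribute and/or an XML declaration's encoding= attribute.
    """
    # Decode with the charset the page is assumed to be saved in.
    text = raw.decode("windows-1255")

    # dos2unix equivalent: convert CRLF line endings to LF.
    text = text.replace("\r\n", "\n")

    # Drop any whitespace before the first markup, so an XML declaration
    # (if present) really is the first thing in the file.
    text = text.lstrip()

    # Re-point charset declarations at UTF-8, keeping any opening quote.
    text = re.sub(r"(charset=[\"']?)windows-1255", r"\1utf-8", text,
                  flags=re.IGNORECASE)
    text = re.sub(r"encoding=([\"'])windows-1255\1", r"encoding=\1utf-8\1",
                  text, flags=re.IGNORECASE)

    # Add lang="he" to the <html> opening tag unless a lang attribute exists.
    text = re.sub(r"<html\b(?![^>]*\blang=)", '<html lang="he"', text,
                  count=1, flags=re.IGNORECASE)

    # Save the result as UTF-8, matching the updated declarations.
    return text.encode("utf-8")
```

This covers the leading newlines, the DOS line endings, the switch to UTF-8 and the lang attribute; it does not remove the redundant language meta tag, and whether any of these changes affects the title Google displays remains a guess, as in the suggestions themselves.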
_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il