Myrna van Lunteren wrote:
On Thu, Aug 6, 2009 at 3:24 AM, Kristian Waagan<[email protected]> wrote:
Hello,

I noticed that the Japanese characters on the manuals page of the ASF Derby
site got garbled again in the last commit. I went in and backed out the
changes made to manuals/index.html. They should be visible shortly.

I think we have talked about this issue before, and couldn't really
determine if the problem was with the platform used to build the site or
with an environment setting. This is maybe something the next person to
update the site should watch out for, and we should consider adding
something in the instructions to help avoid this happening in the future.


Regards,
--
Kristian

Hm,

I think I put this in the instructions on
http://wiki.apache.org/db-derby/DerbySnapshotOrRelease after the
trouble last time, re "Update
src/documentation/content/xdocs/manuals/index.xml: Add the link to the
version's manuals (which you uploaded in the previous step)."
--------------------------
Before checking in changes to build/site/manuals/index.html, be
careful to check for changes to areas other than those you actually
modified, especially the Japanese characters; some builds may garble
them.
--------------------------

Thanks for adding this, Myrna.

Maybe Kathey did not see this or did not see anything wrong with the
index.html...

Any suggestions on how to improve that? It's probably too vague?

By configuring your terminal to use UTF-8, you can easily see the garbled characters in the diff, as they pop up as question marks.
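The same check could probably be automated. What follows is only a rough sketch, assuming each garbled character shows up as exactly one question mark; the function names are made up for illustration and are not part of the site build.

```python
def mask_non_ascii(line):
    """Replace every non-ASCII character with '?', mimicking a lossy build."""
    return "".join(ch if ord(ch) < 128 else "?" for ch in line)


def find_garbled_lines(old_text, new_text):
    """Return (line_number, original_line) for each line of new_text that
    equals an old non-ASCII line with its characters turned into '?'."""
    masked_originals = {
        mask_non_ascii(line): line
        for line in old_text.splitlines()
        if not line.isascii()
    }
    return [
        (number, masked_originals[line])
        for number, line in enumerate(new_text.splitlines(), start=1)
        if line in masked_originals
    ]
```

Feeding it the old and new revisions of index.html (both decoded as UTF-8) should list the lines worth a second look before committing.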

Maybe a link to your commits, Kristian?

In this case I just backed out the last change for the manuals/index.html file:
http://svn.apache.org/viewvc/db/derby/site/trunk/build/site/manuals/index.html?view=diff&r1=801588&r2=801589&pathrev=801589
Note that the commit mail sent out doesn't seem to use the correct encoding (could be my mail reader as well).

Maybe it would be better to use numeric character references (Unicode escapes) instead of the raw characters?
That way the HTML file would be plain ASCII, so every editor should be able to deal with it and nothing should mess up the Japanese text. On the other hand, Japanese readers would most likely need to look at the file in a web browser to understand what the text says...

As an example, a character would probably look like "&#x4E0E;" (hex) or "&#19982;" (decimal). We could easily convert the inserted characters by using an editor capable of showing the corresponding Unicode values for the characters (I know how to do this in vim, probably works in many others too).
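For anyone who would rather script the conversion than do it in an editor, here is a minimal sketch. The function name is hypothetical; it leaves ASCII untouched and rewrites everything else as a numeric character reference.

```python
def to_numeric_references(text, hexadecimal=True):
    """Rewrite every non-ASCII character as an HTML numeric character reference."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            pieces.append(ch)               # plain ASCII stays as-is
        elif hexadecimal:
            pieces.append(f"&#x{code:X};")  # hex form
        else:
            pieces.append(f"&#{code};")     # decimal form
    return "".join(pieces)
```

If only the decimal form is needed, Python's built-in `text.encode("ascii", "xmlcharrefreplace")` does the same job, and `html.unescape` turns the references back into characters.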

Lastly, I noticed a BOM (byte order mark) in the file. Since the header says this is UTF-8, it shouldn't be required, right?
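A UTF-8 BOM is the three bytes EF BB BF at the very start of a file, and it is optional. A small sketch of how one might check for it and drop it, assuming the file has been read as bytes (the function name is just for illustration):

```python
import codecs


def strip_utf8_bom(data):
    """Return (data_without_bom, had_bom) for a bytes object."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False
```

Whether removing it is actually desirable for the generated index.html is a separate question; the sketch only shows how to detect it.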


--
Kristian
Myrna
