Re: [JSch-users] Jsch ChannelSftp and character encodings

Atsuhiko Yamanaka Thu, 27 Sep 2007 20:05:10 -0700

Hi,

   +-From: "Oberhuber, Martin" <[EMAIL PROTECTED]> --
   |_Date: Thu, 27 Sep 2007 18:00:09 +0200 _______________________
   |
   |I thought long about your proposal to unconditionally
   |use UTF-8 encoding in Sftp. But consider the following
   |scenario:
   |  * Remote Linux box with non UTF-8 encoding (e.g. Shift_JIS)
   |  * Remote sshd which does not recode
   |  * User performs ls or similar
   |  * Shift_JIS bytestream sent from the remote is not valid UTF-8
   |Java StreamEncoder will convert invalid chars to '?'
   |Information is lost .. there is no chance to restore
   |        and know what the original files were
   |Users can not work on this system.
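
The scenario quoted above can be reproduced in a few lines of Java. This is only a minimal sketch, not code from JSch: the class name LossyDecodeSketch and its helpers are hypothetical, and the six bytes are the Shift_JIS encoding of a three-character Japanese name chosen for illustration. When decoding, Java substitutes U+FFFD for malformed input; a '?' typically appears only later, if the string is encoded to a charset that cannot represent U+FFFD (for example ISO-8859-1).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LossyDecodeSketch {
    // Shift_JIS bytes for a three-character Japanese name (U+65E5 U+672C U+8A9E),
    // standing in for a filename as a non-recoding server would send it.
    static final byte[] SJIS_NAME = {
        (byte) 0x93, (byte) 0xFA, (byte) 0x96, 0x7B, (byte) 0x8C, (byte) 0xEA
    };

    // Decode as UTF-8 the way new String(bytes, charset) does:
    // malformed input is silently replaced with U+FFFD.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // True if the bytes survive a decode/encode round trip through UTF-8.
    static boolean roundTrips(byte[] raw) {
        byte[] back = decodeAsUtf8(raw).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(raw, back);
    }

    public static void main(String[] args) {
        String name = decodeAsUtf8(SJIS_NAME);
        System.out.println("contains U+FFFD: " + (name.indexOf('\uFFFD') >= 0));
        System.out.println("round trip ok:   " + roundTrips(SJIS_NAME));
    }
}
```

Because the round trip fails, the original bytes cannot be recovered once the name has been turned into a String this way.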


I understand very well what you are describing.
But that is their problem, not ours: it comes from noncompliance with
the sftp protocol, and from the problem in that protocol which you mentioned.
This is my stance.

Frankly speaking, I have been living on Unices for about twenty years (since
Apollo/DOMAIN), and I have never created a Japanese filename on them.
It is really bad practice to use local encodings for filenames on Unices.

Of course, you can create such filenames, but you cannot use them without
pain; shells, file utilities, editors, compilers, etc. cannot handle them
unless you are lucky.  I have never met people who complain about being
unlucky, because using such filenames on Unices is at their own risk.

Thanks to UTF-8, such pain has decreased these days,
and if I am forced to use a Japanese filename, I will choose UTF-8.

   |I don't know UTF-8 very well but I assume that invalid byte
   |combinations do exist; but perhaps I'm wrong, then the problem
   |is not that bad because client can re-code the UTF-8 stream
   |to the original bytestream and then decode it with the
   |desired encoding.
   |With the current Windows cp1252 this does not work because
   |some bytes (e.g. 0x8d) are not assigned in cp1252 so the
   |encoding is lossy and we cannot get the original byte stream.

I guess that you have misunderstood how transcoding works.

There are two concepts: the character set, and the encodings for it.
For example, in Japan, we use JIS X 0208 as the character set and
several encodings for it: EUC-JP, SJIS, and JIS.
If a character is included in JIS X 0208 and also included in Unicode,
there will be a mapping method/table (EUC-JP <-> UTF-8).
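
To illustrate the mapping idea, here is a minimal sketch in Java, where the internal Unicode String plays the role of the mapping table between encodings. The class name TranscodeSketch and the method transcode are hypothetical, and the sketch assumes that the JDK in use provides the "EUC-JP" and "Shift_JIS" charsets; Charset.forName throws if one is missing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeSketch {
    // Re-encode bytes from one encoding to another by decoding them into a
    // Unicode String and encoding that String again.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        // Assumption: these extended charsets are available in this JDK.
        Charset eucJp = Charset.forName("EUC-JP");
        Charset sjis = Charset.forName("Shift_JIS");

        // Three characters that exist both in JIS X 0208 and in Unicode.
        String text = "\u65e5\u672c\u8a9e";

        byte[] asEuc = text.getBytes(eucJp);
        byte[] asSjis = transcode(asEuc, eucJp, sjis);
        byte[] asUtf8 = transcode(asEuc, eucJp, StandardCharsets.UTF_8);

        // The byte sequences differ, but all decode to the same characters.
        System.out.println("EUC-JP bytes:    " + asEuc.length);
        System.out.println("Shift_JIS bytes: " + asSjis.length);
        System.out.println("UTF-8 bytes:     " + asUtf8.length);
        System.out.println("same text: "
            + new String(asUtf8, StandardCharsets.UTF_8).equals(text));
    }
}
```

This works without loss only for characters that exist in both character sets, and only when the input bytes really are in the encoding named by the first charset argument.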

As for your example, I think that 0x8d can appear only as the second or a
later byte of a multi-byte sequence in UTF-8, so it seems reasonable to me
that there is no direct mapping from 0x8d to cp1252.
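
The role of 0x8d in UTF-8 can be checked with a short Java sketch. The class and method names below (ContinuationByteSketch, isContinuationByte, isValidUtf8) are hypothetical and not part of JSch. The sketch relies on two facts: UTF-8 continuation bytes have the bit pattern 10xxxxxx, and a CharsetDecoder configured with CodingErrorAction.REPORT throws instead of substituting U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ContinuationByteSketch {
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80..0xBF).
    static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Strict validity check: REPORT makes the decoder throw on malformed
    // input instead of replacing it.
    static boolean isValidUtf8(byte[] raw) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] lone = { (byte) 0x8D };                // continuation byte with no lead byte
        byte[] paired = { (byte) 0xC3, (byte) 0x8D }; // lead byte followed by 0x8D

        System.out.println("0x8D is continuation byte: " + isContinuationByte(lone[0]));
        System.out.println("lone 0x8D valid UTF-8:     " + isValidUtf8(lone));
        System.out.println("0xC3 0x8D valid UTF-8:     " + isValidUtf8(paired));
    }
}
```

A check like isValidUtf8 is one way for a client to detect that a server is sending filenames in some other encoding, rather than silently accepting replacement characters.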
 
As an off-topic aside, as far as I know, Windows handles filenames on the
NTFS filesystem in Unicode (UTF-16) even if you are using cp1252.

   |If you can prove that UTF-8 has a valid output for every 
   |possible byte stream I'm fine with supporting UTF-8 only
   |(though I don't find it very user friendly). If it turns
   |out that there are byte stream combinations which can
   |only be converted lossy to UTF-8 I can not accept a
   |change to Jsch which always hardcodes UTF-8.

I'm sorry, but I could not understand the paragraph above.

By the way, I have not yet heard an answer to my question:
  > If that method is added, are you really planning to use it 
  > in your product?
Are you planning to enable and disable that functionality
according to Session#getServerVersion(), and to
keep and update a database for it internally, forever?



Sincerely,
--
Atsuhiko Yamanaka
JCraft,Inc.
1-14-20 HONCHO AOBA-KU,
SENDAI, MIYAGI 980-0014 Japan.
Tel +81-22-723-2150
    +1-415-578-3454
Fax +81-22-224-8773
Skype callto://jcraft/

