> As you will refer to the encoding rule for UTF-8,
> '?' and '*' must not appear in the #2, #3 and surrogate byte.

Good to know, thanks for the info; that makes the issue less severe.

> As for filename globbing operation, String.getBytes("UTF-8")
> will be enough.

Why are you against String.toCharArray()? Are you worried about
performance when dealing with 16-bit chars? Personally, I do not
think that 16-bit operations are any slower than 8-bit ones on
modern machines. Some may even be slower when iterating over a
packed byte array. And getting the 16-bit char array out of the
String is a very simple operation.

Actually, I'd propose doing both isPattern() and unquote() in a
single method call, encapsulated in a local static class. Then a
single iteration over the String is sufficient:

    private static class Unquoter {
        boolean isPattern = false;
        boolean wasQuoted = false;
        StringBuffer buf;

        public Unquoter(String s) {
            buf = new StringBuffer(s.length());
            for (int i = 0; i < s.length(); i++) {
                // scan s once: set isPattern / wasQuoted and
                // append the unquoted characters to buf
            }
        }
        public boolean isPattern() { return isPattern; }
        public boolean wasQuoted() { return wasQuoted; }
        public String getUnQuoted() { return buf.toString(); }
    }

Cheers,
--
Martin Oberhuber
Wind River Systems, Inc.
Target Management Project Lead, DSDP PMC Member
http://www.eclipse.org/dsdp/tm

> -----Original Message-----
> From: Atsuhiko Yamanaka [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 27, 2007 4:47 AM
> To: Oberhuber, Martin
> Cc: jsch-users@lists.sourceforge.net
> Subject: Re: [JSch-users] Jsch bug: ChannelSftp cannot read
> contents of directory with '?' in its name
>
> Hi,
>
> +-From: "Oberhuber, Martin" <[EMAIL PROTECTED]> --
> |_Date: Wed, 26 Sep 2007 18:02:08 +0200 _______________________
> |
> |Given a remote UNIX directory with an '?' or '*' in its name,
> |for instance /parent/a?b
> |Even if the path name is quoted ("/parent/a\\?b"), calling
> |ChannelSftp.ls(quotedDir) does not return the contents of
> |dir, but it returns the list of files which reside in
> |parent/ and match the pattern a?b.
> |The reason for this problem is that
> |   ChannelSftp.isPattern()
> |just checks for occurrence of the characters '?' and '*',
> |but does not check whether these characters have been quoted
> |with a \ or not.
>
> You are right. There are
>   ChannelSftp#isPattern(String path) and
>   ChannelSftp#isPattern(byte[] path)
> , but the former method has not checked carefully. It should be fixed.
>
> |A slightly related problem to this is that unquoting of
> |path names is done in the encoded byte[] array rather
> |than the Unicode String. Given remote path names that
> |are encoded in UTF-8 or similar, there can be characters
> |encoded as 2 or three bytes. It's possible that byte
> |#2 or byte #3 happen to look like a '?' or '*' character,
> |but are actually not.
>
> As you will refer to the encoding rule for UTF-8,
> '?' and '*' must not appear in the #2, #3 and surrogate byte.
> On the other hand, there are such possibilities in other encoding;
> for example, Shift_JIS, CP932 (widely used in Japan).
>
> |Therefore, the unquoting must happen in the Unicode string
> |rather than the encoded byte[] array -- Util.unquote()
> |must use String.toCharArray() and work on that one,
> |rather than String.getBytes().
>
> As for filename globbing operation, String.getBytes("UTF-8")
> will be enough.
>
> Thank you for your feedback.
>
> Sincerely,
> --
> Atsuhiko Yamanaka
> JCraft,Inc.
> 1-14-20 HONCHO AOBA-KU,
> SENDAI, MIYAGI 980-0014 Japan.
> Tel +81-22-723-2150
> +1-415-578-3454
> Fax +81-22-224-8773
> Skype callto://jcraft/
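
For illustration, the single-pass Unquoter idea from the reply could be
filled in roughly as follows. This is only a sketch under assumptions that
are not stated in the thread: a backslash quotes whatever character follows
it, a lone trailing backslash is kept literally, and the wrapper class name
UnquoterSketch is hypothetical. It is not JSch's actual implementation.

```java
public class UnquoterSketch {

    /** Scans the path once, working on UTF-16 chars rather than encoded bytes. */
    public static final class Unquoter {
        private boolean isPattern = false;
        private boolean wasQuoted = false;
        private final StringBuilder buf;

        public Unquoter(String s) {
            buf = new StringBuilder(s.length());
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == '\\' && i + 1 < s.length()) {
                    // Assumption: backslash quotes the next character.
                    // Drop the backslash and copy the next char literally.
                    wasQuoted = true;
                    buf.append(s.charAt(++i));
                } else {
                    // Only unquoted '?' or '*' make the path a glob pattern.
                    if (c == '?' || c == '*') {
                        isPattern = true;
                    }
                    buf.append(c);
                }
            }
        }

        public boolean isPattern() { return isPattern; }
        public boolean wasQuoted() { return wasQuoted; }
        public String getUnquoted() { return buf.toString(); }
    }

    public static void main(String[] args) {
        // The quoted directory name from the bug report: /parent/a\?b
        Unquoter u = new Unquoter("/parent/a\\?b");
        System.out.println(u.isPattern() + " " + u.wasQuoted() + " " + u.getUnquoted());
        // prints: false true /parent/a?b
    }
}
```

Because '\', '?' and '*' are never part of a UTF-16 surrogate pair, comparing
individual chars is safe here regardless of the encoding later used on the wire.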