On 6/6/12 9:21 PM, "ext Oswald Buddenhagen" <oswald.buddenha...@nokia.com> wrote:
>On Wed, Jun 06, 2012 at 04:51:14PM +0200, ext João Abecasis wrote:
>> Thiago Macieira wrote:
>> > So you're asking that filenames be passed on the locale encoding (say, UTF-8)
>> > on the command-line, regardless of what the filesystem encoding is?
>>
>> I see no other sane way, unless your application is able to take the
>> byte sequences it gets without additional processing.
>>
>thiago's whole point is that most command line apps just assume that
>they can do that.
>but you know what? it doesn't matter. apps which assume 8-bit
>pass-through are simply not suited for fs-encoding != locale, because
>the user will have all kinds of problems with that anyway (starting with
>the command line if 8-bit passthrough is simply impossible, as it is
>with a terminal in utf-8 mode). no valid use case. period.
>
>> > In fact, there is one more possible solution which stands a chance: forcing
>> > the problem onto the kernel. Make the entire userspace API be UTF-8 and have
>> > the kernel recode to the filesystem encoding as necessary. The problem with
>> > this solution is that it a) will suffer extreme resistance from kernel
>> > developers and other people who think of file names as "binary data" instead of
>> > human-readable text; and b) is no different from the other solution of
>> > enforcing the encoding.
>>
>> Forcing this onto the Linux kernel would in the long term make the
>> situation better for Linux users that don't receive files from any
>> other OSs or kernel versions. It doesn't help everyone.
>>
>every fs used by windows in the last 1.5 decades (vfat, ntfs, isofs with
>joliet, udf) is utf-16 based. so anything you get out of the kernel is
>inherently recoded already, and the fs drivers have respective mount
>options (though there is no standardization of any kind). i.e., the
>problem simply does not exist for usb sticks and similar, provided udev
>& co. correctly feed the kernel with the locale when mounting media.
>some of the 8-bit fs drivers also have recoding options, but notably
>they are missing from the linux-native fses. it shouldn't be too hard to
>add some generic 8-bit-to-8-bit recoding option, but i fear nobody may
>care at this point.
>
>for the places where the problem does exist for whatever reason, the
>fs-encoding is therefore mountpoint-specific (where the "mountpoint" can
>also exist in user space when we are talking about a virtual file
>system, like an archive). have fun solving *that* inside qt ...
>
>> What is a real problem in practice and the one that (in my mind)
>> setEncodingFunction addresses is not so much that of switching
>> encodings, but that of allowing an escaping mechanism to be plugged
>> in. Done this way, such escaping would be not only Qt-specific, but
>> potentially application specific. Still, it should enable a simple
>> File Manager built on Qt to operate on all files it sees.
>>
>yes, this is what follows from the above.
>
>
>On Wed, Jun 06, 2012 at 05:36:30PM +0200, ext Thiago Macieira wrote:
>> On Wednesday, 6 June 2012 16.51.14, João Abecasis wrote:
>> > We could use some magic sequence. Windows, for instance, uses the "\\?\"
>> > prefix to support longer paths. We could use '<' and '>', which are rare
>> > but valid, we could give a specific meaning to sequences of 3 or more
>> > slashes.
>> >
>> > I don't have a concrete solution at the moment.
>>
>> I really think we should not use a character that is easily used on file names,
>> and that includes <, >, commas, percents, backslashes, spaces, etc. It needs
>> to be a Unicode character that has a close-to-zero chance of being
>> intentionally used.
>>
>the problem is that this has a lower chance of surviving various
>round-trips - something 7-bit-clean would be better.
>that of course means using a trigger sequence which has almost-zero
>chance to occur otherwise, say "@--" (no special chars of any shell, no
>path separators of any os).
>to keep the thing halfway readable, do the
>escaping segment wise, and use url-encoding for the escaped segments.
>
>> Anyway, what I recommend for now:
>>
>> 1) immediately, de-inline QFile::decodeName and QFile::encodeName
>> 2) un-deprecate them and update the text in changes-5.0.0
>>
>well, why not.

Ok, but only for the specific use case of converting arbitrary 8bit
(including invalid sequences in the locale) to a QString and back. No way
to set any decoder functions.

What do we do with this on Windows? I am almost tempted to not even offer
the methods there, as everything's utf16 anyway, so the problem doesn't
exist.

>> 3) make QProcess use QFile::encodeName for its arguments (no-op right now)
>> 4) make QCoreApplication parse its arguments using QFile::decodeName (no-op
>> right now)

On Mac and Unix. On Windows we need a different solution. There we
actually get the arguments in utf16 (in WinMain()). Currently we still
use toLocal8Bit() in there, and that can/will probably break badly as
local8Bit() on windows is usually not utf-8. So qtmain_win.cpp needs some
fixing anyway.

>> 5) idem for Laszlo's command-line parser class
>>
>no. see first paragraph. doing this would only increase the mess.
>
>> Later, we can decide whether to add escaping to those functions.
>>
>> However, I cannot agree with bringing the setter functions back. I do agree
>> with removing them completely, though.
>>
>ack

Yes. No setters for the encode/decode functions. Either we handle this
properly in Qt (giving full roundtrip conversions for arbitrary 8bit
sequences), or not at all.

Cheers,
Lars

_______________________________________________
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development
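
For illustration, the segment-wise escaping suggested in the thread (a "@--"
trigger followed by url-encoding of the escaped segment) could be sketched
roughly as follows. This is not Qt code and not an agreed design. The function
names are hypothetical, the locale encoding is assumed to be UTF-8, '/' is
assumed to be the only path separator, and escaping segments that already
start with "@--" is an added assumption to keep the round trip unambiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Trigger prefix taken from the "@--" suggestion; the choice is an assumption.
static const std::string kTrigger = "@--";

// True if the bytes are well-formed UTF-8 (rejects overlongs and surrogates).
bool isValidUtf8(const std::string &s)
{
    std::size_t i = 0, n = s.size();
    while (i < n) {
        unsigned char c = s[i];
        std::size_t len;
        unsigned min, cp;
        if (c < 0x80) { ++i; continue; }
        else if ((c & 0xE0) == 0xC0) { len = 2; min = 0x80;    cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; min = 0x800;   cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; min = 0x10000; cp = c & 0x07; }
        else return false;
        if (i + len > n) return false;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

char hexDigit(unsigned v)
{
    assert(v < 16);
    return "0123456789ABCDEF"[v];
}

int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Escape one path segment only if it is not valid UTF-8, or if it already
// starts with the trigger (so that unescaping stays unambiguous).
std::string escapeSegment(const std::string &seg)
{
    bool clash = seg.compare(0, kTrigger.size(), kTrigger) == 0;
    if (!clash && isValidUtf8(seg))
        return seg;
    std::string out = kTrigger;
    for (unsigned char c : seg) {
        bool plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                  || (c >= '0' && c <= '9')
                  || c == '-' || c == '.' || c == '_' || c == '~';
        if (plain) {
            out += char(c);
        } else {
            out += '%';
            out += hexDigit(c >> 4);
            out += hexDigit(c & 0xF);
        }
    }
    return out;
}

// Reverse of escapeSegment: segments without the trigger pass through.
std::string unescapeSegment(const std::string &seg)
{
    if (seg.compare(0, kTrigger.size(), kTrigger) != 0)
        return seg;
    std::string out;
    for (std::size_t i = kTrigger.size(); i < seg.size(); ++i) {
        int hi = -1, lo = -1;
        if (seg[i] == '%' && i + 2 < seg.size()) {
            hi = hexValue(seg[i + 1]);
            lo = hexValue(seg[i + 2]);
        }
        if (hi >= 0 && lo >= 0) { out += char(hi * 16 + lo); i += 2; }
        else out += seg[i];
    }
    return out;
}

// Apply fn to every '/'-separated segment, keeping the separators.
template <typename Fn>
std::string mapSegments(const std::string &path, Fn fn)
{
    std::string out;
    std::size_t start = 0;
    for (;;) {
        std::size_t slash = path.find('/', start);
        std::size_t len = slash == std::string::npos ? slash : slash - start;
        out += fn(path.substr(start, len));
        if (slash == std::string::npos) return out;
        out += '/';
        start = slash + 1;
    }
}

std::string escapePath(const std::string &p) { return mapSegments(p, escapeSegment); }
std::string unescapePath(const std::string &p) { return mapSegments(p, unescapeSegment); }
```

With this sketch, a Latin-1 file name such as the bytes "caf\xE9.txt" under
/tmp would be shown as /tmp/@--caf%E9.txt, and unescapePath() gives back the
original bytes. Names that are already valid UTF-8 pass through unchanged.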