On Thursday 23 May 2013 08:01:51 Matthew Mondor wrote:
> Unfortunately, path/file names encoding are OS-specific, file-system
> specific and may be locale specific...
>
> POSIX filenames may contain bytes which are often used to hold UTF-8
> characters on filesystems which allow this, but that too is only one of
> the available encoding options, and unfortunately filenames cannot be
> tagged with the encoding type, except if using an uncommon convention
> like is used in RFC 2047 for message headers, or non-portable
> attributes/subfiles, so files named by others on their systems may not
> display correctly locally on the same OS and FS).  However, because
> POSIX syscalls expect C strings, UTF-8 is popular when the various
> single-byte encodings are not used.
>
> My Windows experience is limited, but I think that it usually uses
> UTF-16 where unicode strings are possible.
>
> ECL internally stores unicode strings using UCS-32, and the base-string
> only accepts character codes 0-255.
>
>
> This might not be the only or cleanest solution, but this might work to
> create UTF-8 pathnames for POSIX systems:
>
>
> (defun utf-8-base-string<-string (string)
>   "Encodes the supplied STRING to an UTF-8 base-string which it returns."
>   (let ((v (make-array (+ 5 (length string)) ; Best case but we might grow
>                        :element-type 'base-char
>                        :adjustable t
>                        :fill-pointer 0)))
>     (with-open-stream (s (ext:make-sequence-output-stream
>                           v :external-format :utf-8))
>       (loop
>          for c across string
>          do
>            (write-char c s)
>            (let ((d (array-dimension v 0)))
>              (when (< (- d (fill-pointer v)) 5)
>                (adjust-array v (* 2 d))))))
>     v))
>
> ; (pathname (utf-8-base-string<-string "тест")) -> #P"Ñ\202ÐµÑ\201Ñ\202"
>
>
> If you need more portable encoding conversion code, the Babel CL
> library also supports such (http://common-lisp.net/project/babel/).

Thank you for the solution, Matt.

I understand the OS and locale specifics, but this solution seems like
an ugly low-level hack for a cross-platform, high-level language. Am I
wrong? Information about the OS is available at compile time, and
information about the locale at run time.

I now have ECL and Clozure CL installed, and both have problems with
non-ASCII filenames. ECL signals an error when coercing the string to a
base-string; Clozure CL writes the data to a file whose name is in the
wrong encoding.

I have never used Cyrillic filenames before, but my clients use them,
so this problem came as a surprise to us :)
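
To make the ECL failure concrete, here is a minimal sketch in plain
Common Lisp of the check that seems to go wrong. The helper names
BASE-STRING-SAFE-P and TRY-BASE-STRING are made up for illustration and
are not part of ECL or any library. It assumes the source file is read
as UTF-8, and the result for a Cyrillic name depends on which
characters the implementation treats as BASE-CHAR.

```lisp
;; Sketch only: standard Common Lisp, no ECL-specific calls.
;; Both function names are hypothetical helpers for this illustration.

(defun base-string-safe-p (string)
  "Return true if every character of STRING is of type BASE-CHAR."
  (every (lambda (c) (typep c 'base-char)) string))

(defun try-base-string (string)
  "Return STRING coerced to a BASE-STRING.
If the coercion signals an error, return NIL and the condition instead."
  (handler-case (coerce string 'base-string)
    (error (e) (values nil e))))

;; Usage, assuming this file is read as UTF-8:
;;   (base-string-safe-p "test")
;;   (base-string-safe-p "тест")
;;   (try-base-string "тест")
```

If base-strings only accept character codes 0-255, as described above
for ECL, the Cyrillic name should fail the first check, and the second
helper should return the condition instead of a string. A check like
this would at least let a program report the problem before it tries to
build the pathname.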