On Tue, 2006-09-19 at 14:54 +0200, Alexander Larsson wrote: > Thinking more closely on this it seems that neither plain paths or URIs > are really good enough. For instance, even local filenames need some > sort of escaping for us to be able to display them, since they might > contain binary data (everything but zero and slash is allowed), and we > really want to display e.g. korean smb uris with readable text instead > of crazy escape codes. > > I'll try to think some more on this. Hopefully we can come up with a > good solution.
I've looked into this a bit more. Lets start by some examples of filenames of various types: Local filenames (in utf8 mode) 1) standard: /etc/passwd 2) utf8 and spaces: "/tmp/a åäö.txt" (encoding==utf8) 3) latin-1 and spaces: "/tmp/a åäö.txt" (encoding==iso8859-1) 4) filename without encoding: "/tmp/bad:\001\010\011\012\013" (as a C string) 5) mountpoint: /mnt/cdrom (cd has title "CD Title") Ftp mount to ftp.gnome.org [where filenames are stored as utf8, this is detected by using ftp protocol extensions (there is an rfc) or by having the user specify the encoding at mount time] 6) normal dir: /pub/sources 7) valid utf8 name: /dir/a file öää.txt 8) latin-1 name: /dir/a file öää.txt Ftp mount to ftp.gnome.org (with filenames in latin-1) 9) latin-1 name: /dir/a file öää.txt Backend that stores display name separate from real name. [Examples could be a flickr backend, a file backend that handles desktop files, or a virtual location like computer:// (which is implemented using virtual desktop files atm).] 10) /tmp/foo.desktop (with Name[en]="Display Name") To complement this, here are the places where display filenames (i.e utf-8 strings) are used in the desktop: A) Absolute filename, for editing (nautilus text entry, file selector entry) B) Semi-Absolute filename, for display (nautilus window title) C) Relative file name, for display (in nautilus/file selector icon/list view) D) Relative file name, for editing (rename in nautilus) E) Relative file name, for creating absolute name (filename completion for A). This needs to know the exact form of the parent (i.e. it differs for filename vs uri). I won't list this in the table below as its always the same as A from the last slash to the end. Using the current gnome-vfs uri method, this is how the various names would look: A B C D 1) file:///etc/passwd passwd passwd passwd 2) file:///tmp/a%20%C3%B6%C3%A4%C3%A4.txt a åäö.txt a åäö.txt a åäö.txt 3) file:///tmp/a%20%E5%E4%F6.txt a ???.txt a ???.txt (invalid unicode) a ???.txt 4) file:///tmp/bad%3A%01%08%09%0A%0B bad:????? bad:????? (invalid unicode) bad:????? 5) file:///mnt/cdrom CD Title (cdrom) CD Title (cdrom) CD Title 6) ftp://ftp.gnome.org/pub/sources sources on ftp.gnome.org sources sources 7) ftp://ftp.gnome.org/dir/a%20%C3%B6%C3%A4%C3%A4.txt a åäö.txt on ftp.gnome.org a åäö.txt a åäö.txt 8) ftp://ftp.gnome.org/dir/a%20%E5%E4%F6.txt a ???.txt on ftp.gnome.org a ???.txt (invalid unicode) a ???.txt 9) ftp://ftp.gnome.org/dir/a%20%E5%E4%F6.txt a åäö.txt on ftp.gnome.org a åäö.txt a åäö.txt 10)file:///tmp/foo.desktop Display Name Display Name Display Name The stuff in column A is pretty insane. It works fine as an identifier for the computer to use, but nobody would want to have to type that in or look at that all the time. That is why Nautilus also allows entering some filenames as absolute unix pathnames, although not all filenames can be specified this way. If this is used whenever possible column A looks like this: A 1) /etc/passwd 2) /tmp/a åäö.txt 3) file:///tmp/a%20%E5%E4%F6.txt 4) file:///tmp/bad%3A%01%08%09%0A%0B 5) /mnt/cdrom 6) ftp://ftp.gnome.org/pub/sources 7) ftp://ftp.gnome.org/dir/a%20%C3%B6%C3%A4%C3%A4.txt 8) ftp://ftp.gnome.org/dir/a%20%E5%E4%F6.txt 9) ftp://ftp.gnome.org/dir/a%20%E5%E4%F6.txt 10)/tmp/foo.desktop As we see this helps for most normal local paths, but it becomes problematic when the filenames are in the wrong encoding. For non-local files it doesn't help at all. We still have to look at these horrible escapes, even when we know the encoding of the filename. The examples 7-9 in this version shows the problem with URIs. Suppose we allowed an invalid URI like "ftp://ftp.gnome.org/dir/a åäö.txt" (utf8-encoded string). Given the state inherent in the mountpoint we know what encoding is used for the ftp server, so if someone types it in we know which file they mean (either 7 or 9). However, suppose someone pastes a URI like that into firefox, or mails it to someone, now we can't reconstruct the real valid URI anymore. If you drag and drop it however, the code can send the real valid uri so that firefox can load it correctly. So, this introduces two kinds of of URIs that are "mostly similar" but breaks in many cases. This is very unfortunate, and imho not acceptable. I think its ok to accept a URI typed in like "ftp://ftp.gnome.org/dir/a åäö.txt" and convert it to the right uri, but its not right to then display such a uri in the nautilus location bar, as that can result in that invalid uri getting into other places. Since I dislike showing invalid URIs in the UI I think it makes sense to create a new absolute pathname display and entry format. Ideally such a system should allow any ascii or utf8 local filename to be represented as itself. Furthermore it would allow input of URIs, but immediately convert them to the display format (similar to how inputing a file:// uri in nautilus displays as a normal filename). One solution would be to use some other prefix than / for non-local files, and to use some form of escaping only for non-utf8 chars and non-printables. This works since we only handle absolute pathnames, so anything not starting with / is out of band. Here is an example we could use: A 1) /etc/passwd 2) /tmp/a åäö.txt 3) /tmp/a \xE5\xE4\xF6.txt 4) /tmp/bad:\x01\x08\x09\x0A\x0B 5) /mnt/cdrom 6) :ftp:ftp.gnome.org/pub/sources 7) :ftp:ftp.gnome.org/dir/a åäö.txt 8) :ftp:ftp.gnome.org/dir/a \xE5\xE4\xF6.txt 9) :ftp:ftp.gnome.org/dir/a åäö.txt 10)/tmp/foo.desktop Under the hood this would use proper, valid escaped URIs. However, we would display things in the UI that made some sense to users, only falling back to escaping in the last possible case. The API could look something like: GFile *g_file_new_from_filename (char *filename); GFile *g_file_new_from_uri (char *uri); GFile *g_file_parse_display_name (char *display_name); Another approach (mentioned by Jürg Billeter on irc yesterday) is to move from a pure textual representation of the full uri to a more structured UI. For example the ftp://ftp.gnome.org/ part of the URI could be converted to a single item in the entry looking like [#ftp.gnome.org] (where # is an ftp icon). Then the rest of the entry would edit just the path on the ftp server, as a local filename. The disadvantage here is that its a bit harder to know how to type in a full pathname including what method to use and what server (you'd type in a URI). This isn't necessarily a huge problem if you rarely type in remote URIs (you can follow links, browse the network, use favourites, etc). I don't know how hard this is to implement from a Gtk+ perspective though. Its somewhat similar to what the evolution address entry does, maybe the evolution hackers can give us some input here. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Alexander Larsson Red Hat, Inc [EMAIL PROTECTED] [EMAIL PROTECTED] He's a Nobel prize-winning shark-wrestling hairdresser who hides his scarred face behind a mask. She's a brilliant winged socialite in the wrong place at the wrong time. They fight crime! _______________________________________________ gnome-vfs-list mailing list gnome-vfs-list@gnome.org http://mail.gnome.org/mailman/listinfo/gnome-vfs-list