What is the canonical way to encode filenames, both in the API and in the underlying FSFS back end of a Subversion 1.7 repository?

Let's say I have the file "a b.txt", which consists of "a" and "b" with a space in between. How should this be stored on the server? How should the various APIs give it to me?

Let me explain further. If I commit a file using TortoiseSVN on 64-bit Windows 7 Professional from an NTFS partition, and then read that repository using SVNKit, SVNDirEntry.getRelativePath() gives me "a b.txt". I don't know whether on the back end these files are stored as "a b.txt", or whether they are stored in canonical URI form (i.e. "a%20b.txt") and SVNKit is just being "helpful" by decoding them.

On my end I'm starting with fully canonically encoded URIs. If Subversion stores names in decoded form on the back end, does it compensate for characters that the underlying file system doesn't support? When I decode my URI so that I can save the filename the way Subversion likes, how do I know which characters to decode (those supported by the underlying file system, as if I, the client, knew what that is!) and which characters to leave encoded (those not supported by the underlying file system on the server)?
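To make the round trip I'm describing concrete, here is a minimal Python sketch. It is not Subversion or SVNKit code; the helper names encode_segment and decode_segment are my own, made up for illustration, and it assumes UTF-8 percent-encoding of a single path segment.

```python
from urllib.parse import quote, unquote


def encode_segment(name: str) -> str:
    """Percent-encode one path segment as UTF-8 (hypothetical helper).

    safe="" means that "/" is escaped too, so the result is always
    a single segment.
    """
    return quote(name, safe="")


def decode_segment(segment: str) -> str:
    """Reverse encode_segment: turn %XX escapes back into characters."""
    return unquote(segment)


decoded = "a b.txt"
encoded = encode_segment(decoded)

print(encoded)                             # the URI form I start with
print(decode_segment(encoded))             # the form SVNKit hands back to me
print(decode_segment(encoded) == decoded)  # the round trip is lossless
```

The round trip itself is lossless. What I can't tell is which of the two forms Subversion treats as canonical, and whether the decoded form is safe to store no matter what file system the server uses.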

Maybe someone can set me straight here. I'm hoping that Subversion stores everything as correctly UTF-8-encoded, escaped URIs on the back end and in its APIs, and that the real culprit here is SVNKit for being "helpful" and decoding the strings for me without asking. The other option, which would work almost as well, is that everything on the back end is stored in decoded form, but some tricks are pulled so that /all/ characters are supported, regardless of the underlying file system. The case I don't want to end up in is one where I have to encode some characters but not others based on a server-side file system implementation I know nothing about.

Thanks for shedding some light on this.

Garret
