The "UnicodeClientColumns" page has been changed by Thomas Åkesson:
http://wiki.apache.org/subversion/UnicodeClientColumns

New page:
== Unicode Composition - WC Database columns ==

This page describes one approach to implementing
NonNormalizingUnicodeCompositionAwareness. It involves redefining existing
columns in wc.db, adding new ones, or both.

More work is needed in this specification. Focus is currently on 
UnicodeCollation. 

TODO: This section needs input from someone more familiar with wc-ng database 
design.

=== WC Database Columns ===

Columns of interest in wc.db:

* The repository path as stored on server: repos_path (e.g. 
"project/dir/file.txt")

* The local path from WC root to node: local_relpath (e.g. "dir/file.txt")

* The local path from WC root to node parent: parent_relpath (e.g. "dir")

All three paths are stored in UTF-8, but the normalization form (NFC or NFD)
is not currently specified. local_relpath and parent_relpath are converted
from UTF-8 to whatever locale encoding is in use whenever they are used to
access the filesystem.
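
To illustrate why an unspecified normalization form matters, here is a minimal Python sketch (not Subversion code). The file name and the helper same_path are made up for illustration. It shows that the composed and decomposed spellings of one name are different UTF-8 byte strings, so a plain string comparison treats them as different paths.

```python
import unicodedata

# The same visible name in both forms: NFC uses one precomposed code point
# (U+00E5), NFD uses "a" followed by a combining ring above (U+030A).
name_nfc = unicodedata.normalize("NFC", "dir/\u00e5.txt")
name_nfd = unicodedata.normalize("NFD", "dir/\u00e5.txt")

def same_path(a: str, b: str) -> bool:
    """Hypothetical helper: compare two paths ignoring composition form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(name_nfc.encode("utf-8"))        # b'dir/\xc3\xa5.txt'
print(name_nfd.encode("utf-8"))        # b'dir/a\xcc\x8a.txt'
print(name_nfc == name_nfd)            # False: a byte-wise comparison fails
print(same_path(name_nfc, name_nfd))   # True: composition-aware comparison
```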

Takesson: Is this conversion done on the fly every time? I am guessing this
works because conversion to the locale encoding is reversible; otherwise,
lookups in the database would fail?

An abstraction between the repository path and the file system path can be
achieved by ensuring that there is a column in wc.db that contains the file
system path in exactly the same form that the file system gives back. APIs in
wc need to be extended to ensure that all interaction with the file system is
performed with the file system path.


==== Alternative 1: Redefine local_relpath and parent_relpath ====

Redefine the existing columns local_relpath and parent_relpath to contain the 
path as stored in the file system. Code that currently relies on 
local_relpath/parent_relpath being a substring of repos_path needs to be 
adjusted. E.g. a node might be considered switched when this condition is not 
met.
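
A rough Python sketch of how such a check could be adjusted follows. It assumes, based on the example paths above, that the check in question is whether local_relpath matches the trailing components of repos_path. The function name is hypothetical and this is not how Subversion actually detects switched nodes.

```python
import unicodedata

def nfc(path: str) -> str:
    return unicodedata.normalize("NFC", path)

def is_tail_of_repos_path(local_relpath: str, repos_path: str) -> bool:
    """Hypothetical check: does local_relpath match the tail of repos_path
    on a path-component boundary, ignoring Unicode composition form?"""
    local, repos = nfc(local_relpath), nfc(repos_path)
    if not local:
        # An empty relpath denotes the WC root, which matches any repos_path.
        return True
    return repos == local or repos.endswith("/" + local)

# The repository stores a composed name, the file system a decomposed one.
print(is_tail_of_repos_path("dir/a\u030a.txt", "project/dir/\u00e5.txt"))
# A byte-wise suffix test on the same inputs would report a mismatch.
print("project/dir/\u00e5.txt".endswith("dir/a\u030a.txt"))
```

With a comparison of this kind, a node whose local_relpath differs from repos_path only in composition would not be mistaken for a switched node.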

It would generally be desirable to use repos_path when referring to entries 
rather than local_relpath.

This alternative can be simulated using the attached script
localrelpath2nfd.sh. It produces a Working Copy equivalent to what a checkout
should produce if this alternative were implemented in Subversion itself (the
script currently adjusts only local_relpath):
* svn co ...
* svn stat # Shows any problematic items
* localrelpath2nfd.sh
* svn stat # Should be clean, apart from some items being misreported as switched
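
The attached script is not reproduced on this page. As a rough idea of what such a conversion involves, the following Python sketch rewrites local_relpath to NFD in a toy SQLite table. The table layout is a simplified stand-in and not the real wc.db schema, the function name is made up, and standard NFD is only an approximation of the decomposition HFS+ applies.

```python
import sqlite3
import unicodedata

def relpaths_to_nfd(conn: sqlite3.Connection) -> int:
    """Rewrite local_relpath to NFD in the toy 'nodes' table.

    Returns the number of rows changed. Only local_relpath is touched,
    mirroring the limitation noted for the script.
    """
    rows = conn.execute("SELECT rowid, local_relpath FROM nodes").fetchall()
    changed = 0
    for rowid, relpath in rows:
        nfd = unicodedata.normalize("NFD", relpath)
        if nfd != relpath:
            conn.execute(
                "UPDATE nodes SET local_relpath = ? WHERE rowid = ?",
                (nfd, rowid),
            )
            changed += 1
    conn.commit()
    return changed

# Toy stand-in for wc.db (assumption: three text columns only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (local_relpath TEXT, parent_relpath TEXT, repos_path TEXT)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [
        ("dir/\u00e5.txt", "dir", "project/dir/\u00e5.txt"),
        ("dir/file.txt", "dir", "project/dir/file.txt"),
    ],
)
print(relpaths_to_nfd(conn))  # 1: only the non-ASCII name changes
```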

TODO: provide a dump file with suitable test data. 

==== Alternative 2: Introduce local_relpath_disk and parent_relpath_disk ====

New columns, local_relpath_disk and parent_relpath_disk, are added that
contain the path as stored in the file system. These columns will be used on
all systems to interact with the file system. Currently, the content of columns
local_relpath and local_relpath_disk will be identical on all file systems
except HFS+.
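
A minimal Python sketch of this alternative follows, again using a toy SQLite table rather than the real wc.db schema. The helper names are hypothetical. The new columns start out as copies of the existing ones, and only rows where the file system reports a different form end up diverging.

```python
import sqlite3

def add_disk_columns(conn: sqlite3.Connection) -> None:
    """Add the two *_disk columns and seed them from the existing columns."""
    conn.execute("ALTER TABLE nodes ADD COLUMN local_relpath_disk TEXT")
    conn.execute("ALTER TABLE nodes ADD COLUMN parent_relpath_disk TEXT")
    conn.execute(
        "UPDATE nodes SET local_relpath_disk = local_relpath, "
        "parent_relpath_disk = parent_relpath"
    )
    conn.commit()

def rows_with_differing_disk_path(conn: sqlite3.Connection) -> list:
    """Return (local_relpath, local_relpath_disk) pairs that are not identical."""
    return conn.execute(
        "SELECT local_relpath, local_relpath_disk FROM nodes "
        "WHERE local_relpath <> local_relpath_disk"
    ).fetchall()

# Toy stand-in for wc.db (assumption: simplified layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (local_relpath TEXT, parent_relpath TEXT, repos_path TEXT)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [
        ("dir/\u00e5.txt", "dir", "project/dir/\u00e5.txt"),
        ("dir/file.txt", "dir", "project/dir/file.txt"),
    ],
)
add_disk_columns(conn)

# Pretend the file system reported a decomposed name for the first node.
conn.execute(
    "UPDATE nodes SET local_relpath_disk = ? WHERE local_relpath = ?",
    ("dir/a\u030a.txt", "dir/\u00e5.txt"),
)
print(rows_with_differing_disk_path(conn))
```

Keeping local_relpath untouched means that existing comparisons against repos_path continue to work, at the cost of extra columns to maintain.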




=== Subcommand Changes ===

Specific changes to svn subcommands are outlined below. 

All commands that access files in the Working Copy must do so by getting the 
path from the column local_relpath/local_relpath_disk. 

TODO: Investigate which subcommands currently use local_relpath for purposes
other than accessing the file. With alternative 1 (above), it will NOT be
acceptable to use local_relpath for comparison/substring operations with other
paths, e.g. repos_path.


==== Checkout/Update ====

When adding paths to the WC, determine the actual filesystem path and store
that in local_relpath/local_relpath_disk. In practice this is only required on
OSX. How can this be done?
* Do we get a handle back from the filesystem after creating a file/dir that
can be queried for the path?
* Use platform-dependent APIs to establish the expected path.
* Alternatively, first look for the exact same path (which will succeed on most
filesystems), then fall back to globbing with a Unicode composition aware
comparison.
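
The last option in the list above could look roughly like the following Python sketch. The function name is hypothetical, and it is not a proposal for the actual wc API. It checks the directory listing for an exact match first and only then compares entries in a composition-aware way.

```python
import os
import unicodedata

def find_disk_name(parent_dir: str, name: str):
    """Return the entry name exactly as the directory listing reports it,
    or None if no entry matches even after normalization."""
    entries = os.listdir(parent_dir)
    # Exact match against the listing. A plain existence check is not used
    # here because a normalizing file system would accept either form
    # without revealing which form it stores.
    if name in entries:
        return name
    # Fallback: scan the listing with a composition-aware comparison.
    wanted = unicodedata.normalize("NFC", name)
    for entry in entries:
        if unicodedata.normalize("NFC", entry) == wanted:
            return entry
    return None
```

The returned name is what would be stored in local_relpath or local_relpath_disk. The fallback scan costs one directory listing per lookup, which is why the exact match is tried first.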

TODO: Do we need to process paths that are not actually checked out due to the 
depth setting?


==== Status ====

The status subcommand incorrectly reports externals when local_relpath has
been manually adjusted to match the filesystem.

TODO: Clarify if status performs string comparisons between local_relpath and 
some other path.

TODO: How does status show a file whose name changed to a value that
canonicalizes to the same value as the original name? (Is that possible?)

==== Add and mkdir ====

Since this approach does not dictate normalized repository storage, the add
subcommand should not perform any normalization.

The uniqueness test should be Unicode aware to avoid a "normalized-name
collision". This is not vital, but it is desirable for better usability (it has
no effect on Mac OSX, where such collisions cannot be created).
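
A Unicode-aware uniqueness test could be sketched as follows in Python. The function name and the idea of passing in the sibling names as a list are assumptions made for illustration. The new name itself is left unnormalized, in line with the rule that add performs no normalization.

```python
import unicodedata

def find_normalized_collision(new_name: str, existing_names):
    """Return an existing sibling name that differs from new_name only in
    Unicode composition form, or None if there is no such sibling."""
    key = unicodedata.normalize("NFC", new_name)
    for existing in existing_names:
        # Identical names are an ordinary collision and are skipped here.
        if existing != new_name and unicodedata.normalize("NFC", existing) == key:
            return existing
    return None

siblings = ["readme.txt", "\u00e5.txt"]
print(find_normalized_collision("a\u030a.txt", siblings) is not None)  # True
print(find_normalized_collision("new.txt", siblings))                  # None
```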

TODO: Anything else?


==== Commit ====

No specific changes expected.

TODO: Confirm.

==== Changelist ====

Changelists should use repos_path to refer to entries, unless this is already
the case.


==== ... ====

TODO: More subcommands requiring attention?
