>>>>> Glynn Clements <[EMAIL PROTECTED]> writes:

 >>> In-process references could be maintained by making a copy (or hard
 >>> link) to the inventory, so that the GC treats it as "live". You
 >>> would need some kind of clean-up mechanism to handle any copies
 >>> which are left behind if a module crashes.

 >> However, having GC to process all the inventories won't be efficient
 >> (unless these are stored in a database's table with appropriate
 >> indices.)  So, I had in mind keeping a references file along with
 >> each object file.

 > Ah; if you're talking about back-references, one thing to bear in
 > mind is permissions: you can use maps from mapsets for which you only
 > have read permission, and not write permission.

        Agreed.

 > [This issue has already arisen with respect to reclass maps and the
 > reclassed_to file. That was the first GRASS bug I ever fixed.]

 > That also means that garbage collection would need to scan the entire
 > location, not just individual mapsets. Actually, re-projection can
 > span locations, so you would potentially need to scan the entire
 > database.

        OTOH, I can hardly recall a piece of software that handles
        access to a repository which is read-only to some of its
        instances, yet allows deletions by others.  All the software
        that handles this well either requires a dedicated server to
        manage the whole ``database'', or relies on replication.

        It seems that the reasonable behaviour would be to make a
        back-reference if possible, and issue a warning if not.
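
        A minimal sketch of that behaviour, in Python.  The ``.refs''
        suffix for the per-object references file and the function
        name are assumptions made for illustration only; nothing of
        the kind exists in GRASS.

```python
import os
import warnings


def add_back_reference(object_path, referrer):
    """Record `referrer` in the references file kept next to the object.

    Returns True if the back-reference was written, False if it could
    not be (e.g. the mapset is read-only for us); a warning is issued
    in the latter case instead of failing.
    """
    refs_path = object_path + ".refs"  # hypothetical naming convention
    try:
        with open(refs_path, "a", encoding="utf-8") as refs:
            refs.write(referrer + "\n")
    except OSError as err:  # covers PermissionError, read-only filesystems
        warnings.warn(f"cannot record back-reference in {refs_path}: {err}")
        return False
    return True
```

        An object used without a recorded back-reference is exactly
        the case the GC cannot see, which is why the warning matters.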

        ... Or, since I've already mentioned replication, there are a
        couple more solutions possible for the mapsets intended to be
        accessed read-only by many:

        * make a ``hard link'' for each of the objects in a separate
          mapset, writable by the reading party;

        * never remove an object.

        The first solution actually mimics the ``clone'' feature of
        modern DVCS (say, $ git clone of a local repository produces a
        copy of it in which most of the object files are shared by
        means of ``hard links''.)  Obviously, mirroring the mapset effectively
        solves all the problems with permissions, etc., while the design
        of the objects/ directory and the use of hard links ensure
        efficient storage.  Two points to pay special attention to are:

        * all the inventories may be copied or hardlinked at the time of
          mirroring, effectively turning a read-only mapset into its
          space-efficient copy, but then there should be a way to keep
          this copy in sync with the source mapset;

        * no such mirroring is currently possible, precisely because
          some files may be updated in place; thus, I believe the ``in
          place'' issue has to be resolved irrespective of whether the
          proposed scheme is accepted as a whole or not.
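
        The mirroring step could be sketched as follows, in Python.
        The function name and the fall-back to a plain copy (for
        filesystems without hard links, or when the mirror is on
        another device) are assumptions, not part of the proposal.

```python
import os
import shutil


def mirror_objects(src_objects, dst_objects):
    """Hard-link every file under src_objects into dst_objects.

    Files already present in the mirror are skipped, so the function
    can be re-run to pick up objects added to the source since.
    Returns a (linked, copied) pair of counters.
    """
    linked = copied = 0
    for root, _dirs, files in os.walk(src_objects):
        rel = os.path.relpath(root, src_objects)
        target_dir = os.path.normpath(os.path.join(dst_objects, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            if os.path.exists(dst):
                continue
            try:
                os.link(src, dst)  # shares the data, no extra disk space
                linked += 1
            except OSError:
                shutil.copy2(src, dst)  # assumed fall-back: plain copy
                copied += 1
    return linked, copied
```

        Note that a hard-linked mirror is only safe as long as no
        object is ever modified in place: a write through either name
        would change the data seen through both.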

        The second solution doesn't rely on hard links and thus may be
        appropriate for the systems lacking support for them.  It may be
        noted that the disk space occupied by the unreferenced objects
        could be reclaimed if it could be ensured that no party is
        active at the time of GC.  E. g., GC may be scheduled to be run
        as part of the OS start-up sequence.
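
        Such an off-line GC pass might look roughly like the Python
        sketch below.  It assumes, purely for illustration, a flat
        objects/ directory and inventories that are text files listing
        one object name per line; the actual inventory format is not
        settled in this thread.

```python
import os


def collect_garbage(inventory_dirs, objects_dir):
    """Remove objects that no inventory refers to; return their names.

    Must only be run while no other party is using the mapsets.
    """
    # Mark: gather every object name mentioned by any inventory.
    live = set()
    for inv_dir in inventory_dirs:
        for name in os.listdir(inv_dir):
            with open(os.path.join(inv_dir, name), encoding="utf-8") as inv:
                live.update(line.strip() for line in inv if line.strip())
    # Sweep: delete whatever was not marked.
    removed = []
    for name in sorted(os.listdir(objects_dir)):
        if name not in live:
            os.remove(os.path.join(objects_dir, name))
            removed.append(name)
    return removed
```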

        Furthermore, this solution may be appropriate for various other
        means of sharing files in a read-only manner.  E. g., via HTTP.

 >>> [BTW, it has been pointed out that this can reduce the maximum
 >>> number of maps per mapset, as the limit on an inode's hard link
 >>> count limits the maximum number of subdirectories, while there is
 >>> usually no fixed limit on the number of files. E.g. on Linux'
 >>> ext2fs, the maximum hard link count is 65535, so you can't have
 >>> more than 65533 subdirectories.]

 >> While the inventory scheme is free from hitting this limit.

 > OTOH, if you don't use subdirectories, you will have many more files
 > in a single directory. This can be a major performance issue on some
 > filesystems.

        This isn't really a problem, at least for the objects/ -- it
        just has to be ensured that the distribution of the names is
        sufficiently even, and then an option may be added so that the
        names are split, like:

split-at:               split-at: 4             split-at: 2, 4
objects/SD6Isoi2orPOu   objects/SD6I/soi2orPOu  objects/SD/6I/soi2orPOu
objects/IyfgXZdP3JYuu   objects/Iyfg/XZdP3JYuu  objects/Iy/fg/XZdP3JYuu
objects/xlRKohTgQKmJj   objects/xlRK/ohTgQKmJj  objects/xl/RK/ohTgQKmJj
objects/oBgUH2otF7Urb   objects/oBgU/H2otF7Urb  objects/oB/gU/H2otF7Urb
objects/CeX9zEZkdR9g5   objects/CeX9/zEZkdR9g5  objects/Ce/X9/zEZkdR9g5
objects/gnUNviMqfnTOx   objects/gnUN/viMqfnTOx  objects/gn/UN/viMqfnTOx

        The source of the evenly distributed names may be a good RNG,
        or a checksum of some kind (e. g., SHA-1) over the file contents.
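
        For illustration, the naming and splitting could be sketched
        in Python as below.  The function names and the use of the
        hexadecimal SHA-1 digest are assumptions (the sample names in
        the table above use a different alphabet); ``split_at'' plays
        the role of the split-at option.

```python
import hashlib


def object_name(data):
    """Derive an evenly distributed name from the file contents (SHA-1, hex)."""
    return hashlib.sha1(data).hexdigest()


def split_path(name, split_at=()):
    """Build the path under objects/, cutting `name` at the given offsets.

    `split_at` is a sequence of increasing character offsets, as in the
    table above: () keeps the name whole, (4,) yields one level of
    subdirectories, (2, 4) yields two.
    """
    parts = []
    start = 0
    for pos in split_at:
        parts.append(name[start:pos])
        start = pos
    parts.append(name[start:])
    return "/".join(["objects"] + parts)


print(split_path("SD6Isoi2orPOu"))          # objects/SD6Isoi2orPOu
print(split_path("SD6Isoi2orPOu", (4,)))    # objects/SD6I/soi2orPOu
print(split_path("SD6Isoi2orPOu", (2, 4)))  # objects/SD/6I/soi2orPOu
```

        With a checksum as the name, identical contents map to the
        same object, while an RNG gives even distribution only.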

_______________________________________________
grass-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-dev
