retitle 536598 consider using a hash-directory layout or a filename map
severity 536598 wishlist

also sprach martin f krafft <madd...@debian.org> [2009.07.11.1542 +0200]:
> The problem is that rsync (or tar) fails to copy all entries in large
> directories (50,000+ entries), because apparently the directory
> index (dir_index feature of ext2/3) gets exhausted.

The problem was that the destination filesystem had a 1k block size,
since it was originally intended as Maildir storage. Theodore Ts'o
explains in the thread [0] that the block size b (in kilobytes)
determines how many entries n a single directory index can hold:

n = 200,000 × b³

which is 200,000 for 1k blocks, 1.6 million for 2k blocks, and 12.8
million for 4k blocks. I don’t know where the 200,000 constant comes
from.
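To make the arithmetic concrete, here is a tiny sketch that just evaluates the rule of thumb for the three block sizes. The constant is taken as given from the thread. One plausible reading of the cube (my assumption, not something stated in the thread) is that the htree index has two levels of index blocks above the leaf blocks, and the fan-out of each index level as well as the number of entries per leaf block all grow roughly linearly with the block size.

```python
# Rule of thumb from the thread: n = 200,000 * b**3, b = block size in KiB.
C = 200_000  # empirical constant quoted in the thread; its derivation is unknown

def max_dir_entries(block_kib: int) -> int:
    """Rough maximum number of entries an ext3 dir_index directory can hold."""
    return C * block_kib ** 3

for b in (1, 2, 4):
    print(f"{b}k blocks: ~{max_dir_entries(b):,} entries")
# 1k blocks: ~200,000 entries
# 2k blocks: ~1,600,000 entries
# 4k blocks: ~12,800,000 entries
```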

0. http://www.linux-archive.org/ext3-users/90496-ext3_dx_add_entry-directory-index-full.html

In my case, switching to a 4k (or even just 2k) block size fixed the issue.

Nevertheless,

> Anyway, the problem is a function of encfs, which inflates the
> filenames. Notably, the problem occurs with block-encrypting
> filenames, *as well as* stream encryption.
> 
> Arguably, encfs might simply not be usable for this use-case, but on
> the other hand I think that it wouldn't be too hard to solve this
> problem, for instance by hashing each directory transparently.
> 
> A trivial implementation might be the following: since encrypted
> filenames seem to be made up of letters, digits, and some special
> characters, let's assume the set of possible characters is
> 26+10+6==42. It would already help if each directory had 42
> single-letter/digit subdirectories and files would be sorted into
> those accordingly.
> 
> An alternative might be to store all files in a giant 3-4-level
> directory hash structure and to maintain an (encrypted) database of
> filename -> hashed file mappings. In --reverse mode, this database
> would have to be virtual and simulated by the encfs code.
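A minimal sketch of the first option quoted above, bucketing each file into a subdirectory named after the first character of its encrypted name. The 42-character alphabet and the helper `bucket_path` are hypothetical, chosen only to match the 26+10+6 estimate; this is not how encfs actually encodes names. The simulation assumes encrypted names are roughly uniformly distributed.

```python
import os
import random
import string
from collections import Counter

# Hypothetical 42-character alphabet matching the 26+10+6 estimate;
# "." is deliberately excluded so no bucket is named ".".
ALPHABET = string.ascii_lowercase + string.digits + "-_,+=~"

def bucket_path(parent: str, encrypted_name: str) -> str:
    """Hypothetical helper: store a file in a subdirectory of parent named
    after the first character of its encrypted filename."""
    return os.path.join(parent, encrypted_name[0], encrypted_name)

# Simulate 50,000 random 16-character "encrypted" names and count
# how many end up in each bucket.
rng = random.Random(0)
names = ["".join(rng.choice(ALPHABET) for _ in range(16)) for _ in range(50_000)]
counts = Counter(
    os.path.basename(os.path.dirname(bucket_path("/enc", n))) for n in names
)

print(len(ALPHABET), len(counts), max(counts.values()))
```

With 50,000 evenly spread names, each bucket holds about 50,000 / 42 ≈ 1,190 entries, well below the 200,000 limit for 1k blocks. A single level only divides by 42, though, so a directory beyond roughly 42 × 200,000 ≈ 8.4 million entries would still overflow on 1k blocks; nesting a second level of buckets would push that further.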

I think these two options are still worth considering, especially in
the light of #536752.
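A minimal sketch of the second option quoted above: every file lives in a fixed-depth hashed directory tree, and a map records which logical path corresponds to which backing file. Everything here is hypothetical: the class name, the key, and the choice of an HMAC-SHA256 of the full logical path (keyed so the layout does not leak names). The plain dict stands in for the encrypted database, and the --reverse mode is not covered.

```python
import hashlib
import hmac
import posixpath

class HashedStore:
    """Hypothetical sketch: map logical paths to a fixed-depth hashed tree."""

    def __init__(self, key: bytes, levels: int = 3, width: int = 2):
        self.key = key        # hypothetical per-volume secret
        self.levels = levels  # depth of the hash directory tree
        self.width = width    # hex characters per directory level
        # Stand-in for the (encrypted) filename -> hashed file database.
        self.mapping: dict[str, str] = {}

    def _backing_path(self, logical_path: str) -> str:
        digest = hmac.new(self.key, logical_path.encode("utf-8"),
                          hashlib.sha256).hexdigest()
        parts = [digest[i * self.width:(i + 1) * self.width]
                 for i in range(self.levels)]
        return posixpath.join(*parts, digest)

    def add(self, logical_path: str) -> str:
        backing = self._backing_path(logical_path)
        self.mapping[logical_path] = backing
        return backing

    def lookup(self, logical_path: str) -> str:
        return self.mapping[logical_path]

    def listdir(self, logical_dir: str) -> list[str]:
        # Linear scan for brevity; a real database would index by parent.
        return sorted(posixpath.basename(p) for p in self.mapping
                      if posixpath.dirname(p) == logical_dir)

store = HashedStore(key=b"hypothetical volume key")
print(store.add("/mail/cur/msg1"))  # e.g. "ab/cd/ef/abcdef..." (depends on key)
print(store.listdir("/mail/cur"))   # ['msg1']
```

With three levels of two hex characters there are 256³ ≈ 16.7 million leaf directories, so no single directory ever grows large, at the cost of keeping the map consistent with the files on disk.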

-- 
 .''`.   martin f. krafft <madd...@d.o>      Related projects:
: :'  :  proud Debian developer               http://debiansystem.info
`. `'`   http://people.debian.org/~madduck    http://vcs-pkg.org
  `-  Debian - when you have better things to do than fixing systems
