We don't hash the file contents during library scanning. For context, the
hash Steven is referring to is a per-directory hash that is used to
determine whether the directory should be rescanned. The way it works today
is that it concatenates all the names of files/dirs in the directory and
hashes that. If the hash is different from what we saw before we rescan the
directory.

Steven is right that we should switch to something order independent. I've
also wanted to include mtimes in there for a while but not sure if that
would result in an extra stat or not on all the common filesystems we care
about.
On Jun 9, 2013 8:53 AM, "petah" <mi...@djpetah.com> wrote:

> On Wed, 5 Jun 2013 22:46:57 -0700 (PDT)
> Steven Boswell II <ulatekh-/e1597as9lqavxtiumw...@public.gmane.org> wrote:
> > Right now I'm rescanning my track collection on my laptop, and I
> expected it to be a short operation, but it's taking forever.  I'm guessing
> it's because I moved a large part of my track collection to a different
> partition on my hard drive, even though the path remained the same.  The
> issue is that QDirIterator doesn't present the tracks in any particular
> order; it'll be dictated by how they happened to be stored in the
> underlying filesystem.  When I moved the track collection, that permuted
> the files in each directory.
> >
> > The issue is that the directory-hash calculation isn't
> order-independent; it calculates the directory hash by appending all the
> filenames to one string.  If we changed it to calculate the hash of each
> filename individually, and XORed those together, then it would be
> order-independent, and shouldn't detect less changes than the previous
> method.
> >
> > Thoughts?
>
> I'm guessing it should be seen as a generic, recursive hash(files[n]) ->
> hash(parent_dir) problem, in which case using XOR wouldn't report a file's
> movement to a grandparent because ((a ^ b) ^ c)) == (a ^ (b ^ c)) and
> surely the point is to do a top-down traversal and actually detect such
> movements?
>
> My first instinct would be to wrap QDirIterator with a qsort() so it's
> deterministic and doesn't fluctuate across different OSes/Filesystems, then
> it's an optimization problem but not trivial.
>
> I read a paper a while back that used MD6 hashes because they're immune to
> order and can be massively parallelized on GPUs, and used a header/footer
> block hash for faster discrimination. It was for file deduplication though,
> in Mixxx's case I don't see how you can avoid having to recompute the hash
> of a file that suddenly popped up; you can never be certain it was the same
> file that disappeared from another location until a full rehash... unless
> the move operation is done from within the Mixxx UI or the hash is embedded
> in the file's metadata.
>
> -- p
>
>
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. A cloud service to automate IT design, transition and operations
> 2. Dashboards that offer high-level views of enterprise services
> 3. A single system of record for all IT processes
> http://p.sf.net/sfu/servicenow-d2d-j
> _______________________________________________
> Get Mixxx, the #1 Free MP3 DJ Mixing software Today
> http://mixxx.org
>
>
> Mixxx-devel mailing list
> Mixxx-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mixxx-devel
------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
_______________________________________________
Get Mixxx, the #1 Free MP3 DJ Mixing software Today
http://mixxx.org


Mixxx-devel mailing list
Mixxx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mixxx-devel

Reply via email to