On Tue, Apr 05, 2005 at 03:34:25PM +0200, Jan Hudec wrote:
> On Tue, Apr 05, 2005 at 11:03:28 +0200, [EMAIL PROTECTED] wrote:
> [...]
> > Yup. This metadata business doesn't solve everything. Especially with
>
> The metadata business does not solve *anything*. The only thing it can
> do is prepare ground for actual solutions. And there is no point in
> implementing it until you know what the solutions will actually use.

Well put.

[...]

> Yes. But "metadata" is too broad a term to be useful. We need to tell
> more about how it should behave.

That would be the point of such discussions. The question I am pondering
is ``is it possible to provide a mechanism which is generic and simple
enough to be worth it, and leave policy to the individual instances/
users/whatever? Or would we be just sliding the knot from here to there,
uselessly?''

> Eg. many formats can be detected by some kind of magic number. And
> there a metadatum saying "files with magic number X should get
> treatment Y" is a lot more useful than listing those files...!

A kind of `file' heuristics. And what do you do about changing
heuristics?

> > Inexact patching for jpeg images anyone? Or for (ugh!) XML RSS
> > files[1]? Or...
>
> I believe there is some kind of xml-diff, that compares the trees, not
> the text ;-). One such is built into openoffice...

As has been said elsewhere -- inexact patching won't work very
satisfactorily. The reason, I think, is that inexact patching is closely
tied to semantics. And your classical `hand-edited' source file bears
(by sheer coincidence) a semantic dimension in the fact that disjoint
lines are loosely coupled.

> I fear inexact patching of jpegs won't work, because they are lossy.
> But pngs... And for eg. xcf (Gimp format), I can even imagine a
> _useful_ one...

That depends on what you understand by `work'. Do the results just have
to `look similar' (which, roughly speaking, is the equivalence relation
defining `a jpeg image')?

> And even if it's not inexact patching, instead of two versions, you
> can store one version and a difference.
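To make the magic-number metadatum idea concrete, here is a minimal
sketch (Python, purely illustrative -- the table entries and treatment
names are made up, this is not anything arch actually implements) of
mapping leading magic bytes to a treatment, rather than listing
individual files:

```python
# Illustrative only: map magic-number prefixes to a "treatment".
# The magic bytes are real file signatures; the treatment names
# ("binary-diff", "tree-diff", ...) are hypothetical labels.
MAGIC_TABLE = [
    (b"\x89PNG\r\n\x1a\n", "binary-diff"),  # PNG image
    (b"\xff\xd8\xff",      "store-whole"),  # JPEG image
    (b"<?xml",             "tree-diff"),    # XML document
]

def treatment_for(data: bytes, default: str = "line-diff") -> str:
    """Return the treatment whose magic number prefixes the data."""
    for magic, treatment in MAGIC_TABLE:
        if data.startswith(magic):
            return treatment
    return default

print(treatment_for(b"\x89PNG\r\n\x1a\n...."))  # -> binary-diff
print(treatment_for(b"int main(void) {}"))      # -> line-diff
```

Note that this is exactly where the `changing heuristics' question
bites: once archives recorded under one version of the table exist, any
change to it has to be versioned too.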
> And knowing the nature of the data can make this more efficient.

This is a completely different dimension. One of the things which
confused me at the beginning is that in Arch (or in CVS, SVN), the cute
diff+patch trick is being used for different things:

 - storage efficiency. This is the most visible, but also the least
   important. Binary diffs or whatever work here as well.

 - merging related but different changes, i.e. inexact patching. This
   is the really cool thing about version management, but it has a
   semantic dimension. You can't throw it at any kind of file. Nowadays
   it just kind-of-works for your good-old-plain-source-file.

[about file ids and metadata]

> Hm, it's not that easy. There will have to be a set of data types
> (content, type, permissions, ...) that will be recorded for each file,
> and a set of procedures to diff and patch them. This set should be
> extensible and might be different on each platform. However, all the
> standard mappings would have to be built in.
>
> Note, that for security reasons arch must not run archive-provided
> scripts, so the diff algorithm specification has to be flexible enough
> to be actually useful.

*This* could turn out to be a real killer. I think it'd be difficult to
get things right in the first place. As your attribute set evolves
(along the archive's life) your attribute-related algorithms might want
to evolve too. How does one solve that?

Regards
-- tomás
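P.S. A minimal sketch of what such an extensible-but-built-in set of
diff procedures might look like (Python; the datum kinds and function
names are hypothetical, not an actual arch interface). The point is that
the archive only *names* a built-in algorithm per datum kind, rather
than shipping executable code:

```python
# Hypothetical sketch: a declarative registry of per-datum diff
# procedures. The archive records only the kind name; the algorithms
# themselves are built into the tool, so no archive-provided script
# ever runs.
from difflib import unified_diff

BUILTIN_DIFFERS = {
    "content/text": lambda a, b: "".join(
        unified_diff(a.splitlines(True), b.splitlines(True))),
    "permissions": lambda a, b: f"chmod {a:o} -> {b:o}",
}

def diff_datum(kind, old, new):
    """Dispatch to a built-in differ; unknown kinds fall back to
    storing the new value whole (the safe default for future kinds)."""
    differ = BUILTIN_DIFFERS.get(kind)
    if differ is None:
        return ("replace", new)
    return ("diff", differ(old, new))
```

Even then my question above stands: when "permissions" grows ACLs, the
registry entry has to change, and old archives must still be readable
with the old algorithm.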
_______________________________________________
Gnu-arch-users mailing list
[EMAIL PROTECTED]
http://lists.gnu.org/mailman/listinfo/gnu-arch-users
GNU arch home page: http://savannah.gnu.org/projects/gnu-arch/
