On Wed, Apr 14, 2004 at 12:14:42PM -0400, Derek Atkins was heard to remark:
> [EMAIL PROTECTED] (Linas Vepstas) writes:
>
> > yeah, I guess ... that's OK. I dunno. maybe not. But a good baseline
> > would be purely automated algo like so:
>
> But wait, there are multiple questions that need to be answered.
>
> 1) Is this semantically the same object? For example, does this
>    Account* and that Account* point to (semantically) the same
>    "Account"?

I'm not sure what you mean by 'semantically' ... Are you comparing
an account to something that's a not-account??

> I'm not convinced a distance vector is necessarily
> the correct measure.

It's a measure; it's serviceable and reasonably generic. 'Correct' is
a mighty big word when talking about similarity between things.

> Perhaps a weighted distance vector would
> be appropriate (but then again the weights would be per-object,
> a "parameter weight"?).

Yeah, well, the problem of how to weight things is a problem of
heuristics. I think that there's a reasonable set of weights that
will satisfy most human users most of the time. The 'correctness'
problem, and the 'arbitrariness' of these weights, is one reason to
keep the concept & implementation outside of the engine.

> 2) Do these semantically equivalent objects have the same or
>    different data in them?

I don't understand what you're saying. What's an example of
'semantically equivalent objects' that would have different data?
Do you mean 'similar objects that have slightly different data'?
For example: two transactions for the same amount, same payee,
dates that differ by one day? These are 'similar' in my dictionary;
I don't know what you mean by 'semantically equivalent'.

>    This is more a question to determine
>    if you have any work to do once you determine that you've got
>    a duplicate in the import queue.

I don't get this either. If you've got two transactions that are
identical, you've got work to do. If they are nearly identical,
you've got exactly the same work to do. Only if they are
significantly different do you have to stop and pop up a GUI to ask
the user.

--linas

-- 
pub  1024D/01045933 2001-02-01 Linas Vepstas (Labas!) <[EMAIL PROTECTED]>
PGP Key fingerprint = 8305 2521 6000 0B5E  8984 3F54 64A9 9A82 0104 5933
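
As a rough illustration of the weighted-distance idea discussed above,
here is a minimal Python sketch. Everything in it is an assumption made
for illustration: the Txn type, the field names, the weights and the
threshold are hypothetical and are not GnuCash engine types or APIs.
It only shows the shape of the heuristic: identical and nearly
identical transactions get the same treatment, and only a significantly
different pair is handed to the user.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Txn:
    """Hypothetical stand-in for a transaction; not a GnuCash type."""
    amount_cents: int
    payee: str
    posted: date


# Assumed per-field weights. These are pure heuristics and would need tuning.
WEIGHTS = {"amount": 5.0, "payee": 3.0, "date": 1.0}

# Assumed cutoff: at or below this distance the pair counts as a duplicate.
DUPLICATE_THRESHOLD = 2.0


def distance(a: Txn, b: Txn) -> float:
    """Weighted sum of per-field differences; 0.0 means identical."""
    d_amount = 0.0 if a.amount_cents == b.amount_cents else 1.0
    # Payee comparison ignores case and surrounding whitespace.
    same_payee = a.payee.strip().lower() == b.payee.strip().lower()
    d_payee = 0.0 if same_payee else 1.0
    # Date difference is measured in whole days.
    d_date = float(abs((a.posted - b.posted).days))
    return (WEIGHTS["amount"] * d_amount
            + WEIGHTS["payee"] * d_payee
            + WEIGHTS["date"] * d_date)


def classify(a: Txn, b: Txn) -> str:
    """Return "duplicate" for identical or near-identical pairs, else "ask_user"."""
    return "duplicate" if distance(a, b) <= DUPLICATE_THRESHOLD else "ask_user"
```

With these assumed weights, two transactions with the same amount and
payee whose dates differ by one day come out at distance 1.0 and are
classified as duplicates, while a pair with different amounts lands
above the threshold and goes to the user. Keeping the weights in a
table outside the comparison logic reflects the point that they are
arbitrary and belong outside the engine.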