On Tue, Aug 8, 2017 at 11:00 AM, Chip Scheide <4d_o...@pghrepository.org> wrote:

>
> Worse, I've found that the same product from the same vendor, bought in
> different purchase amounts (1 vs. case), has the same part number but
> different pricing! So even a check on part numbers is insufficient to
> stop duplicate entries.
>

Well, I believe there are "dupes" and then there are dupes. There are
unacceptable dupes, and there are dupes you have to live with. And then
there are "near dupes", like your example above. That is not "really" a
dupe, because one field/column, pack size (case vs. 1), is NOT
duplicated. And so on.
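A minimal Python sketch of the "near dupe" point, assuming hypothetical field names (vendor, part_number, pack_size, price) rather than any real schema: whether rows count as duplicates depends entirely on which fields you compare.

```python
from collections import Counter

# Hypothetical inventory rows: same vendor and part number, but sold
# singly and by the case at different prices (Chip's example).
rows = [
    {"vendor": "Acme", "part_number": "X-100", "pack_size": "1", "price": 4.50},
    {"vendor": "Acme", "part_number": "X-100", "pack_size": "case", "price": 40.00},
    {"vendor": "Acme", "part_number": "X-100", "pack_size": "1", "price": 4.50},
]

def find_dupes(rows, key_fields):
    """Return the key tuples that occur more than once when rows are
    compared only on key_fields."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)

# Keyed on part number alone, the single and the case look like dupes.
by_part = find_dupes(rows, ("vendor", "part_number"))
# Adding pack size separates them; only the truly repeated row remains.
by_part_and_pack = find_dupes(rows, ("vendor", "part_number", "pack_size"))

print(by_part)           # [('Acme', 'X-100')]
print(by_part_and_pack)  # [('Acme', 'X-100', '1')]
```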

This all started with David making a broad, blanket statement about "data
integrity" and "row duplication", and how using "synthetic" record ID keys
ruined the ability to automatically filter out "dupes". (I _think_ that was
your point, David. Please correct me if not.) In that strict sense, if a
"row" is truly, absolutely duplicated, that is _probably_ bad. Or maybe not,
if you simply didn't include the "pack size" field that would have made the
row unique. The same goes for all the other examples cited on this thread
of duplicate names that were not _really_ duplicates; they just needed a
little more information in the "row" to define them better. In fact, we all
used to experience this right here nearly daily with our Walt Nelson (Guam)
vs. Walt Nelson (Seattle) signatures: constantly confusing without that one
little added tidbit.

So my point was: it all depends. Sometimes you have to design your own
system differently, or provide tools within your current system to suss out
what, how, why and when a "dupe" occurred, and how (or IF) to fix it,
prevent it, or even find it.

I too am definitely a convert to using UUIDs over longints, AND to using
them in preference to some key constructed from the row data itself. Row
data may well change as the business grows or changes, and such a key will
NOT play well when you absorb new data sources: buying a competitor with
nearly identical inventory items, say, or combining existing standalone
installations into one big common enterprise bucket, or deciding for all
kinds of business reasons to extract a certain batch and combine it with
another batch in a different bucket, etc., etc. Data duplication of one
sort or another is bound to occur in many of these "growth" scenarios, and
more often than not, merging and cleaning those dupes cannot be reduced to
an algorithm without human hands to help. Or whatever.
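A minimal Python sketch of the merge scenario, assuming two hypothetical standalone installations that each number their records with a local counter starting at 1 (standing in for a per-database longint sequence) and also stamp each record with a random UUID from Python's uuid module:

```python
import uuid
from itertools import count

def make_installation(names):
    """Build one standalone data set. Each record gets a local sequential
    ID (like a per-database longint counter) and a random UUID."""
    seq = count(1)
    return [{"seq_id": next(seq), "uuid": uuid.uuid4().hex, "name": n} for n in names]

store_a = make_installation(["Widget", "Gadget"])
store_b = make_installation(["Widget", "Sprocket"])  # e.g. an acquired competitor

merged = store_a + store_b

seq_ids = [r["seq_id"] for r in merged]
uuids = [r["uuid"] for r in merged]

# Sequential IDs restart at 1 in each installation, so they collide on merge.
print(len(seq_ids), len(set(seq_ids)))  # 4 2
# Random UUIDs stay unique across sources (collisions are astronomically unlikely).
print(len(uuids), len(set(uuids)))      # 4 4
```

The keys survive the merge, but both installations still contain a "Widget"; whether those two rows are the same item is exactly the kind of call that still needs human hands.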

As Neil succinctly described:

>  - UUID is faster (due to "random" data in the index)
>  - UUID solves problems with distributed systems that sync
>  - UUID fixes the home-grown sequence problem with transactions
>  - UUID is not easily readable by humans and keeps me from being tempted to
> expose them :)
>
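
One common form of the "home-grown sequence problem", sketched in Python under the assumption that the sequence is computed as max(ID) + 1 (a hypothetical scheme, not necessarily what Neil had in mind): two transactions that both read the counter before either commits get the same value, whereas a UUID is generated locally with no shared counter to read. The interleaving is simulated step by step rather than with real threads.

```python
import uuid

committed = [1, 2, 3]  # IDs already in the table

def next_id_home_grown(table):
    """Hypothetical home-grown sequence: read the current max and add 1."""
    return max(table) + 1

# Two transactions interleave: both read before either commits.
id_tx1 = next_id_home_grown(committed)
id_tx2 = next_id_home_grown(committed)
committed += [id_tx1, id_tx2]
print(id_tx1, id_tx2)  # 4 4  -> duplicate key

# A UUID is generated locally, with no shared counter to read or lock.
u1, u2 = uuid.uuid4(), uuid.uuid4()
print(u1 != u2)  # True
```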

And to Chip's point, I DO sometimes expose those UUIDs as read-only info
on certain Admin Review pages. I sometimes add a "copy to pasteboard"
button if the admin might want to do some searching with that UUID, for as
we all know, they are hellishly difficult to type. And in many projects I
retain that sequential longint idea, because it really IS a useful human
marker: easier than a date:time stamp to read, quickly sort on, and
generally "glom" as you scan down long lists of rows. But I've been burned
way too many times to ever use it again as a unique record ID. I now
consider it a user-interface aid only, still useful as a ProductID or some
such in many cases, but not as a
Unique-Unvarying-Forever-Regardless-Of-Source-Or-Destination-Record-Key.
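A minimal Python sketch of that split, assuming a hypothetical Product record: the UUID is the permanent key, and a separate sequential display number exists only for people scanning lists. The hex formatting and the module-level counter are illustrative choices, not anything 4D prescribes.

```python
import uuid
from dataclasses import dataclass, field
from itertools import count

# Hypothetical counter feeding the human-friendly display number.
_display_counter = count(1)

@dataclass
class Product:
    name: str
    # Permanent key: generated once, never reused, never edited.
    record_id: str = field(default_factory=lambda: uuid.uuid4().hex.upper())
    # UI aid for reading and sorting lists; NOT used as the key.
    display_no: int = field(default_factory=lambda: next(_display_counter))

products = [Product("Widget"), Product("Gadget"), Product("Sprocket")]
index = {p.record_id: p for p in products}  # lookups go through the UUID

# Show the UUID read-only next to the friendlier number (e.g. for copy/paste).
for p in sorted(products, key=lambda p: p.display_no):
    print(p.display_no, p.name, p.record_id)

# Renumbering the display sequence (say, after a merge) leaves every key intact.
keys_before = set(index)
for n, p in enumerate(sorted(products, key=lambda p: p.name), start=1):
    p.display_no = n
assert {p.record_id for p in products} == keys_before
```

The display number can be reassigned freely; the record_id never changes, so anything that points at a record keeps working.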

Steve Simpson
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**********************************************************************