On Tue, Aug 8, 2017 at 11:00 AM, Chip Scheide <4d_o...@pghrepository.org> wrote:
> Worse, I've found that the same product, from the same vendor in
> differing purchase amounts (1 vs case) is the same part number, but
> different pricing! So.. even a check on part numbers is insufficient to
> stop duplicate entries.

Well, I believe there are "dupes" and then there are dupes. There are unacceptable dupes and there are dupes you have to or need to live with. And then there are "near dupes", like your example above. That is not "really" a dupe because there is one field/col, pack size, that is NOT duplicated (case vs 1). On and on.

This all started with David making a broad blanket statement about "data integrity" and "row duplication" and how using "synthetic" record ID keys ruined the ability to automatically filter out "dupes". (I _think_ that was your point, David. Please correct me if not.) And in that strict sense, if a "row" is really, actually, absolutely duplicated, that is _probably_ bad. Or maybe not, if you didn't include that "pack size" field that would have changed the row to unique. Or all the other examples cited on this thread about duplicate names that were not _really_ duplications; they just needed a little more information included in the "row" to better define it. In fact we all used to experience it right here nearly daily with our Walt Nelson (Guam) vs Walt Nelson (Seattle) signatures. Constantly confusing without that one little added tidbit.

So my point was, it all depends. And sometimes you have to design your own system differently or provide tools within your current system to suss out what, how, why and when a "dupe" occurred. And how to - or IF to - fix it or prevent it or even find it.

I too am definitely a convert to using UUID over longint.
AND in using them in preference to some construction using row data itself, which may well change as the business grows/changes and will NOT play well when you absorb new data sources (buy a competitor, for example, with nearly identical inventory items; or combine existing standalone data installations into one big common enterprise bucket; or decide for all kinds of business reasons to extract a certain batch and combine it with another batch in a different bucket; etc. etc. etc.). Data duplication of one sort or another is bound to occur in many of these "growth" scenarios, and more often than not the merging and cleaning of those dupes is not reducible to some sort of algorithm without human hands to help. Or whatever.

As Neil succinctly described:

> - UUID is faster (due to "random" data in the index)
> - UUID solves problems with distributed systems that sync
> - UUID fixes the home grown sequence problem with transactions
> - UUID is not easily readable by human and keeps me from being tempted to
> expose them :)

And to Chip's point, I DO sometimes expose those UUIDs as read-only info on certain Admin Review pages. I sometimes place a "copy to pasteboard" button if it is appropriate that the admin might desire to do some searching with that UUID - for as we all know, they are hellishly difficult to type.

And in many projects I retain that seq longint idea because it really IS a useful human marker that is easier than a date:time stamp to read and quickly sort on and in general "glom" as you scan down long lists of rows. But I've been burned way too many times to ever use it again as a unique recordID. I now consider it a user interface type aid only, still useful as a ProductID or some such in many cases. But not as a Unique-Unvarying-Forever-Regardless-Of-Source-Or-Destination-Record-Key.
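To make the points above a bit more concrete, here is a rough sketch in Python (not 4D code, and not anyone's actual implementation from this thread). It assumes a made-up `Item` record with hypothetical field names, gives each row a UUID as its key, keeps a sequence number purely as a display aid, and then merges two standalone data sets. The helper names `merge_sources` and `candidate_dupes` are invented for illustration only.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Item:
    # All field names are hypothetical, chosen only for this illustration.
    vendor: str
    part_number: str
    pack_size: str      # e.g. "1" or "case"
    price: float
    display_seq: int    # human-friendly marker, unique only within one source
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def merge_sources(*sources):
    """Combine item lists from separate installations into one dict keyed by UUID."""
    merged = {}
    for source in sources:
        for item in source:
            merged[item.record_id] = item
    return merged


def candidate_dupes(items, key_fields):
    """Group items by the chosen fields; keep only groups with more than one row."""
    groups = defaultdict(list)
    for item in items:
        key = tuple(getattr(item, name) for name in key_fields)
        groups[key].append(item)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Two standalone installations; both start their sequence numbers at 1.
site_a = [
    Item("Acme", "P-100", "1", 2.50, display_seq=1),
    Item("Acme", "P-100", "case", 24.00, display_seq=2),
]
site_b = [
    Item("Acme", "P-100", "case", 24.00, display_seq=1),
]

merged = merge_sources(site_a, site_b)
items = list(merged.values())

# The sequence numbers collide after the merge; the UUID keys do not.
seq_clashes = candidate_dupes(items, ["display_seq"])

# Vendor + part number alone lumps all three rows together as "dupes".
by_part = candidate_dupes(items, ["vendor", "part_number"])

# Adding pack size leaves only the two "case" rows for a human to review.
by_part_and_pack = candidate_dupes(items, ["vendor", "part_number", "pack_size"])
```

Note that the sketch only surfaces candidate groups. Whether the two "case" rows should be merged, kept apart, or fixed some other way is still a call for a person to make.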
Steve Simpson