> This all started with David making a broad blanket statement about "data
> integrity" and "row duplication" and how using "synthetic" record ID keys
> ruined the ability to automatically​ ​filter out "dupes". (I _think_ that
> was your point David. Please correct me if not.) And in that strict sense,
> if a "row" is really actually

I've been a bit taken aback at how this thread has gone, to be honest.

My point was really basic: Bolding on a random number field makes a row
unique, it doesn't make the underlying *data* unique.

That's it. That's more-or-less me using one of my standard techniques:

1) Find the obvious.
2) Draw a circle around it.
3) Declare it mine.

Patent pending...

Seriously, I'm saying something that just shouldn't be even the tiniest bit
contentious. That point is nothing but an established principle from
database design, it's not something I'd normally consider debatable. I'd
put it in the same category as when someone says, "Race condition? I've
figured out a shortcut." No you haven't. That's pretty much in line with
things the patent office won't even review:

* Perpetual motion machines.

* Squirrel-proof bird feeders.

Until such time as the fundamental laws of nature change, neither of those
inventions can work.

A lot of the stuff about duplicates doesn't bear on the design question
about the table's columns.

Keller 123
Keller 123
Keler 123

How many rooms are there? You have no way to know. It could be 1, 2, or 3:

1: You've got a pure duplicate and a typo-based duplicate.
2: You've got a pure duplicate and two buildings with very similar names.
3: You've got two buildings with very similar names *and you are missing
enough data to identify rows uniquely.*

In that last case,there's a design question. If you haven't identified
enough columns of *data* from the real world to uniquely identify each row,
you don't have a data model that can be implemented in a relational
database. I mean, sure, you can use a tool to store the data - but I don't
see how you model that. What makes each row unique? I mean out in the real
world. Perhaps there's a floor, or a direction, or some kind of vernacular
detail that actual people use to distinguish rooms. In which case, you've
found a missing field! That's part of the point of modeling. But what if
there really is nothing? Well, then it's pretty common to add something,
like a number. But that then (generally) needs to flow out into the real
world. So, you synthesize data in the database, add it to the row *and* to
the real world. Serial numbers are a perfect example of this. Same with
account numbers, customer numbers, etc. And UUIDs are functionally serial
numbers. They have nothing at all to do with the data in the row, they're
purely an implementation-level convenience.

As some people seem to be getting the impression I'm against UUIDs, that's
incorrect. I'm glad they're native (I used them sooner than that), I've got
nothing against them. But they are what they are and not more. Any
complaints I had about how they were implemented in 4D is irrelevant now as
it's been since V14. Too old to matter. I also have been getting the
impression when people say "primary key", they don't always appreciate what
a "key" really is - an attribute or set of attributes that are entirely
about the row of data that also uniquely identify a row. That's a key...off
the top of my head, I'm sure there's a better summary out there. The 4D
world has long had a peculiarly hostile relationship to normalization,
starting with 4D itself (subtables, wrong examples in the manuals, etc.)
I've seen the same cropping up again with how object fields are being
promoted. (Slapping some JSON in a text field and calling your system NoSQL
isn't necessarily helpful...)

But just slapping a random-but-unique number (sequential or not) onto the
row? That easily masks real duplicates. And why other databases prominently
support multi-field constraints on tables - for precisely this reason.
Again, you *can* do this in 4D with a compound index set to unique. So, the
feature exists, I just haven't seen it being used widely. Perhaps it
because engine-level errors feel disproportionately painful to deal with in
4D? That could just be me. Sincerely, I may be more allergic to 4D
engine-level errors than others. I pretty much turn off the unique
attribute and check uniqueness myself. Hmmm. I might be missing upping my
game here. I kind of remember that years ago hitting a duplicate error
wasn't easy to trap for and got ugly. I can't swear that's true, I just
remember having that impression. Long story shorter, I probably
haven't tried 4D's uniqueness controls in years. Do people use them? How
are they in V16? Since I'm doing some Postgres stuff these days, yeah, over
there I set up multi-column constraints, no question.

There are lots of other kinds of duplicates out there that can crop up,
even if you have a good model behind your tables. Life is hard. But it's
important and helpful to keep in mind the distinction between the issues
around the design of the table and the issues around data entry. They're
not all the same.

As far as duplicate matching goes, that's a different topic and one I
enjoy, but it's a different topic.
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com

Reply via email to