The bad smell comes from “a living thing” which the system should not model.
We can follow most of your model but (1) merge `person` and `user` in your
model and name it `account`; (2) rename the `account` in your model to `user`.
The reason for (2) is that, as mentioned in
https://github.com/apache/incubator-devlake/issues/1680, “we thought of
changing the existing table.users to table.accounts and adding a table.users to
represent … natural people, but that will cause many changes in the code.” So,
it is good to keep the word `user` for various platforms rather than introduce
the `account` in your model.
All in all, we can use the new `account` concept and rephrase your model.
1. `account`: the unified identity on Apache DevLake for collecting and
analyzing data from different platforms.
2. `platform`: a website (github.com/gitlab.com/etc...), or abstract domain
(git repository, … and the only reliable identity for a git user is email)
3. `user`: a registration record to represent a user on a `platform`, but an
`account` may or may not map to multiple `users` on a specific platform.
(1) every `account` is associated with a single user on a single
platform (we don't need an `account` table)
(2) some `accounts` are associated with one user on each of multiple platforms
(we need an `account` table)
(3) some `accounts` are associated with multiple users on multiple platforms
(we badly need an `account` table)
Now, what we are trying to do here is to group those `users` by `account`…
(take git author_email as an example: different emails can belong to one
`account`).
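
To make the refined model concrete, here is a minimal sketch of grouping `users` by `account`. It is only an illustration: the `User` fields, the `account_id` link, and the sample data are assumptions made for this example, not the actual Apache DevLake schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    """A registration record on one platform (hypothetical fields)."""
    platform: str                     # e.g. "github.com", or "git" for commit authors
    identifier: str                   # platform-specific key, e.g. a login or author email
    account_id: Optional[int] = None  # unified identity, if one has been assigned


def group_users_by_account(users):
    """Map each account id to the users linked to it; unlinked users are skipped."""
    groups = defaultdict(list)
    for user in users:
        if user.account_id is not None:
            groups[user.account_id].append(user)
    return dict(groups)


# Made-up sample data: one account with several users, one with a single user,
# and one git email that is not linked to any account yet.
users = [
    User("git", "alice@example.com", account_id=1),
    User("git", "alice@work.example.com", account_id=1),
    User("github.com", "alice", account_id=1),
    User("gitlab.com", "bob", account_id=2),
    User("git", "unknown@example.com"),
]
grouped = group_users_by_account(users)
```

In this sketch, account 1 corresponds to case (3) above (multiple users on multiple platforms) and account 2 to case (1).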
You can see the refined model is simpler than your original one. So, to quickly
form consensus, the decision point can be like this: (1) If the above refined
model meets the requirements, my understanding should be correct and my
irritation with `person` actually leads to better definitions. Then let’s go
with it and we won’t spend more time on the word choice of `account`, for
example. (2) If the above refined model doesn’t work or misses something, my
understanding should be flawed so please just keep to your original model and
`person` and ignore this thread.
Thanks,
Jinglei
From: Klesh Wong <kl...@apache.org>
Date: Wednesday, June 15, 2022 at 2:30 PM
To: dev@devlake.apache.org <dev@devlake.apache.org>
Subject: Re: [discuss] team entity design => table name
Let's bear with existing terms a little bit longer, I don't buy your
definition of `account` just yet. Here is why:
1. `person`: a Living Thing (Human, Dog, or Alien)
2. `user`: a `person` who is using Apache DevLake to collect and
analyze DevOps data
3. `platform`: a website (github.com/gitlab.com/etc...), or abstract
domain (git repository, it can be cloned to different
machines/websites, but somehow we treat them the same git repo, and
the only reliable identity for `person` is email)
4. `account`: a registration record to represent a `person` on a
`platform`, but a `person` may or may not have multiple `accounts`
on a specific platform.
1. one `person` registers on one platform one time and uses it
forever (we don't need a `person` table)
2. one `person` registers on multiple platforms one time each and
uses them forever (we need a `person` table)
3. one `person` registers on multiple platforms multiple times each
and uses some of them (we badly need a `person` table)
Now, what we try to do here is to group those `accounts` by `person`,
thus, "introduced `person`", and we don't have enough clues to figure
out who is who across multiple platforms. Even worse, we can't even
figure out who is who on a specific platform (take git author_email as
an example: different emails can belong to one `person`).
So, most of us agreed the best way to solve the problem is to aggregate
all those accounts from different platforms into one table named
`accounts`, and then let the `user` connect them to `persons`.
Hope that explains the situation here.
Ok, would you mind explaining your idea of how to address the problem by
using only a single table?
Thanks
Klesh Wong
On 6/15/22 10:18, Jinglei Ren wrote:
I am changing the email title to branch out and avoid distracting your main
thread. Right, this is not a big deal, so let’s conclude quickly.
You know, ambiguity can only be resolved by defining the concepts. Otherwise,
`persons` do not help either. What I proposed was to just define `accounts` as
your previous concept of persons or unified users. The example in your last
email was a wrong use of the concept (such as in “we introduce `people` or
`persons` or `unified users` to link those `accounts` together” – you still
used `account` to refer to Git emails or duplicate Git users.).
Now let’s switch to the new definition of account. Then there can be two ways
to handle a new commit email: (1) we can directly create a new account for it
and then later merge it into another account if it is a duplicate; (2) the commit
emails are just modeled as `emails` or not linked to any account, and they are
linked to accounts whenever they can.
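
The two options can be sketched as follows. This is a rough illustration under stated assumptions: `AccountRegistry`, its method names, and the sample emails are all hypothetical, and an in-memory dict stands in for whatever table would hold the mapping.

```python
from typing import Dict, Optional


class AccountRegistry:
    """Hypothetical in-memory stand-in for an accounts table keyed by commit email."""

    def __init__(self) -> None:
        self._next_id = 1
        # email -> account id; None means "seen but not linked yet" (option 2)
        self.email_to_account: Dict[str, Optional[int]] = {}

    def create_account_for(self, email: str) -> int:
        """Option 1: give an unseen (or still unlinked) email its own new account."""
        existing = self.email_to_account.get(email)
        if existing is not None:
            return existing
        account_id = self._next_id
        self._next_id += 1
        self.email_to_account[email] = account_id
        return account_id

    def record_unlinked(self, email: str) -> None:
        """Option 2: remember the email without linking it to any account."""
        self.email_to_account.setdefault(email, None)

    def link(self, email: str, account_id: int) -> None:
        """Option 2, later step: attach an email to an account once a match is known."""
        self.email_to_account[email] = account_id

    def merge(self, keep: int, duplicate: int) -> None:
        """Option 1, later step: repoint every email of `duplicate` to `keep`."""
        for email, account_id in list(self.email_to_account.items()):
            if account_id == duplicate:
                self.email_to_account[email] = keep


registry = AccountRegistry()

# Option 1: one account per new email, merged once they turn out to be duplicates.
first = registry.create_account_for("alice@example.com")
second = registry.create_account_for("alice@work.example.com")
registry.merge(keep=first, duplicate=second)

# Option 2: the email stays unlinked until it can be attached to an account.
registry.record_unlinked("bob@example.com")
registry.link("bob@example.com", registry.create_account_for("bob@corp.example.com"))
```

Option (1) always has an account to attach a commit to but needs a merge step later; option (2) avoids merges but leaves some commits without an account until a link is found.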
Thanks,
Jinglei
From: Klesh Wong <kl...@apache.org>
Date: Tuesday, June 14, 2022 at 11:52 PM
To: dev@devlake.apache.org <dev@devlake.apache.org>
Subject: Re: [discuss] team entity design
I'm ok with any name as long as @Julien @Keon @Hezheng are ok with it.
As for `table.accounts`, I don't understand: how can it represent
`unified users` while also representing multiple accounts?
For example, we are collecting `commits` data by `gitextractor`, in
order to associate a specific `commit` with a specific account, what we
can do is create an `account` with `commit.author_email` as PK. But,
one might create commits with different email addresses, so we introduce
`people` or `persons` or `unified users` to link those `accounts` together.
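
A small sketch of that linkage, assuming each `account` is keyed by its author email and carries a link to a `person`. The mapping, the ids, and the commit data below are made up for illustration and are not output of `gitextractor`.

```python
from collections import Counter

# Author emails of extracted commits (made-up data).
commit_author_emails = [
    "alice@example.com",
    "alice@work.example.com",
    "bob@example.com",
    "alice@example.com",
]

# `accounts` keyed by author email, each linked to a hypothetical `person` id.
account_to_person = {
    "alice@example.com": "person-1",
    "alice@work.example.com": "person-1",
    "bob@example.com": "person-2",
}


def commits_per_person(emails, account_to_person):
    """Count commits per person by resolving each author email through its account."""
    counts = Counter()
    for email in emails:
        person = account_to_person.get(email)
        if person is not None:  # emails without a known person are left out
            counts[person] += 1
    return dict(counts)


result = commits_per_person(commit_author_emails, account_to_person)
```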
Thanks,
Klesh Wong
On 6/14/22 21:27, Jinglei Ren wrote:
Just a comment: `people` would be better as `persons`, to make it consistent
with other plural names as well as `person_teams`, etc.
I see the reasons for this name, but I am still against `people` or `persons`
because our system should not model natural persons at all. In some sense, it
cannot because you never know if it is a person or a dog :p The key point is
that we should consider the concept itself, not just convenience of use.
So, why not keep all types of user names as they are from different data
sources and just add `table.accounts` to represent the standard/unified users?
Thanks,
Jinglei
From: Klesh Wong <kl...@apache.org>
Date: Monday, June 13, 2022 at 10:24 PM
To: dev@devlake.apache.org <dev@devlake.apache.org>
Subject: [discuss] team entity design
I meant to post the proposals for the Team Entity Design to this mailing
list, but they involve too many graphics, tables, and code. So I posted them
on
https://github.com/apache/incubator-devlake/issues/1680#issuecomment-1153588720
instead.
I suggest that everyone take a look, and either vote for whichever you
like or propose your solution.
Notice we have 2 TOPICS to decide:
1. How to aggregate commits by Natural Person, which is prefixed by
`proposal 1.x`
2. What should be the Primary Key of the `people` table, which is
prefixed by `proposal 2.x`
Please reply to this email with your favorite proposal options, like:
+1 proposal 1.1
+1 proposal 2.1
PICK ONE OPTION FOR EACH TOPIC
or, post your thoughts.
Thanks
Klesh Wong