I see. Yes, a couple of days ago we all agreed that it was better to keep `users` as it was and add another entity to represent the `unified identity`.

But that caused a mess in multiple discussions: many of us, myself included, couldn't even express ourselves clearly. So we gave up and agreed that it is better to rename the existing `users` to `accounts` for the greater good.

I think the terms you defined would make it much, much harder for us to express our thoughts, especially for me... -_-!!!

Correct me if I'm wrong: by your definition, an `account` might have multiple `users` on one or multiple `platforms`.

This is the opposite of my understanding: a `user` might have multiple `accounts` on one or multiple `platforms`.

Another reason we wanted to avoid `user` is that it sometimes refers to the people using Apache DevLake itself.

Does it make sense?


Thanks

Klesh Wong

On 6/15/22 21:14, Jinglei Ren wrote:
The bad smell comes from “a living thing” which the system should not model.

We can follow most of your model but (1) merge `person` and `user` in your 
model and name it `account`; (2) rename the `account` in your model to `user`.

The reason for (2) is that, as mentioned in 
https://github.com/apache/incubator-devlake/issues/1680, “we thought of 
changing the existing table.users to table.accounts and adding a table.users to 
represent … natural people, but that will cause many changes in the code.” So, 
it is good to keep the word `user` for various platforms rather than introduce 
the `account` in your model.

All in all, we can use the new `account` concept and rephrase your model.

1. `account`: the unified identity on Apache DevLake for collecting and 
analyzing data from different platforms.
2. `platform`: a website (github.com/gitlab.com/etc...), or abstract domain 
(git repository, … and the only reliable identity for a git user is email)
3. `user`: a registration record to represent a user on a `platform`, but an 
`account` may or may not map to multiple `users` on a specific platform.
   (1) every `account` is associated with a single user on a single 
platform (we don't need an `account` table)
   (2) some `accounts` are associated with one user on each of multiple 
platforms (we need an `account` table)
   (3) some `accounts` are associated with multiple users on multiple 
platforms (we need an `account` table badly)
Now, what we try to do here is to group those `users` by `account`… (take git 
author_email as an example: different emails can belong to one `account`).
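
To make the refined model concrete, here is a minimal Python sketch of grouping `users` by `account`. It is only an illustration of the definitions above, not DevLake code: the `User` class, its field names, the `group_users_by_account` helper, and the sample identifiers are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """A registration record on one platform (hypothetical field names)."""
    platform: str    # e.g. "github.com", or the abstract domain "git"
    identifier: str  # platform-side identity; for git this is the author email


def group_users_by_account(user_to_account: dict[User, str]) -> dict[str, list[User]]:
    """Invert a user -> account mapping into account -> list of users."""
    grouped: dict[str, list[User]] = defaultdict(list)
    for user, account_id in user_to_account.items():
        grouped[account_id].append(user)
    return dict(grouped)


# Hypothetical mapping: case (3) for account-1, case (1) for account-2.
mapping = {
    User("git", "alice@work.example"): "account-1",
    User("git", "alice@home.example"): "account-1",
    User("github.com", "alice"): "account-1",
    User("gitlab.com", "bob"): "account-2",
}

accounts = group_users_by_account(mapping)
for account_id, users in accounts.items():
    print(account_id, [f"{u.platform}:{u.identifier}" for u in users])
```

In this sketch the mapping itself is the hard part: where it comes from (manual input, email matching, and so on) is exactly the open question in this thread.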

You can see the refined model is simpler than your original one. So, to quickly 
form consensus, the decision point can be like this: (1) If the above refined 
model meets the requirements, my understanding should be correct and my 
irritation with `person` actually leads to better definitions. Then let’s go 
with it and we won’t spend more time on the word choice of `account`, for 
example. (2) If the above refined model doesn’t work or misses something, my 
understanding should be flawed so please just keep to your original model and 
`person` and ignore this thread.

Thanks,
Jinglei

From: Klesh Wong <kl...@apache.org>
Date: Wednesday, June 15, 2022 at 2:30 PM
To: dev@devlake.apache.org <dev@devlake.apache.org>
Subject: Re: [discuss] team entity design => table name
Let's bear with the existing terms a little bit longer; I don't buy your
definition of `account` just yet. Here is why:

  1. `person`: a Living Thing (Human, Dog, or Alien)
  2. `user`: a `person` who is using Apache DevLake to collect and
     analyze DevOps data
  3. `platform`: a website(github.com/gitlab.com/etc...), or abstract
     domain(git repository, it can be cloned to different
     machines/websites, but somehow we treat them the same git repo, and
     the only reliable identity for `person` is email)
  4. `account`: a registration record to represent a `person` on a
     `platform`, but a `person` may or may not have multiple `accounts`
     on a specific platform.
      1. one `person` registers on one platform once and uses it
         forever (we don't need a `person` table)
      2. one `person` registers on multiple platforms once each and
         uses them forever (we need a `person` table)
      3. one `person` registers on multiple platforms multiple times
         each and uses some of them (we need a `person` table badly)

Now, what we are trying to do here is to group those `accounts` by
`person`, which is why we introduced `person`. But we don't have enough
clues to figure out who is who across multiple platforms. Even worse, we
can't even figure out who is who on a single platform (take git
author_email as an example: different emails can belong to one `person`).

So, most of us agreed that the best way to solve the problem is to
aggregate all those accounts from different platforms into one table named
`accounts`, and then let the `user` connect them to `persons`.

Hope that explains the situation here.


Ok, would you mind explaining your idea of how to address the problem by
using only a single table?


Thanks

Klesh Wong

On 6/15/22 10:18, Jinglei Ren wrote:
I am changing the email title to branch out and avoid distracting your main 
thread. Right, this is not a big deal, so let’s conclude quickly.

You know, ambiguity can only be resolved by defining the concepts. Otherwise, 
`persons` do not help either. What I proposed was to just define `accounts` as 
your previous concept of persons or unified users. The example in your last 
email was a wrong use of the concept (such as in “we introduce `people` or 
`persons` or `unified users` to link those `accounts` together” – you still 
used `account` to refer to Git emails or duplicate Git users.).

Now let’s switch to the new definition of account. Then there can be two ways 
to handle a new commit email: (1) we can directly create a new account for it 
and later merge it into another account if it is a duplicate; (2) the commit 
emails are just modeled as `emails`, not linked to any account at first, and 
are linked to accounts whenever they can be.
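
A rough Python sketch of option (1) follows: every new commit email gets its own account, and duplicates are merged afterwards. The `AccountRegistry` class, its method names, and the `account:<email>` id format are hypothetical and only illustrate the idea; nothing here is taken from the DevLake codebase.

```python
class AccountRegistry:
    """Option (1): create an account per new commit email, merge duplicates later."""

    def __init__(self) -> None:
        # account id -> account id it was merged into (itself if never merged)
        self._merged_into: dict[str, str] = {}
        self._email_to_account: dict[str, str] = {}

    def account_for_email(self, email: str) -> str:
        """Return the surviving account for an email, creating one if needed."""
        if email not in self._email_to_account:
            account_id = f"account:{email}"  # hypothetical id format
            self._merged_into[account_id] = account_id
            self._email_to_account[email] = account_id
        return self._resolve(self._email_to_account[email])

    def merge(self, duplicate_email: str, surviving_email: str) -> None:
        """Record that the duplicate's account is really the survivor's account."""
        duplicate = self.account_for_email(duplicate_email)
        survivor = self.account_for_email(surviving_email)
        self._merged_into[duplicate] = survivor

    def _resolve(self, account_id: str) -> str:
        # Follow merge links until reaching an account that was never merged away.
        while self._merged_into[account_id] != account_id:
            account_id = self._merged_into[account_id]
        return account_id


registry = AccountRegistry()
work = registry.account_for_email("alice@work.example")
home = registry.account_for_email("alice@home.example")
registry.merge("alice@home.example", "alice@work.example")
print(registry.account_for_email("alice@home.example") == work)  # True
```

Option (2) would instead keep a new email unlinked (for example, mapped to no account at all) until a link is known, which avoids merges but means some commits temporarily belong to no account.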

Thanks,
Jinglei

From: Klesh Wong <kl...@apache.org>
Date: Tuesday, June 14, 2022 at 11:52 PM
To: dev@devlake.apache.org <dev@devlake.apache.org>
Subject: Re: [discuss] team entity design
I'm ok with any name as long as @Julien @Keon @Hezheng are ok with it.

As for `table.accounts`, I don't understand: how can it represent
`unified users` while also representing multiple accounts?

For example, we are collecting `commits` data with `gitextractor`. In
order to associate a specific `commit` with a specific account, what we
can do is create an `account` with `commit.author_email` as the PK. But
one might create commits with different email addresses, so we introduce
`people` or `persons` or `unified users` to link those `accounts` together.
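
The problem can be illustrated with a small Python sketch. The commit rows, the emails, and the `account_to_person` link table below are made-up sample data, and the dictionary layout is an assumption for illustration rather than the actual `gitextractor` schema.

```python
from collections import Counter

# Hypothetical commit rows; only author_email matters for this example.
commits = [
    {"sha": "a1", "author_email": "alice@work.example"},
    {"sha": "b2", "author_email": "alice@home.example"},
    {"sha": "c3", "author_email": "bob@corp.example"},
    {"sha": "d4", "author_email": "alice@work.example"},
]

# One account per distinct author_email (the email acts as the primary key).
# The same contributor ends up split across two accounts.
commits_per_account = Counter(c["author_email"] for c in commits)

# A link table from account (email) to a unified identity fixes the split.
account_to_person = {
    "alice@work.example": "person-1",
    "alice@home.example": "person-1",
    "bob@corp.example": "person-2",
}
commits_per_person = Counter(account_to_person[c["author_email"]] for c in commits)

print(dict(commits_per_account))
print(dict(commits_per_person))
```

Counting by account reports three contributors, while counting through the link table reports two, which is the aggregation this thread is trying to name.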

Thanks,

Klesh Wong

On 6/14/22 21:27, Jinglei Ren wrote:
Just a comment: `people` would be better named `persons`, to be consistent with 
other plural names as well as `person_teams`, etc.

I see the reasons for this name, but I am still against `people` or `persons` 
because our system should not model natural persons at all. In some sense, it 
cannot because you never know if it is a person or a dog :p The key point is 
that we should consider the concept itself, not just convenience of use.

So, why not keep all types of user names as they are from different data 
sources and just add `table.accounts` to represent the standard/unified users?

Thanks,
Jinglei

From: Klesh Wong <kl...@apache.org>
Date: Monday, June 13, 2022 at 10:24 PM
To: dev@devlake.apache.org <dev@devlake.apache.org>
Subject: [discuss] team entity design
    I meant to post the proposals for the Team Entity Design to this mailing
list, but they involve too many diagrams, tables, and code. So I posted them
at
https://github.com/apache/incubator-devlake/issues/1680#issuecomment-1153588720
instead.

     I suggest that everyone take a look, and either vote for whichever
proposal you like or propose your own solution.


Notice we have 2 TOPICS to decide:

    1. How to aggregate commits by Natural Person, which is prefixed by
       `proposal 1.x`
    2. What should be the Primary Key of the `people` table, which is
       prefixed by `proposal 2.x`

Please reply to this email with your favorite proposal options, like:


+1 proposal 1.1

+1 proposal 2.1


PICK ONE OPTION FOR EACH TOPIC

or, post your thoughts.


Thanks


Klesh Wong
