jimmylao opened a new issue #31:
URL: https://github.com/apache/incubator-bluemarlin/issues/31


Process:

1. Build the DIN model.
2. Generate a profile for each user from his/her keyword scores (interests), then compute the similarity score for every pair of users.
3. Analyze the distribution of the resulting similarity scores to see whether they are concentrated in a narrow range or spread across the range from 0 to 1 (cosine similarity).
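
A minimal sketch of the profile-building half of step 2, assuming the DIN keyword scores are available as `(user_id, keyword, score)` records. The record format, the keyword list, and `build_profiles` are hypothetical illustrations, not BlueMarlin code. Each user becomes one row with a fixed keyword order, and missing keywords are 0, as in the profile table.

```python
import numpy as np

def build_profiles(records, keywords):
    """Turn (user_id, keyword, score) records into a dense user x keyword matrix.

    Keywords without a score for a user stay at 0.0, like the zeros in the
    profile table. Returns the sorted user ids and the matrix (one row each).
    """
    user_ids = sorted({uid for uid, _, _ in records})
    row_of = {uid: i for i, uid in enumerate(user_ids)}
    col_of = {kw: j for j, kw in enumerate(keywords)}
    profiles = np.zeros((len(user_ids), len(keywords)))
    for uid, kw, score in records:
        profiles[row_of[uid], col_of[kw]] = score
    return user_ids, profiles


keywords = [f"kw{i}" for i in range(1, 16)]   # hypothetical keyword list
records = [                                   # hypothetical input format
    (1, "kw4", 0.130), (1, "kw6", 0.399), (1, "kw10", 0.612),
    (1, "kw13", 0.301), (1, "kw14", 0.458),
    (37, "kw9", 0.423),
]
user_ids, profiles = build_profiles(records, keywords)
print(user_ids)        # [1, 37]
print(profiles.shape)  # (2, 15)
```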
   
   
Results:

1. Here is an example of the keyword score profiles of the first 20 users.
   
| user_id | kw1   | kw2   | kw3   | kw4   | kw5   | kw6   | kw7   | kw8   | kw9   | kw10  | kw11  | kw12  | kw13  | kw14  | kw15  |
|---------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1       | 0.000 | 0.000 | 0.000 | 0.130 | 0.000 | 0.399 | 0.000 | 0.000 | 0.000 | 0.612 | 0.000 | 0.000 | 0.301 | 0.458 | 0.000 |
| 5       | 0.000 | 0.000 | 0.078 | 0.000 | 0.000 | 0.416 | 0.000 | 0.000 | 0.366 | 0.436 | 0.384 | 0.000 | 0.189 | 0.000 | 0.541 |
| 8       | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.563 | 0.000 | 0.000 | 0.649 | 0.678 | 0.000 | 0.000 | 0.000 | 0.600 | 0.000 |
| 10      | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.279 | 0.000 | 0.125 | 0.000 | 0.223 | 0.000 |
| 11      | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.354 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.162 | 0.275 | 0.000 | 0.000 |
| 15      | 0.000 | 0.000 | 0.099 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.509 | 0.000 | 0.000 | 0.249 | 0.000 | 0.000 |
| 22      | 0.000 | 0.000 | 0.152 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.515 | 0.000 | 0.000 | 0.000 | 0.423 | 0.000 |
| 30      | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.474 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 34      | 0.000 | 0.000 | 0.000 | 0.000 | 0.299 | 0.000 | 0.000 | 0.000 | 0.000 | 0.410 | 0.000 | 0.149 | 0.000 | 0.383 | 0.000 |
| 35      | 0.000 | 0.000 | 0.145 | 0.000 | 0.000 | 0.646 | 0.000 | 0.000 | 0.311 | 0.000 | 0.000 | 0.000 | 0.000 | 0.440 | 0.000 |
| 37      | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.423 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 39      | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.496 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 41      | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.327 | 0.250 | 0.000 | 0.000 | 0.000 | 0.307 | 0.000 | 0.000 | 0.382 | 0.000 |
| 43      | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.349 | 0.000 | 0.000 | 0.000 | 0.430 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 47      | 0.000 | 0.000 | 0.094 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.424 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 49      | 0.305 | 0.509 | 0.000 | 0.000 | 0.000 | 0.721 | 0.000 | 0.000 | 0.000 | 0.758 | 0.000 | 0.000 | 0.000 | 0.740 | 0.000 |
| 51      | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.415 | 0.000 | 0.000 | 0.128 | 0.000 | 0.000 | 0.000 |
| 52      | 0.000 | 0.000 | 0.134 | 0.000 | 0.000 | 0.336 | 0.000 | 0.000 | 0.000 | 0.446 | 0.000 | 0.090 | 0.000 | 0.415 | 0.000 |
| 53      | 0.106 | 0.000 | 0.000 | 0.000 | 0.406 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 55      | 0.000 | 0.000 | 0.000 | 0.122 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.371 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
   
2. Pairwise user similarity scores were computed from each user's keyword score profile. Below is an example of the pairwise similarity scores for the first 20 users above, where userN denotes the N-th row of the profile table (e.g., user11 is user_id 37). The scores are spread across the range from 0 to 1 rather than being concentrated at the low end (0) or the high end (1).
   
|        | user1 | user2 | user3 | user4 | user5 | user6 | user7 | user8 | user9 | user10 | user11 | user12 | user13 | user14 | user15 | user16 | user17 | user18 | user19 | user20 |
|--------|-------|-------|-------|-------|-------|-------|-------|-------|-------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| user1  | 1.000 | 0.536 | 0.794 | 0.782 | 0.510 | 0.729 | 0.807 | 0.663 | 0.708 | 0.583  | 0.000  | 0.000  | 0.517  | 0.788  | 0.648  | 0.837  | 0.000  | 0.906  | 0.000  | 0.674  |
| user2  | 0.536 | 1.000 | 0.621 | 0.325 | 0.422 | 0.486 | 0.349 | 0.441 | 0.277 | 0.466  | 0.370  | 0.370  | 0.401  | 0.607  | 0.447  | 0.451  | 0.354  | 0.488  | 0.000  | 0.419  |
| user3  | 0.794 | 0.621 | 1.000 | 0.684 | 0.335 | 0.481 | 0.707 | 0.543 | 0.623 | 0.779  | 0.520  | 0.520  | 0.517  | 0.706  | 0.530  | 0.774  | 0.497  | 0.831  | 0.000  | 0.516  |
| user4  | 0.782 | 0.325 | 0.684 | 1.000 | 0.112 | 0.653 | 0.920 | 0.737 | 0.884 | 0.304  | 0.000  | 0.000  | 0.352  | 0.572  | 0.720  | 0.704  | 0.097  | 0.844  | 0.000  | 0.701  |
| user5  | 0.510 | 0.422 | 0.335 | 0.112 | 1.000 | 0.250 | 0.000 | 0.000 | 0.078 | 0.562  | 0.000  | 0.000  | 0.379  | 0.468  | 0.000  | 0.379  | 0.100  | 0.393  | 0.000  | 0.000  |
| user6  | 0.729 | 0.486 | 0.481 | 0.653 | 0.250 | 1.000 | 0.705 | 0.885 | 0.556 | 0.029  | 0.000  | 0.000  | 0.000  | 0.687  | 0.901  | 0.475  | 0.000  | 0.585  | 0.000  | 0.841  |
| user7  | 0.807 | 0.349 | 0.707 | 0.920 | 0.000 | 0.705 | 1.000 | 0.753 | 0.836 | 0.357  | 0.000  | 0.000  | 0.369  | 0.585  | 0.784  | 0.729  | 0.000  | 0.872  | 0.000  | 0.716  |
| user8  | 0.663 | 0.441 | 0.543 | 0.737 | 0.000 | 0.885 | 0.753 | 1.000 | 0.628 | 0.000  | 0.000  | 0.000  | 0.000  | 0.776  | 0.976  | 0.537  | 0.000  | 0.624  | 0.000  | 0.950  |
| user9  | 0.708 | 0.277 | 0.623 | 0.884 | 0.078 | 0.556 | 0.836 | 0.628 | 1.000 | 0.302  | 0.000  | 0.000  | 0.350  | 0.488  | 0.613  | 0.644  | 0.067  | 0.761  | 0.443  | 0.597  |
| user10 | 0.583 | 0.466 | 0.779 | 0.304 | 0.562 | 0.029 | 0.357 | 0.000 | 0.302 | 1.000  | 0.365  | 0.365  | 0.694  | 0.477  | 0.037  | 0.656  | 0.349  | 0.688  | 0.000  | 0.000  |
| user11 | 0.000 | 0.370 | 0.520 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.365  | 1.000  | 1.000  | 0.000  | 0.000  | 0.000  | 0.000  | 0.956  | 0.000  | 0.000  | 0.000  |
| user12 | 0.000 | 0.370 | 0.520 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.365  | 1.000  | 1.000  | 0.000  | 0.000  | 0.000  | 0.000  | 0.956  | 0.000  | 0.000  | 0.000  |
| user13 | 0.517 | 0.401 | 0.517 | 0.352 | 0.379 | 0.000 | 0.369 | 0.000 | 0.350 | 0.694  | 0.000  | 0.000  | 1.000  | 0.322  | 0.000  | 0.573  | 0.000  | 0.587  | 0.000  | 0.000  |
| user14 | 0.788 | 0.607 | 0.706 | 0.572 | 0.468 | 0.687 | 0.585 | 0.776 | 0.488 | 0.477  | 0.000  | 0.000  | 0.322  | 1.000  | 0.758  | 0.739  | 0.000  | 0.782  | 0.000  | 0.738  |
| user15 | 0.648 | 0.447 | 0.530 | 0.720 | 0.000 | 0.901 | 0.784 | 0.976 | 0.613 | 0.037  | 0.000  | 0.000  | 0.000  | 0.758  | 1.000  | 0.524  | 0.000  | 0.650  | 0.000  | 0.928  |
| user16 | 0.837 | 0.451 | 0.774 | 0.704 | 0.379 | 0.475 | 0.729 | 0.537 | 0.644 | 0.656  | 0.000  | 0.000  | 0.573  | 0.739  | 0.524  | 1.000  | 0.000  | 0.880  | 0.055  | 0.510  |
| user17 | 0.000 | 0.354 | 0.497 | 0.097 | 0.100 | 0.000 | 0.000 | 0.000 | 0.067 | 0.349  | 0.956  | 0.956  | 0.000  | 0.000  | 0.000  | 0.000  | 1.000  | 0.037  | 0.000  | 0.000  |
| user18 | 0.906 | 0.488 | 0.831 | 0.844 | 0.393 | 0.585 | 0.872 | 0.624 | 0.761 | 0.688  | 0.000  | 0.000  | 0.587  | 0.782  | 0.650  | 0.880  | 0.037  | 1.000  | 0.000  | 0.593  |
| user19 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.443 | 0.000  | 0.000  | 0.000  | 0.000  | 0.000  | 0.000  | 0.055  | 0.000  | 0.000  | 1.000  | 0.000  |
| user20 | 0.674 | 0.419 | 0.516 | 0.701 | 0.000 | 0.841 | 0.716 | 0.950 | 0.597 | 0.000  | 0.000  | 0.000  | 0.000  | 0.738  | 0.928  | 0.510  | 0.000  | 0.593  | 0.000  | 1.000  |
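
The matrix above is plain cosine similarity between profile rows. The NumPy sketch below is one way to compute it; the function name is hypothetical and this is not necessarily how the original numbers were produced. Using the rows of user11 (id 37), user12 (id 39), user17 (id 51), user19 (id 53) and user9 (id 34) copied from the profile table, it reproduces the corresponding entries 1.000, 0.956 and 0.443. Because the profile table is itself rounded to three decimals, other entries recomputed this way may differ from the matrix in the last digit.

```python
import numpy as np

def cosine_similarity_matrix(profiles):
    """Pairwise cosine similarity between the rows of a user x keyword matrix."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    # A user with no keyword scores has norm 0; leave that row as zeros
    # instead of dividing by zero.
    unit = profiles / np.where(norms == 0.0, 1.0, norms)
    return unit @ unit.T


# Rows copied from the profile table (columns kw1..kw15).
profiles = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0.423, 0, 0, 0, 0, 0, 0],              # user11, id 37
    [0, 0, 0, 0, 0, 0, 0, 0, 0.496, 0, 0, 0, 0, 0, 0],              # user12, id 39
    [0, 0, 0, 0, 0, 0, 0, 0, 0.415, 0, 0, 0.128, 0, 0, 0],          # user17, id 51
    [0.106, 0, 0, 0, 0.406, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],          # user19, id 53
    [0, 0, 0, 0, 0.299, 0, 0, 0, 0, 0.410, 0, 0.149, 0, 0.383, 0],  # user9,  id 34
])
sim = cosine_similarity_matrix(profiles)
print(round(float(sim[0, 1]), 3))  # user11 vs user12 -> 1.0
print(round(float(sim[0, 2]), 3))  # user11 vs user17 -> 0.956
print(round(float(sim[3, 4]), 3))  # user19 vs user9  -> 0.443
```

The 1.000 between user11 and user12 follows directly: both profiles have a single non-zero keyword (kw9), so their vectors point in the same direction regardless of magnitude.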
   
3. Computed the pairwise cosine similarity scores among the first 20k users, giving a 20,000 × 20,000 similarity score matrix. The distribution of the values in this matrix is shown below; it is almost a perfect normal distribution.
   
![image](https://user-images.githubusercontent.com/60371672/150385884-17b5b80a-6a12-4987-8072-8416fc799d34.png)
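
A 20,000 × 20,000 float64 matrix alone needs about 3.2 GB, so one option is to accumulate the histogram block by block without ever materializing it. The sketch below assumes non-negative profiles (so similarities fall in [0, 1]); `similarity_histogram`, the bin count, the block size and the `upper_only` flag are all hypothetical choices. In particular, the issue does not say whether the figure counted the diagonal and both copies of each pair; `upper_only=True` counts each distinct pair once.

```python
import numpy as np

def similarity_histogram(profiles, bins=50, block_size=1000, upper_only=True):
    """Histogram of pairwise cosine similarities without building the full matrix.

    Rows are processed block_size at a time, so memory is about
    block_size x n similarities instead of n x n.
    """
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / np.where(norms == 0.0, 1.0, norms)
    n = unit.shape[0]
    edges = np.linspace(0.0, 1.0, bins + 1)
    counts = np.zeros(bins, dtype=np.int64)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        block = unit[start:stop] @ unit.T  # similarities of rows start..stop-1
        if upper_only:
            # Keep pairs (i, j) with j > i only: this drops the diagonal of
            # 1.0 self-similarities and the mirrored copy of every pair.
            i = np.arange(start, stop)[:, None]
            j = np.arange(n)[None, :]
            values = block[j > i]
        else:
            values = block.ravel()
        # Non-negative profiles give similarities in [0, 1]; clip rounding
        # overshoot such as 1.0000000000000002 so it lands in the last bin.
        counts += np.histogram(np.clip(values, 0.0, 1.0), bins=edges)[0]
    return counts, edges


# Demo on random sparse non-negative profiles (stand-ins, not real data).
rng = np.random.default_rng(0)
demo = rng.random((2000, 15)) * (rng.random((2000, 15)) < 0.3)
counts, edges = similarity_histogram(demo, bins=20, block_size=500)
print(counts.sum())  # 1999000, i.e. 2000 * 1999 / 2 distinct pairs
```

Plotting `counts` against the bin centers `(edges[:-1] + edges[1:]) / 2` as a bar chart gives a histogram in the same form as the figure above.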
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@bluemarlin.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

