Hi John,

I am designing a data model quite similar to the cited *User-Skill*
problem, but not exactly the same.

People may not be familiar with our domain. Basically it is a
track record system. A user can perform different tasks, and the same
task can be done by many users. According to historical data, more than
5000 different users can finish the same task, and some users can
finish more than 5000 tasks within the data archive period.
Additionally, the following queries will be performed frequently:
1. Given two users, A and B, find the tasks they have both done (less
important).
2. Given several tasks, find the users who have done all of them
(more important).

Translated into the user-skill scenario, one user can have more than
5000 skills, and more than 5000 users can have the same skill.
1. Given two users, A and B, how do we find their mutual skills?
2. Given n skills, how do we find the full list of users having all n skills?
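
To make the more important query concrete, here is a minimal in-memory sketch of "given several tasks, find the users who have done all of them". The TaskIndex class and its method names are hypothetical and only stand in for the datastore; nothing here is an App Engine or Twig API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the datastore; not an App Engine API.
class TaskIndex {
    // task id -> ids of the users who finished that task
    private final Map<String, Set<String>> usersByTask = new HashMap<>();

    void record(String userId, String taskId) {
        usersByTask.computeIfAbsent(taskId, k -> new HashSet<>()).add(userId);
    }

    // Query 2: users who have finished every task in the given collection.
    Set<String> usersWhoDidAll(Collection<String> tasks) {
        List<Set<String>> sets = new ArrayList<>();
        for (String task : tasks) {
            Set<String> users = usersByTask.get(task);
            if (users == null) {
                return new TreeSet<>(); // nobody finished this task
            }
            sets.add(users);
        }
        if (sets.isEmpty()) {
            return new TreeSet<>(); // no tasks given
        }
        // Start from the smallest set so the working set only shrinks.
        sets.sort((a, b) -> Integer.compare(a.size(), b.size()));
        Set<String> result = new TreeSet<>(sets.get(0));
        for (int i = 1; i < sets.size(); i++) {
            result.retainAll(sets.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        TaskIndex index = new TaskIndex();
        index.record("alice", "t1");
        index.record("alice", "t2");
        index.record("bob", "t1");
        index.record("carol", "t1");
        index.record("carol", "t2");
        System.out.println(index.usersWhoDidAll(Arrays.asList("t1", "t2")));
    }
}
```

Starting from the smallest user set keeps the intermediate result small, which matters when a single task can have more than 5000 users.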



Best regards,
Max

On Mar 15, 5:59 pm, John Patterson <jdpatter...@gmail.com> wrote:
> I meant just put the UserSkills of the two people into the set.
> Each person only has a small number of skills, yeah?
>
> Perhaps I misunderstood your last requirement, "similarity between
> user A and User B".
>
> On 15 Mar 2010, at 14:23, Max wrote:
>
> > Hi John,
>
> > Thanks for your reply. I need some time to study and test your code.
>
> > For the last point, Sets.intersection() means we need to load all keys
> > into memory and perform an in-memory Sets.intersection(). Is it
> > possible to do this directly with a query? In other words, is it
> > possible to use more than one equality filter on a list property of a
> > relation index entity to find the parents?
>
> > Best regards,
> > Max
>
> > On Mar 15, 2:45 pm, John Patterson <jdpatter...@gmail.com> wrote:
> >> Hi Max,
>
> >> Regarding your original question, a more efficient solution would be
> >> to embed the UserSkill in the User instance, which would allow you to
> >> find all results in a single query.  The problem is that embedded
> >> instances can only be queried on a single value.  There would be no
> >> way to query skill and ability on the same UserSkill - just "java
> >> and c++ with any skill over 3 and any skill over 5".
>
> >> To solve this you could create a combined property in UserSkill for
> >> querying, "skillAbility", which would hold values such as "java:5",
> >> "c++:4".  This will only work with abilities from 0-9 because it
> >> depends on lexical ordering (or, with zero padding, e.g. 000 - 999).
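
As an illustration of why the combined value needs a fixed width, here is a minimal sketch of a zero-padded encoder. The SkillAbilityKey class, its method names and the three-digit width are assumptions made for this example, not part of Twig or the datastore API. Plain String.compareTo is used only to show the ordering.

```java
import java.util.Locale;

// Hypothetical helper; the name and the three-digit width are assumptions.
class SkillAbilityKey {

    // Builds a value such as "java:005" so that lexical order matches numeric order.
    static String encode(String skill, int ability) {
        if (ability < 0 || ability > 999) {
            throw new IllegalArgumentException("ability must be in 0..999");
        }
        return skill + ":" + String.format(Locale.ROOT, "%03d", ability);
    }

    // Exclusive upper bound for a range scan over a single skill.
    static String upperBound(String skill) {
        return skill + ":" + Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Unpadded values sort incorrectly: "java:5" comes after "java:10".
        System.out.println("java:5".compareTo("java:10") > 0);
        // Padded values sort in numeric order.
        System.out.println(encode("java", 5).compareTo(encode("java", 10)) < 0);
        // Every padded value for the skill stays below the upper bound.
        System.out.println(encode("java", 999).compareTo(upperBound("java")) < 0);
    }
}
```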
>
> >> Both Twig and Objectify, but not JDO, support embedded collections of
> >> instances.
>
> >> In Twig it would be defined like this:
>
> >> class User
> >> {
> >>         @Embed List<UserSkill> skills;
>
> >> }
>
> >> class UserSkill
> >> {
> >>         String skillAbility;
> >>         Skill skill;    // direct reference to Skill instance
> >>         int ability;
>
> >> }
>
> >> Disclaimer: I have not tried any of this code - it is just off the
> >> top of my head.
>
> >> You would then do a single range query to find "java:5", "java:6",
> >> "java:7"...
>
> >> // find java developers with ability 5 or over in a single query
> >> datastore.find().type(User.class)
> >>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:5")  // range start
> >>         .addFilter("skillAbility", LESS_THAN, "java:" + Character.MAX_VALUE)  // range end
> >>         .returnResultsNow();
>
> >> But that doesn't fully answer your question which includes an AND on
> >> multiple property values which is not supported by the datastore.  To
> >> do this you will need to perform two queries and merge the results.
>
> >> Twig has support for merging only OR queries right now so you can do:
>
> >> // find users with c++ ability >= 2 OR java ability >= 5
>
> >> RootFindCommand or = datastore.find().type(User.class);  // default (only) operator is OR
>
> >> or.addChildCommand()
> >>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:5")  // range start
> >>         .addFilter("skillAbility", LESS_THAN, "java:" + Character.MAX_VALUE);  // range end
>
> >> or.addChildCommand()
> >>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "c++:2")  // range start
> >>         .addFilter("skillAbility", LESS_THAN, "c++:" + Character.MAX_VALUE);  // range end
>
> >> // merges results from both queries into a single iterator
> >> Iterator<User> results = or.returnResultsNow();
>
> >> Supporting AND merges is coming!  Add a feature request if you like.
> >> But for now you will have to do two separate queries as in the first
> >> example and join the results in your own code.  You should make sure
> >> both queries are sorted by key then you can "stream" the results
> >> without loading them all into memory at once.
>
> >> // find java developers with ability 5 or over
> >> datastore.find().type(User.class)
> >>         .addSort("skillAbility")      // first sort must be on the inequality filter property
> >>         .addSort(Entity.KEY_RESERVED_PROPERTY)  // ensure results in same order
> >>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:5")  // range start
> >>         .addFilter("skillAbility", LESS_THAN, "java:" + Character.MAX_VALUE)  // range end
> >>         .returnResultsNow();
>
> >> // find c++ developers with ability 2 or over
> >> datastore.find().type(User.class)
> >>         .addSort("skillAbility")
> >>         .addSort(Entity.KEY_RESERVED_PROPERTY)
> >>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "c++:2")  // range start
> >>         .addFilter("skillAbility", LESS_THAN, "c++:" + Character.MAX_VALUE)  // range end
> >>         .returnResultsNow();
>
> >> // now iterate through both results and only include those in both iterators
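
The "iterate through both results" step can be sketched as a merge join over two sorted streams. This is a plain Java sketch, not Twig code, and the SortedIntersection class is a hypothetical name. It assumes both iterators really are in ascending key order. When the first sort is on skillAbility, results are key-ordered only within one skillAbility value, so the keys may need to be sorted (or queried per value) before this step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: intersects two iterators that are sorted ascending.
class SortedIntersection {

    // Advances whichever side is behind; emits an element only when both sides match.
    static <T extends Comparable<T>> List<T> intersect(Iterator<T> left, Iterator<T> right) {
        List<T> out = new ArrayList<>();
        if (!left.hasNext() || !right.hasNext()) {
            return out;
        }
        T a = left.next();
        T b = right.next();
        while (true) {
            int c = a.compareTo(b);
            if (c == 0) {
                out.add(a);
                if (!left.hasNext() || !right.hasNext()) {
                    break;
                }
                a = left.next();
                b = right.next();
            } else if (c < 0) {
                if (!left.hasNext()) {
                    break;
                }
                a = left.next();
            } else {
                if (!right.hasNext()) {
                    break;
                }
                b = right.next();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for user keys returned by the java and c++ queries.
        Iterator<Long> javaUsers = Arrays.asList(1L, 3L, 5L, 7L).iterator();
        Iterator<Long> cppUsers = Arrays.asList(3L, 4L, 7L, 9L).iterator();
        System.out.println(intersect(javaUsers, cppUsers));
    }
}
```

Only one element from each side is held at a time, so neither result set has to be loaded fully into memory; only the matches are collected.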
>
> >> Again, I have not run this code so I might have made a mistake.  Let
> >> me know how you get on!  I'll be adding support for these merged AND
> >> queries on multiple property values soon - unless someone else wants
> >> to contribute it first ;)
>
> >> Finding the similarity between two users is simple now that the
> >> skills are just a property of the User: just do a
> >> Sets.intersection() of their skills.
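
For reference, the same intersection can be sketched with java.util alone. This uses retainAll instead of the Sets.intersection() call mentioned above; the MutualSkills class and the sample skill names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper showing a plain java.util set intersection.
class MutualSkills {

    // Returns the skills present in both sets without modifying either input.
    static Set<String> common(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a); // copy so the inputs stay untouched
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> userA = new HashSet<>(Arrays.asList("java:5", "c++:4", "python:3"));
        Set<String> userB = new HashSet<>(Arrays.asList("java:5", "c++:2", "python:3"));
        // The size of the result is the similarity as defined in the thread.
        System.out.println(common(userA, userB));
    }
}
```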
>
> >> John
>
> >> On 15 Mar 2010, at 12:07, Max wrote:
>
> >>> Thanks John,
>
> >>> Brett Slatkin's talk is impressive. Let's say we have m skills with n
> >>> levels (i.e., m x n SkillLevel entities). Each SkillLevel entity
> >>> consists of at least one SkillLevelIndex.
>
> >>> We define similarity between user A and user B as the number of skills
> >>> they hold at the same level, i.e., the number of SkillLevel entities
> >>> returned by the query:
> >>> "from SkillLevel where userKeyList contains A and userKeyList contains B"
>
> >>> It works fine if userKeyList contains all user keys. However, after we
> >>> apply the relation index pattern, we have more than one user key list,
> >>> so how do we perform a query to calculate similarity between two users?
>
> >>> On Mar 10, 12:21 pm, John Patterson <jdpatter...@gmail.com> wrote:
> >>>> On 10 Mar 2010, at 10:53, Max wrote:
>
> >>>>> Rusty Wright suggested a list of user keys to be stored in skill
> >>>>> entity. But that means only 5000 users can have the same skill.
>
> >>>> If you use the "Relation Index Entity" pattern as described in Brett
> >>>> Slatkin's talk you can store 5000 users per index entity and an
> >>>> unlimited number of index entities.  This will actually give you much
> >>>> better query performance too because you will not need to load the
> >>>> list of 5000 Keys every time you read your main entity.
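
The splitting step of this pattern can be sketched as follows: one list of user keys is cut into chunks of at most 5000, one chunk per index entity. The RelationIndexSharding class is hypothetical, plain Long values stand in for datastore keys, and the actual persistence of each chunk is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a key list into chunks, one per index entity.
class RelationIndexSharding {
    static final int MAX_KEYS_PER_INDEX = 5000; // limit mentioned in the thread

    static <T> List<List<T>> shard(List<T> keys, int maxPerIndex) {
        if (maxPerIndex <= 0) {
            throw new IllegalArgumentException("maxPerIndex must be positive");
        }
        List<List<T>> shards = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += maxPerIndex) {
            int end = Math.min(start + maxPerIndex, keys.size());
            shards.add(new ArrayList<>(keys.subList(start, end)));
        }
        return shards;
    }

    public static void main(String[] args) {
        List<Long> userKeys = new ArrayList<>();
        for (long i = 0; i < 12001; i++) {
            userKeys.add(i);
        }
        // Print the size of each chunk that would become one index entity.
        for (List<Long> shard : shard(userKeys, MAX_KEYS_PER_INDEX)) {
            System.out.println(shard.size());
        }
    }
}
```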
>
> >>>> The new release of Twig has direct support for RIEs and can actually
> >>>> let you query for multiple skills *in parallel* then merge the
> >>>> results and return the parents:
>
> >>>> http://code.google.com/p/twig-persist/wiki/Using#Relation_Index_Entities
>
> >>> --
> >>> You received this message because you are subscribed to the Google
> >>> Groups "Google App Engine for Java" group.
> >>> To post to this group, send email to google-appengine-java@googlegroups.com.
> >>> To unsubscribe from this group, send email to google-appengine-java+unsubscr...@googlegroups.com.
> >>> For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en.
>

