Re: Help with designing our document for graphs. Indexing single nodes in graph with thousands of incoming edges

Mark Harwood Wed, 08 Oct 2014 02:30:09 -0700

If you create item-centric documents (in your case, venues) and maintain an 
exhaustive list of the users who like each item, this becomes a problem for 
very popular items, e.g. movies, that can be liked by large numbers of 
people: the document would need constant updating.
By contrast, a user-centric document may be more manageable: each user 
has a finite list of the items they like, held in their profile. A 
user-centric model has other benefits too: you can use aggregations 
to do item recommendations, e.g. "people who liked item X also liked item Y".
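
As a rough sketch of the aggregation idea, the Python below only builds the 
search request body and post-processes the buckets; it makes no client call. 
The field name `likes` (a list of item ids on each user document) and the 
helper names are assumptions for illustration, not part of the setup 
discussed in this thread.

```python
def also_liked_request(item_id, size=10):
    """Build a search body over user docs: among users who like item_id,
    count which items appear most often in their `likes` lists."""
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the user docs
        "query": {"term": {"likes": item_id}},  # users who like item X
        "aggs": {
            "also_liked": {
                # `likes` is a hypothetical field holding liked item ids
                "terms": {"field": "likes", "size": size}
            }
        },
    }


def drop_seed(buckets, item_id):
    """Remove item X itself from the buckets; every matching user likes it,
    so it would otherwise always come out on top."""
    return [b for b in buckets if b["key"] != item_id]
```

The buckets would be read from `aggregations.also_liked.buckets` in the 
search response and passed through `drop_seed` before being shown as 
recommendations.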


To answer the question of testing the properties of the items liked by a user 
(e.g. the opening times of the venues liked by dave), you have 2 options:

1) At the point of "liking", copy the item's properties into the 
user-centric document. This may be costly to maintain if opening times change 
frequently, and will probably require the use of nested docs on user profiles.
2) Index docs of the type "user-y-likes-item-x" and turn your query into a 
2-step operation: first retrieve all the items user Y likes, then use 
this list in a filter on a query against item docs with the required opening 
times. The list of items liked by a single user is hopefully small/capped.
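
Option 2 could be sketched roughly as follows. The Python only builds the two 
request bodies and extracts ids from a response; it does not call a client. 
The field names `user` and `item` on the "likes" docs are hypothetical, and 
the sketch assumes `item` holds the _id of the item doc. `closeTime` comes 
from the example document quoted below. The `bool` query with a `filter` 
clause is the form used by later Elasticsearch releases; the 1.x releases 
current at the time of this thread used a `filtered` query instead.

```python
def liked_items_request(user_id, cap=1000):
    """Step 1: fetch the "user-likes-item" docs for a single user."""
    return {
        "size": cap,            # the per-user list is assumed small/capped
        "_source": ["item"],    # only the liked item id is needed
        "query": {"term": {"user": user_id}},
    }


def extract_item_ids(response):
    """Pull the liked item ids out of a step-1 search response."""
    return [hit["_source"]["item"] for hit in response["hits"]["hits"]]


def open_items_request(item_ids, close_before):
    """Step 2: query item docs, restricted to the liked ids and closing time."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"ids": {"values": item_ids}},
                    {"range": {"closeTime": {"lt": close_before}}},
                ]
            }
        }
    }
```

For dave's search "closeTime < 16", the ids returned by step 1 would be 
passed to `open_items_request(ids, 16)`.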






On Tuesday, October 7, 2014 12:24:03 AM UTC+1, Todd Nine wrote:
>
> Hi Jorg,
>   Thanks for the response.  I don't actually need to model the 
> relationship per se, more that a document is used in a relationship via a 
> filter, then search on its properties.  See the example below for more 
> clarity.
>
>
> Restaurant: => {name: "duo"}
>
> Now, let's say I have 3 users,
>  
> George, Dave and Rod
>
> George, Dave and Rod all "like" the restaurant Duo.  These are directed 
> edges from the user, of type "likes" to the "duo" document.  We store these 
> edges in Cassandra.  Envision the document looking something like this.
>
>
> {
> name: "duo",
> openTime: 9,
> closeTime: 18,
> _in_edges: [ "george/likes", "dave/likes", "rod/likes" ]
> }
>
> Then when searching, the user Dave would search something like this.
>
> select * where closeTime < 16
>
>
> Which we translate into a query, which is then also filtered by _in_edges 
> = "dave/likes".
>
> Our goal is to only create 1 document per node in our graph (in this 
> example restaurant), then possibly use the scripting API to add and remove 
> elements to the _in_edges field and update the document.  My only concern 
> around this is document size.  It's not clear to me how to go about this 
> when we start getting millions of edges to that same target node, as the 
> _in_edges field could grow to be millions of entries long.  At that point, 
>  is it more efficient to de-normalize and just turn "dave/likes", 
> "rod/likes", and "george/likes" into document types and store multiple 
> copies?
>
> Thanks,
> Todd
>
>
>
>
>
>
>
>
> On Sat, Oct 4, 2014 at 2:52 AM, [email protected] wrote:
>
>> Not sure if this helps but I use a variant of graphs in ES, it is called 
>> Linked Data (JSON-LD)
>>
>> By using JSON-LD, you can index something like
>>
>> doc index: graph
>> doc type: relations
>> doc id: ...
>>
>> {
>>    "user" : {
>>       "id" : "...",
>>       "label" : "Bob",
>>       "likes" : "restaurant:Duo"
>>   }
>> }
>>
>> for the statement "Bob likes restaurant Duo"
>>
>> and then you can run ES queries on the field "likes" or better 
>> "user.likes" for finding the users that like a restaurant etc. Referencing 
>> the "id" it is possible to lookup another document in another index about 
>> "Bob".
>>
>> Just to give an idea how you can model relations in structured ES JSON 
>> objects.
>>
>> Jörg
>>
>>
>> On Fri, Oct 3, 2014 at 7:59 PM, Todd Nine <[email protected]> wrote:
>>
>>> So clearly I need to RTFM.  I missed this in the documentation the first 
>>> time.
>>>
>>>
>>> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/mapping.html#_how_types_are_implemented
>>>
>>> Will filters at this scale be fast enough?
>>>
>>>
>>>
>>> On Friday, October 3, 2014 11:48:40 AM UTC-6, Todd Nine wrote:
>>>>
>>>> Hey guys,
>>>>   We're currently storing entities and edges in Cassandra.  The 
>>>> entities are JSON, and edges are directed edges with a 
>>>> source---type-->target.  We're using ElasticSearch for indexing and I 
>>>> could really use a hand with design.
>>>>
>>>> What we're doing currently is we take an entity and turn its JSON 
>>>> into a document.  We then create multiple copies of our document and 
>>>> change its type to match the index.  For instance, imagine the following 
>>>> use case.
>>>>
>>>>
>>>> bob(user) -- likes --> Duo (restaurant)  ===> Document Types = 
>>>> bob(user) + likes + restaurant ; bob(user) + likes
>>>>
>>>> bob(user) -- likes --> Root Down (restaurant)  ===> Document Types = 
>>>> bob(user) + likes + restaurant ; bob(user) + likes
>>>>
>>>> bob(user) -- likes --> Coconut Porter (beer)  ===> Document Types = 
>>>> bob(user) + likes + beer ; bob(user) + likes
>>>>
>>>>
>>>> When we index using this scheme we create 3 documents based on the 
>>>> restaurants Duo and Root Down, and the beer Coconut Porter.  We then store 
>>>> this document 2x, one for its specific type, and one in the "all" bucket. 
>>>>  
>>>>
>>>> Essentially, the document becomes a node in the graph.  For each 
>>>> incoming directed edge, we're storing 2x documents and changing the type.  
>>>> This gives us fast seeks when we search by type, but a LOT of data bloat.  
>>>> Would it instead be more efficient to keep an array of incoming edges in 
>>>> the document, then add it to our search terms?  For instance, should we 
>>>> instead have a document like this?
>>>>
>>>>
>>>> docId: Duo(restaurant)
>>>>
>>>> edges: [ "bob(user) + likes + restaurant", "bob(user) + likes" ]
>>>>
>>>> When searching where edges = "bob(user) + likes + restaurant"?
>>>>
>>>>
>>>> I don't know internally what specifying type actually does, if it just 
>>>> treats it as a field, or if it changes the routing of the response?    In 
>>>> a social situation millions of people can be connected to any one entity, 
>>>> so we have to have a scheme that won't fall over when we get to that case.
>>>>
>>>> Any help would be greatly appreciated!
>>>>
>>>> Thanks,
>>>> Todd
>>>>
>>>  -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/f97c6475-f4fc-4078-b052-b497ac82dc91%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/elasticsearch/f97c6475-f4fc-4078-b052-b497ac82dc91%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>
>

