Re: Indexing a large N x N matrix of similarity with ES

Michael Sokolov Tue, 29 Apr 2014 19:03:06 -0700


I've done something similar, but I use a single field for the related 
items; like this:


id:  1
related: 2 3 9 100
category: xxx

id: 2
related: 1 9 88
category: uuu xxx

etc...
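
With this layout, a question like "top N documents most similar to document 1, restricted to fieldA = 1, with facets" can be read as a search on the related field plus a filter and an aggregation. A minimal sketch of such a request body follows. It assumes similarity is symmetric (so searching for documents that list 1 as related is equivalent), uses the bool/filter syntax of later Elasticsearch releases than the ones current when this thread was written, and the aggregation name by_category is hypothetical. fieldA comes from the original question and category from the example above.

```python
import json


def similar_docs_query(doc_id, field_a_value, size=10):
    """Build a search body: docs whose `related` field contains doc_id."""
    return {
        "size": size,  # the "top N" from the question
        "query": {
            "bool": {
                # Scored clause: documents listing doc_id among their related items.
                "must": [{"match": {"related": str(doc_id)}}],
                # Unscored clause: restrict to the requested fieldA value.
                "filter": [{"term": {"fieldA": field_a_value}}],
            }
        },
        # Facet the hits by category (aggregation name is an assumption).
        "aggs": {"by_category": {"terms": {"field": "category"}}},
    }


if __name__ == "__main__":
    print(json.dumps(similar_docs_query(1, 1, size=10), indent=2))
```

The resulting dictionary would be sent as the JSON body of a search request against the index holding these documents.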

You can limit the related items to the top N if you sort them by score 
when indexing and truncate the list.  I don't model the relative scores 
in Lucene, but you could do that crudely by repeating terms.
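
As a rough illustration of the truncate-and-repeat idea, the sketch below builds the value of the related field for one document from its precomputed similarity scores. The function name related_field and the parameters top_n and max_repeat are hypothetical, and scaling the repeat count linearly against the best score is just one assumed way to approximate relative scores through term frequency.

```python
def related_field(scores, top_n=3, max_repeat=5):
    """Build the whitespace-separated `related` value for one document.

    scores: dict mapping related doc id -> precomputed similarity score.
    """
    # Sort by score, highest first, and keep only the top_n ids.
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    if not top:
        return ""
    best = top[0][1]
    terms = []
    for doc_id, score in top:
        if best > 0:
            # Crude score model: repeat the id in proportion to its score.
            repeats = max(1, round(max_repeat * score / best))
        else:
            # No usable scores: list each id once.
            repeats = 1
        terms.extend([str(doc_id)] * repeats)
    return " ".join(terms)


if __name__ == "__main__":
    # Scores for document 1, taken from the question's example plus one weak match.
    print(related_field({2: 10, 3: 8, 4: 12, 5: 1}, top_n=3, max_repeat=4))
```

Setting max_repeat to 1 gives the plain truncated list without any score modelling.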

-Mike


On Friday, April 25, 2014 6:09:13 PM UTC-4, NM wrote:
>
>
> I have N documents containing attributes.
>
> I needed to precompute a special similarity measure between each pair
> of documents.
>
> Now I would like to understand how to index and search using ES to answer a
> query like
>
>  "Retrieve the top N documents that are most similar to document
> ID 1 and that have fieldA = 1"
> and facet the results according to a given field.
>
> --
>
> I was thinking of creating an index of documents with all the associated
> pairwise similarities as attributes, like:
>
> Doc
> id: 1
> field1: 7
> field2: 10
> sim_doc_id2: 10
> sim_doc_id3: 8
> sim_doc_id4: 12
> ...
> sim_doc_idN: 12
>
> Doc
> id: 2
> field1: 5
> field2: 2
> sim_doc_id1: 10
> sim_doc_id3: 3
> sim_doc_id4: 2
> ...
> sim_doc_idN: 10
> ..
>
> Issues with this design:
> The number of generated fields per document is very large for me (10K),
> and I am not sure how to search efficiently. (I tried a script score
> like  return doc['sim_doc_id1'] + field1, but it was quite slow,
> especially compared to a stupid loop in Java.) However, I would like to use the
> aggregation framework of ES to create facets of the results.
>
> Do you have any recommendations / guidelines for handling this problem?
>
> Thanks
>

