Indexing a large N x N similarity matrix with ES

NM Fri, 25 Apr 2014 15:09:26 -0700

I have N documents, each with a set of attributes.

I have precomputed a custom similarity measure between every pair of 
documents.


Now I would like to understand how to index and search this data with ES 
to answer a query like 

 "Retrieve the top N documents that are most similar to document 
ID 1 and that have fieldA = 1" 
and facet the results on a given field.
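
To make the intended query concrete, here is a minimal sketch of its semantics in plain Python, with no ES involved. The function name `top_similar`, the `sim` dictionary keyed by (query id, other id), and the attribute values of documents 3 and 4 are assumptions for illustration only; the filter on field1 = 5 stands in for "fieldA = 1".

```python
from collections import Counter


def top_similar(docs, sim, query_id, n, filter_field, filter_value, facet_field):
    """Brute-force reference: top-n docs most similar to query_id that pass the filter."""
    # Keep only documents that satisfy the attribute filter (and are not the query itself).
    candidates = [
        d for d in docs
        if d["id"] != query_id and d.get(filter_field) == filter_value
    ]
    # Rank by the precomputed similarity to the query document, highest first.
    candidates.sort(key=lambda d: sim[(query_id, d["id"])], reverse=True)
    hits = candidates[:n]
    # Assumption: facets are counted over the returned hits only.
    # ES aggregations would instead count over every document matching the filter.
    facets = Counter(d[facet_field] for d in hits)
    return hits, facets


# Hypothetical data: documents 3 and 4 are made up for the example.
docs = [
    {"id": 1, "field1": 7, "field2": 10},
    {"id": 2, "field1": 5, "field2": 2},
    {"id": 3, "field1": 5, "field2": 10},
    {"id": 4, "field1": 5, "field2": 10},
]
# Precomputed similarities of document 1 to the others.
sim = {(1, 2): 10, (1, 3): 8, (1, 4): 12}

hits, facets = top_similar(
    docs, sim, query_id=1, n=2,
    filter_field="field1", filter_value=5, facet_field="field2",
)
hit_ids = [d["id"] for d in hits]
```

This is essentially the naive loop I would like ES to replace, with the facet counts added.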

--

I was thinking of creating an index of documents with all the associated 
pairwise similarities as attributes, like:

Doc
id: 1
field1: 7
field2: 10
sim_doc_id2: 10
sim_doc_id3: 8
sim_doc_id4: 12
...
sim_doc_idN: 12

Doc
id: 2
field1: 5
field2: 2
sim_doc_id1: 10
sim_doc_id3: 3
sim_doc_id4: 2
...
sim_doc_idN: 10
..
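
For reference, this is roughly how such documents could be generated from the precomputed matrix. It is only a sketch: `build_wide_docs` is a hypothetical helper, the matrix is assumed to be a list of rows with document IDs starting at 1, and the attributes of the third document are made up.

```python
def build_wide_docs(attributes, sim_matrix):
    """Turn an N x N similarity matrix into N documents with N-1 sim_doc_id* fields each."""
    docs = []
    for i, attrs in enumerate(attributes):
        doc = {"id": i + 1}
        doc.update(attrs)
        for j, score in enumerate(sim_matrix[i]):
            if j != i:  # skip the similarity of a document with itself
                doc["sim_doc_id%d" % (j + 1)] = score
        docs.append(doc)
    return docs


# Hypothetical 3-document example; row i holds the similarities of document i+1.
attributes = [
    {"field1": 7, "field2": 10},
    {"field1": 5, "field2": 2},
    {"field1": 1, "field2": 1},
]
sim_matrix = [
    [0, 10, 8],
    [10, 0, 3],
    [8, 3, 0],
]
wide_docs = build_wide_docs(attributes, sim_matrix)
```

With N documents this produces N - 1 similarity fields per document, which is where the field count comes from.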

Issues with this design:
The number of generated fields per document is very large in my case (10K), 
and I am not sure how to search efficiently. I tried a script score like 
 return doc['sim_doc_id1'] + field1, but it was quite slow, especially 
compared to a naive loop in Java. However, I would like to use the 
aggregation framework of ES to facet the results.
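
The kind of request I have in mind looks roughly like the body built below. This is a sketch only: it assumes the function_score / script_score request shape of the ES 1.x line (newer versions wrap the script differently), `build_similarity_query` is a hypothetical helper, and nothing is sent to a cluster here.

```python
import json


def build_similarity_query(query_id, n, filter_field, filter_value, facet_field):
    """Build a search body: filter, score by stored similarity, aggregate on one field."""
    sim_field = "sim_doc_id%d" % query_id
    # Mirrors the script described above: stored similarity plus field1.
    script = "doc['%s'].value + doc['field1'].value" % sim_field
    return {
        "size": n,
        "query": {
            "function_score": {
                # Restrict scoring to documents that pass the attribute filter.
                "query": {"term": {filter_field: filter_value}},
                # Assumed ES 1.x syntax: the script is given as a plain string.
                "script_score": {"script": script},
                # Use the script result as the final score.
                "boost_mode": "replace",
            }
        },
        # Terms aggregation to facet the matching documents on facet_field.
        "aggs": {"by_" + facet_field: {"terms": {"field": facet_field}}},
    }


body = build_similarity_query(1, 10, "field1", 1, "field2")
body_json = json.dumps(body, indent=2)
```

The script has to read one of the 10K per-document fields for every candidate document, which is presumably part of why it is slow.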

Do you have any recommendations or guidelines for handling this problem?

Thanks

