I've done something similar, but I use a single field for the related items; like this:
id: 1
related: 2 3 9 100
category: xxx

id: 2
related: 1 9 88
category: uuu xxx

etc...

You can limit the related items to the top N if you sort them by score when indexing and truncate the list. I don't model the relative scores in Lucene, but you could do that in a crude way by repeating terms, so that a more similar id gets a higher term frequency.

-Mike

On Friday, April 25, 2014 6:09:13 PM UTC-4, NM wrote:
> I have N documents containing attributes.
>
> I needed to precompute a special similarity measure between each pair of documents.
>
> Now I would like to understand how to index and search using ES to answer a query like
>
> "Retrieve the top N documents that are the most similar to document ID 1 and that have fieldA = 1"
> and facet the results according to a given field.
>
> --
>
> I was thinking of creating an index of documents with all the associated pairwise similarities as attributes, like:
>
> Doc
> id: 1
> field1: 7
> field2: 10
> sim_doc_id2: 10
> sim_doc_id3: 8
> sim_doc_id4: 12
> ...
> sim_doc_idN: 12
>
> Doc
> id: 2
> field1: 5
> field2: 2
> sim_doc_id1: 10
> sim_doc_id3: 3
> sim_doc_id4: 2
> ...
> sim_doc_idN: 10
> ..
>
> Issue with such a design:
> The number of generated fields per document is very large for me (10K), and I am not sure how to search efficiently. (I tried a script score like return doc['sim_doc_id1'] + field1, but it was quite slow, especially compared to a naive loop in Java.) However, I would like to use the aggregation framework of ES to create facets of the results.
>
> Do you have any recommendation / guideline to handle this problem?
>
> Thanks
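
For illustration, the single-field approach described in the reply could be sketched roughly as below. This is only a sketch under stated assumptions, not the poster's actual code: the helper names `build_related_field` and `similar_to_query`, the `max_repeats` parameter, and the proportional repetition rule are all hypothetical. The query body uses the `bool`/`filter` form from later Elasticsearch releases, so the exact syntax will depend on your version. It also assumes `related` is analyzed so that each id becomes its own term.

```python
from typing import Dict, List


def build_related_field(scores: Dict[int, float], top_n: int,
                        max_repeats: int = 1) -> str:
    """Build the whitespace-separated value of the `related` field.

    Keeps only the top_n ids by similarity score (ties broken by id).
    If max_repeats > 1, each id is repeated between 1 and max_repeats
    times, in proportion to its score relative to the best score, as a
    coarse way of encoding relative similarity in term frequency.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    if not ranked:
        return ""
    best = ranked[0][1]
    terms: List[str] = []
    for doc_id, score in ranked:
        repeats = 1
        if max_repeats > 1 and best > 0:
            # Hypothetical rule: scale the repeat count linearly with score.
            repeats = max(1, round(max_repeats * score / best))
        terms.extend([str(doc_id)] * repeats)
    return " ".join(terms)


def similar_to_query(doc_id: int, field_a: int, top_n: int,
                     facet_field: str) -> dict:
    """Assumed request body: documents listing doc_id in `related`,
    restricted to fieldA == field_a, with a terms aggregation as facets."""
    return {
        "size": top_n,
        "query": {
            "bool": {
                "must": [{"term": {"related": str(doc_id)}}],
                "filter": [{"term": {"fieldA": field_a}}],
            }
        },
        "aggs": {"facets": {"terms": {"field": facet_field}}},
    }


# Similarities of document 1, taken from the quoted example.
doc1_scores = {2: 10, 3: 8, 4: 12}
doc1 = {
    "id": 1,
    "field1": 7,
    "field2": 10,
    "related": build_related_field(doc1_scores, top_n=2),
}
body = similar_to_query(doc_id=1, field_a=1, top_n=10, facet_field="category")
```

One caveat with this layout: after truncating to the top N, document A may list B without B listing A, so a search for documents whose `related` field contains a given id is only an approximation of that id's nearest neighbours.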
