Thanks, Lukai, for the detailed reply. Two follow-up questions:

- "If you query is too long, it might not very efficient in query evaluation process." -- How does Lucene query evaluation work? Is there any documentation I can refer to?
- "you can read out payload of the match term you have stored" -- What do you mean by the payload of the matched term? Could you show me an example?

regards,
Lin

On Sun, Mar 17, 2013 at 7:13 AM, lukai <lukai1...@gmail.com> wrote:

> On Fri, Mar 15, 2013 at 10:02 PM, Lin Ma <lin...@gmail.com> wrote:
>
>> Hi Lukai, thanks for the detailed reply.
>>
>> Some more comments,
>>
>> - "You can try score by payload" -- what do you mean score by
>> payload? Appreciate if you could provide a bit more details;
>
> Write your own query/scorer, you can read out payload of the match
> term you have stored. You can implement your dot product functionality in
> score function of your scorer.
>
>> - "Lucene focus on search for the default implementation" -- for
>> default you mean?
>
> I mean the default query parser, query types are designed for search
> application. If you query is too long, it might not very efficient in query
> evaluation process.
>
>> - "For your requirement, you can do some query re-write process to
>> reduce your query size" -- I think query re-write you mean rewrite "iPhone
>> 5", "iPhone 4S" to "iPhone" to reduce # of queries? Or you mean something
>> else?
>
> Query re-write, it really depends on your application. you can
> reduce/expand your query or even change the query type according your
> needs.
>
>> regards,
>> Lin
>>
>> On Sat, Mar 16, 2013 at 11:55 AM, lukai <lukai1...@gmail.com> wrote:
>>
>>> Different application has different requirement and resolve different
>>> problem. Lucene focus on search for the default implementation. For your
>>> requirement, you can do some query re-write process to reduce your query
>>> size if you still want to leverage the search functionality. If you just
>>> want to customize your feature value and do simple dot product calculation.
>>> You can try score by payload, it might not very efficient, cuz you still
>>> need to convert your query into some specified Lucene query type. But you
>>> still can leverage the existing index structure, NRT, distributed search
>>> support by Solr.
>>>
>>> When you refer to performance, it really depends on the document size,
>>> term distribution of your corpus. If you have enough machine, you can just
>>> try reduce document number per instance and distribute your search to
>>> achieve a better performance goal.
>>>
>>> On Fri, Mar 15, 2013 at 7:36 PM, Lin Ma <lin...@gmail.com> wrote:
>>>
>>>> Hi lukai, thanks for the reply. Do you mean WAND is a way to resolve
>>>> this issue? For "native support", do you mean there is no built-in
>>>> (existing ready to use externally open source) module in Lucene to
>>>> implement WAND? If so, the performance will really be bad.
>>>>
>>>> regards,
>>>> Lin
>>>>
>>>> On Sat, Mar 16, 2013 at 2:49 AM, lukai <lukai1...@gmail.com> wrote:
>>>>
>>>>> I had implemented wand with solr/lucene. So far there is no performance
>>>>> issue. There is no native support for this functionality, you need to
>>>>> implement it by yourself..
>>>>>
>>>>> On Fri, Mar 15, 2013 at 10:09 AM, Lin Ma <lin...@gmail.com> wrote:
>>>>>
>>>>> > Hello guys,
>>>>> >
>>>>> > Supposing I have one million documents, and each document has hundreds of
>>>>> > features. For a given query, it also has hundreds of features. I want to
>>>>> > fetch most relevant top 1000 documents by dot product related features of
>>>>> > query and documents (query/document features are in the same feature
>>>>> > space).
>>>>> >
>>>>> > I am not sure how Lucene implement internally? If we have to go through all
>>>>> > one million document to dot product the query, then I am concerning about
>>>>> > the performance. Appreciate if anyone could confirm (1) how Lucene works
>>>>> > internally for this use case (2) any smart ideas to make improvement for
>>>>> > query efficiency to select top 1000 documents?
>>>>> >
>>>>> > thanks in advance,
>>>>> > Lin
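To make the original question at the bottom of this thread concrete, here is a minimal sketch in plain Java (not Lucene code) of how an inverted index can produce dot-product scores without visiting every document: only documents that share at least one feature with the query are ever touched, and a bounded min-heap keeps the best k. All names here (DotProductTopK, Posting, Hit, addDocument, topK) are made up for illustration, and the per-feature weight stored in each posting only stands in for what the thread calls a payload. This is not WAND: every posting of every query feature is still read, with no early termination.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical illustration only; none of these names are Lucene APIs. */
final class DotProductTopK {

    /** One posting: a document id plus that document's weight for the feature. */
    static final class Posting {
        final int docId;
        final double weight;
        Posting(int docId, double weight) { this.docId = docId; this.weight = weight; }
    }

    /** A scored result. */
    static final class Hit {
        final int docId;
        final double score;
        Hit(int docId, double score) { this.docId = docId; this.score = score; }
    }

    // feature name -> postings of the documents that contain that feature
    private final Map<String, List<Posting>> index = new HashMap<>();

    /** Indexes a document given as a sparse feature vector. */
    void addDocument(int docId, Map<String, Double> features) {
        for (Map.Entry<String, Double> e : features.entrySet()) {
            index.computeIfAbsent(e.getKey(), f -> new ArrayList<>())
                 .add(new Posting(docId, e.getValue()));
        }
    }

    /** Returns up to k documents with the highest dot product against the query. */
    List<Hit> topK(Map<String, Double> query, int k) {
        // Accumulate partial dot products one query feature at a time.
        // Documents that share no feature with the query never appear here.
        Map<Integer, Double> scores = new HashMap<>();
        for (Map.Entry<String, Double> q : query.entrySet()) {
            List<Posting> postings = index.get(q.getKey());
            if (postings == null) {
                continue; // no document has this feature
            }
            for (Posting p : postings) {
                scores.merge(p.docId, q.getValue() * p.weight, Double::sum);
            }
        }

        // Min-heap bounded to k entries: the weakest kept hit sits on top.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (Map.Entry<Integer, Double> s : scores.entrySet()) {
            heap.offer(new Hit(s.getKey(), s.getValue()));
            if (heap.size() > k) {
                heap.poll(); // drop the current weakest
            }
        }

        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }

    public static void main(String[] args) {
        DotProductTopK idx = new DotProductTopK();
        idx.addDocument(1, Map.of("a", 1.0, "b", 2.0));
        idx.addDocument(2, Map.of("b", 1.0, "c", 3.0));
        idx.addDocument(3, Map.of("d", 5.0)); // shares nothing with the query below

        for (Hit h : idx.topK(Map.of("a", 2.0, "b", 1.0), 2)) {
            System.out.println("doc " + h.docId + " score " + h.score);
        }
    }
}
```

In this toy data, document 3 is never scored because it shares no feature with the query. With hundreds of features per query, the cost is driven by the total length of the matching posting lists, which is where a pruning scheme such as the WAND approach mentioned earlier in the thread would come in.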