Re: product based term combination for BooleanQuery?

2007-07-03 Thread Jason Pump
You're not using any type of phrase search. Try: ( (title:"John Bush"^4.0) OR (body:"John Bush") ) AND ( (title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush) ) or maybe ( (title:"John Bush"~4^4.0) OR (body:"John Bush"~4) ) AND ( (title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush) )

Re: Language detection library

2007-05-03 Thread Jason Pump

OT re Emulating Pages Search

2007-04-03 Thread Jason Pump
If the documents have some sort of fixed ranking value (pageweight) and the documents are arranged in the index in that order then at some point you can say there is no reason to look for more matches, e.g. even if the words were next to each other in query order, the document couldn't
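The early-stop idea in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Lucene scores: the function name, the `query_score` callback, and the rule that the final score is pageweight multiplied by a query score capped at `max_query_score` are all hypothetical.

```python
import heapq


def top_k_with_early_stop(docs, query_score, k, max_query_score=1.0):
    """Collect the k best documents, stopping once no later document can qualify.

    docs: iterable of (doc_id, pageweight), sorted by pageweight descending
          (the "index order" described in the message).
    query_score: hypothetical callback returning a value in
          [0, max_query_score], or None when the document does not match.
    Assumption: final score = pageweight * query_score.
    """
    heap = []      # min-heap of (score, doc_id); heap[0] is the weakest kept hit
    examined = 0
    for doc_id, pageweight in docs:
        # Best score this document, or any later one, could possibly reach.
        bound = pageweight * max_query_score
        if len(heap) == k and bound <= heap[0][0]:
            break  # even a perfect match could not displace a kept hit
        examined += 1
        s = query_score(doc_id)
        if s is None:
            continue
        score = pageweight * s
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    ranked = sorted(heap, reverse=True)
    return [(doc_id, score) for score, doc_id in ranked], examined


docs = [("a", 10.0), ("b", 8.0), ("c", 2.0), ("d", 1.0)]
scores = {"a": 0.5, "b": 1.0, "c": 1.0, "d": 1.0}
hits, examined = top_k_with_early_stop(docs, scores.get, k=2)
```

Here the loop stops at "c": its pageweight of 2.0 cannot beat the weakest kept score of 5.0 even with a perfect query match, so "c" and "d" are never scored.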

Re: Index a source, but not store it... can it be done?

2007-03-09 Thread Jason Pump
you're ever going to do is to protect the index as well as you do the original documents. jch

Re: Index a source, but not store it... can it be done?

2007-03-08 Thread Jason Pump
If you store a hash code of the word rather than the actual word, you should be able to search for stuff but not be able to actually retrieve it; you can trade precision for security based on the number of bits in the hash code (e.g. 32 or 64 bits). I'd think a 64-bit hash would be a
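A minimal sketch of the hashed-term idea, assuming a toy in-memory inverted index rather than Lucene. The names `term_hash` and `HashedIndex`, the use of SHA-256, the whitespace tokenizer, and the optional salt are all assumptions made for illustration; the message does not say which hash function to use.

```python
import hashlib
from collections import defaultdict


def term_hash(term, bits=64, salt=b""):
    """Map a word to a `bits`-wide integer (1 <= bits <= 64).

    Assumption: SHA-256 truncated to the top `bits` bits; any well-mixed
    hash would serve the same purpose.
    """
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64")
    digest = hashlib.sha256(salt + term.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


class HashedIndex:
    """Toy inverted index that stores only term hashes, never the words."""

    def __init__(self, bits=64, salt=b""):
        self.bits = bits
        self.salt = salt
        self.postings = defaultdict(set)  # term hash -> set of doc ids

    def add(self, doc_id, text):
        for word in text.split():
            self.postings[term_hash(word, self.bits, self.salt)].add(doc_id)

    def search(self, word):
        # Hash the query term the same way the indexed terms were hashed.
        key = term_hash(word, self.bits, self.salt)
        return set(self.postings.get(key, set()))


index = HashedIndex(bits=64)
index.add(1, "secret merger plans")
index.add(2, "lunch plans")
matches = index.search("plans")
```

Fewer bits mean more collisions, so more false matches but less information about which word was indexed. Note that anyone who knows the hash function and salt can still test guessed words against the index.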

Re: Text storing design and performance question

2007-01-11 Thread Jason Pump
of the document, banana and orange at the end. Wouldn't your optimization stop at the word apple and just return this word highlighted? Or do you know of a way to quantify the match?

Re: Text storing design and performance question

2007-01-10 Thread Jason Pump
Renaud, one optimization you can do here is to try the first 10 KB and see if it finds text worth highlighting; if not, try the next window with a slight overlap (9.9 KB to 19.9 KB), or just 9.9 KB to the end if you're feeling lazy. This assumes that most good matches are at the start of the document, and that
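The windowed approach described above might look roughly like this. It is a sketch only: the function name, the plain substring test standing in for a real highlighter, and the default sizes (10,000 characters with a 100-character overlap) are assumptions, and it returns the first window with any hit, which is exactly the behaviour the follow-up question asks about.

```python
def find_highlight_window(text, terms, chunk_size=10_000, overlap=100):
    """Return (start, end, matched_terms) for the first window containing a
    query term, or None if no window matches.

    Windows overlap slightly so that a term straddling a window boundary is
    still seen whole in the next window.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    lowered_terms = [t.lower() for t in terms]
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunk = text[start:end].lower()
        hits = [t for t in lowered_terms if t in chunk]
        if hits:
            return start, end, hits  # stop at the first window worth highlighting
        if end == len(text):
            break
        start = end - overlap        # step forward, re-reading the overlap
    return None


# "banana" straddles the first window boundary and is caught by the overlap.
document = "x" * 9_997 + "banana" + "y" * 20_000
window = find_highlight_window(document, ["banana"])
```

Only the text inside the returned window would then be passed to the highlighter, which is where the saving comes from when matches tend to appear early.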

Re: word frequency list?

2006-08-31 Thread Jason Pump
are normalized as follows: ALL CAP words are prepended with a_ and Capitalized words are prepended with c_ after downcasing. Digits are all replaced with 0. Cheers, Boris
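The normalization rule quoted above could be written along these lines. The function name is hypothetical, and two details are assumptions the message does not settle: a token with only one letter (such as "I") is treated as Capitalized rather than ALL CAPS, and mixed-case tokens are treated as Capitalized when their first character is uppercase.

```python
import re


def normalize_token(token):
    """Apply the described scheme: a_ prefix for ALL CAPS words, c_ prefix for
    Capitalized words, everything downcased, every digit replaced with 0."""
    token = re.sub(r"\d", "0", token)
    letters = [ch for ch in token if ch.isalpha()]
    lowered = token.lower()
    # Assumption: "ALL CAPS" needs at least two letters, all uppercase.
    if len(letters) > 1 and all(ch.isupper() for ch in letters):
        return "a_" + lowered
    # Assumption: anything else starting with an uppercase letter is Capitalized.
    if token[:1].isupper():
        return "c_" + lowered
    return lowered


normalized = [normalize_token(t) for t in ["NASA", "London", "apple", "1999"]]
```

Collapsing digits and folding case this way keeps the frequency list small while still recording how a word was capitalized.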

word frequency list?

2006-08-30 Thread Jason Pump
Is there a large list of words and their frequency in the English language? Obviously it would differ by corpus but I would like to see what's already available.

Re: search with RangeFilter.Less

2006-06-28 Thread Jason Pump
It's a string comparison. Making the 5 a 05 would be a simple workaround. Jason Peter W. wrote: Hello, I'm trying to do a numerical search for a property in Lucene using RangeFilter.Less without using both RangeQuery and test cases. Here's the code that I expect would return one hit :
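To illustrate the string-comparison point from the reply (this is not the poster's code, which is cut off above): terms compared as strings sort character by character, so "5" sorts after "10" unless the values are zero-padded to a common width. The helper name `pad_number` and the fixed width are assumptions for this sketch, and it only covers non-negative integers.

```python
def pad_number(value, width=2):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return str(value).zfill(width)


# Compared as plain strings, "5" is not less than "10" because '5' > '1'.
unpadded_less = "5" < "10"

# Zero-padded to the same width, the comparison agrees with the numbers.
padded_less = pad_number(5) < pad_number(10)

sorted_unpadded = sorted(["10", "5", "7"])
sorted_padded = sorted(pad_number(n) for n in [10, 5, 7])
```

The width has to cover the largest value that will ever be indexed, and the same padding has to be applied to the bound passed to the range filter.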