Re: Query String for a phrase?

2007-03-16 Thread ruchi thakur
Thanks a lot for your help. I am now using the query as documented for phrases. Regards, Ruchi On 3/13/07, Chris Hostetter [EMAIL PROTECTED] wrote: : Ok, so does that mean I can use both q1 and q2 for phrase queries, i.e. for : searching words adjacent to each other? Actually that was my only concern,

Re: How to customize scoring using user feedback?

2007-03-16 Thread daniel rosher
Hi Xiong, Your ranking idea sounds interesting ... are you looking into something akin to the TrafficRank algorithm? This is moving into the realm of Personalized (or Personalised) search, something I'm not aware of appearing on the Lucene mailing lists so far, but something I'm quite

Re: How to customize scoring using user feedback?

2007-03-16 Thread xiong
daniel rosher daniel.rosher at hotonline.com writes: We regularly open a new IndexReader, and before this reader replaces the production one, we determine f(D) for all documents so that for the user there is almost no performance issue, i.e. f(D) is cached. I suspect you can implement
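A minimal sketch of the caching idea described above, assuming f(D) is precomputed into a per-document array when the new IndexReader is opened and then folded into the Lucene score in a HitCollector (Lucene 2.x-era API; the class, the lookupUserFeedback helper and the combination by multiplication are illustrative assumptions, not code from this thread):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.HitCollector;

    public class FeedbackFactors {
        private final float[] factor; // f(D), indexed by Lucene doc id

        // Compute f(D) for every live document before the new reader goes into production.
        public FeedbackFactors(IndexReader reader) throws IOException {
            factor = new float[reader.maxDoc()];
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (!reader.isDeleted(i)) {
                    factor[i] = lookupUserFeedback(i); // hypothetical: read clicks/ratings from your own store
                }
            }
        }

        // At search time, combine the cached factor with the normal Lucene score.
        public HitCollector wrap(final HitCollector delegate) {
            return new HitCollector() {
                public void collect(int doc, float score) {
                    delegate.collect(doc, score * factor[doc]);
                }
            };
        }

        private float lookupUserFeedback(int doc) { return 1.0f; } // placeholder
    }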

How the Field.Store flag works?

2007-03-16 Thread cybercouf
I'm using Lucene for indexing my Nutch crawls, but I don't really understand the difference between Field.Store.YES and NO for this flag. It seems (using Luke) I can still read some data that was not 'Store.YES'. Where is this data stored if it's not in the index? What is better to use for small fields?

character like ,+,.. getting ignored in search

2007-03-16 Thread ruchi thakur
Hi all, I am using StopAnalyzer for indexing and searching. I am searching for phrases. q1 - a b this query gives me all documents containing a b , but also gives documents containing a b again q2 - a b this query q2 gives documents containing a b, but also gives documents containing a b How

How to use StandardAnalyzer

2007-03-16 Thread sandeep.chawla
Hi, I am new to the Lucene Java API. I want to use StandardAnalyzer for tokenizing my document. How can I use it? Further, how can I index an acronym or a company name as one term? I know we can do this using StandardAnalyzer, but I am not sure of the way. Thanks in advance Sandeep

Re: How to use StandardAnalyzer

2007-03-16 Thread James liu
You can read the demo source code in the Lucene source package. 2007/3/16, [EMAIL PROTECTED] [EMAIL PROTECTED]: Hi, I am new to the Lucene Java API. I want to use StandardAnalyzer for tokenizing my document. How can I use it? Further, how can I index an acronym or a company name as one term? I
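A minimal indexing sketch with StandardAnalyzer, roughly along the lines of the demo code mentioned above (Lucene 2.x-era API; the index path, field name and sample text are illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexWithStandardAnalyzer {
        public static void main(String[] args) throws Exception {
            // StandardAnalyzer lower-cases, removes English stop words, and its tokenizer
            // keeps acronyms (I.B.M.) and company names with '&' (AT&T) as single tokens.
            IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);

            Document doc = new Document();
            doc.add(new Field("contents", "The I.B.M. and AT&T quarterly reports",
                              Field.Store.YES, Field.Index.TOKENIZED));
            writer.addDocument(doc);

            writer.optimize();
            writer.close();
        }
    }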

Re: How the Field.Store flag works?

2007-03-16 Thread Erick Erickson
This confused me at first too, so here's my current understanding... When you use YES, you store the actual data as-is with the document. This is entirely independent of indexing. Internally, I assume that searching and storing are separate parts of the index that have nothing to do with each
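A small sketch of the distinction (Lucene 2.x-era API; field names and values are illustrative): Field.Index controls what is searchable, Field.Store controls what comes back verbatim with a hit.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class StoreVersusIndex {
        public static Document buildDoc() {
            Document doc = new Document();

            // Searchable AND retrievable: tokens go into the inverted index,
            // and the original string is stored with the document.
            doc.add(new Field("title", "How the Field.Store flag works",
                              Field.Store.YES, Field.Index.TOKENIZED));

            // Searchable but NOT retrievable: you can match on it,
            // but doc.get("body") returns null on a retrieved hit.
            doc.add(new Field("body", "long page text ...",
                              Field.Store.NO, Field.Index.TOKENIZED));

            // Retrievable but NOT searchable: carried along as payload only.
            doc.add(new Field("url", "http://example.com/page",
                              Field.Store.YES, Field.Index.NO));

            return doc;
        }
    }

As for seeing data in Luke for fields that were not stored: Luke can reconstruct field content from the indexed terms, which is a lossy approximation (lower-cased, with stop words and punctuation gone), and that is likely what was observed.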

Re: character like ,+,.. getting ignored in search

2007-03-16 Thread Erick Erickson
What analyzers are you using at index and search time? I suspect that the '' is being removed both at index and search. So, you've only indexed the tokens 'a' and 'b' and by the time you get out of the query parser, you're only searching for terms 'a' 'b'. Did you bother using query.toString()
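A short sketch of that kind of diagnosis: run the raw text through the analyzer and print what the query parser produces, so you can see exactly which tokens survive (Lucene 2.x-era API; the field name and sample text are illustrative):

    import java.io.StringReader;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class AnalysisDebug {
        public static void main(String[] args) throws Exception {
            StopAnalyzer analyzer = new StopAnalyzer();

            // 1. See what the analyzer actually keeps from the raw text.
            TokenStream ts = analyzer.tokenStream("contents", new StringReader("x + y"));
            for (Token t = ts.next(); t != null; t = ts.next()) {
                System.out.println("token: " + t.termText());
            }
            ts.close();

            // 2. See what the query parser produces for the same analyzer.
            QueryParser parser = new QueryParser("contents", analyzer);
            Query q = parser.parse("\"x + y\"");
            System.out.println("parsed query: " + q.toString("contents"));
        }
    }

With StopAnalyzer (a LetterTokenizer underneath), punctuation never makes it into the token stream, so both the indexed terms and the parsed phrase contain only the letter tokens.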

Re: How to use StandardAnalyzer

2007-03-16 Thread Erick Erickson
Also, See SynonymAnalyzer in Lucene In Action. Erick On 3/16/07, James liu [EMAIL PROTECTED] wrote: You can read demo source code from lucene source package. 2007/3/16, [EMAIL PROTECTED] [EMAIL PROTECTED]: Hi, I am new to Lucene Java API. I want to use StandardAnalyzer for

Re: character like ,+,.. getting ignored in search

2007-03-16 Thread ruchi thakur
Thanks Erick. I will try out the suggestions. I am using StopAnalyzer. Regards, Ruchi On 3/16/07, Erick Erickson [EMAIL PROTECTED] wrote: What analyzers are you using at index and search time? I suspect that the '' is being removed both at index and search. So, you've only indexed the tokens

Re: Indexing HTML pages and phrases

2007-03-16 Thread Doron Cohen
For searching phrases, there's no need to detect the phrases at indexing time - the position of each word is saved in the index and then used at search time to match phrase queries. (Also see the 'query syntax' document.) Lucene takes plain text as document input - extraction of content text and
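For reference, a minimal sketch of a phrase search at query time, relying on the positions Lucene stored during indexing (Lucene 2.x-era API; the index path and field name are illustrative):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.PhraseQuery;

    public class PhraseSearch {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/tmp/index");

            // "apache lucene" as adjacent terms; slop 0 means exactly adjacent.
            PhraseQuery query = new PhraseQuery();
            query.add(new Term("contents", "apache"));
            query.add(new Term("contents", "lucene"));
            query.setSlop(0);

            Hits hits = searcher.search(query);
            System.out.println(hits.length() + " matching documents");
            searcher.close();
        }
    }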

Re: Announcement: Lucene powering Monster job search index (Beta)

2007-03-16 Thread Daniel Rosher
Hi Peter, Shouldn't the search perform the Euclidean distance check during filtering as well, though? Otherwise you may obtain highly relevant hits reported to the user outside the range they specified, particularly as the search radius gets larger. Cheers, Dan On 1/28/07, Peter Keegan

Re: Open / Close when Merging

2007-03-16 Thread Doron Cohen
Hi Matt, To verify I understand correctly, are these your settings: - one MAIN index containing all the data: used for search; never does addDocument(); - several side INC indexes: addDocument() here for new/modified documents; never searched; - at some point all INC indexes are merged
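A hedged sketch of the merge step under that setup, using IndexWriter.addIndexes on the MAIN index (Lucene 2.x-era API; the directory paths are illustrative, and the production searcher would normally be reopened on the merged index afterwards):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class MergeIncrements {
        public static void main(String[] args) throws Exception {
            Directory main = FSDirectory.getDirectory("/indexes/main", false);
            Directory[] increments = new Directory[] {
                FSDirectory.getDirectory("/indexes/inc1", false),
                FSDirectory.getDirectory("/indexes/inc2", false)
            };

            // Open the main index without re-creating it, then fold in the side indexes.
            IndexWriter writer = new IndexWriter(main, new StandardAnalyzer(), false);
            writer.addIndexes(increments); // merges the INC indexes into MAIN
            writer.optimize();
            writer.close();
        }
    }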

Re: Fast index traversal and update for stored field?

2007-03-16 Thread Chris Hostetter
: Sounds like there's nothing out of the box to solve my problem; if : I write something to update lucene indexes in place I'll follow up : about it in here (don't know that I will though; building a new, : narrower index is probably more expedient and will probably be fast : enough for my

Re: search timeout

2007-03-16 Thread Chris Hostetter
: Nutch recently added a search query timeout (NUTCH-308). Are there any : plans to add such functionality to the Lucene HitCollector directly? Or : is there some reason that this is a bad idea? Quickly skimming the patch in that Issue, Nutch seems to have done what has been discussed
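Roughly the idea under discussion, as a sketch rather than the actual Nutch patch: a HitCollector wrapper that gives up once a time budget is exceeded, with the caller catching the exception and returning whatever was collected so far (the class name and exception-based bail-out are assumptions):

    import org.apache.lucene.search.HitCollector;

    // Sketch: wraps another collector and stops collecting after maxMillis.
    public class TimeLimitedCollector extends HitCollector {
        public static class TimeExceededException extends RuntimeException {}

        private final HitCollector delegate;
        private final long deadline;

        public TimeLimitedCollector(HitCollector delegate, long maxMillis) {
            this.delegate = delegate;
            this.deadline = System.currentTimeMillis() + maxMillis;
        }

        public void collect(int doc, float score) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeExceededException(); // caller catches this and returns partial results
            }
            delegate.collect(doc, score);
        }
    }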

Re: Announcement: Lucene powering Monster job search index (Beta)

2007-03-16 Thread Peter Keegan
Dan, The filtering is done in the HitCollector by the bounding box, so the only hits that get collected are those that match the keywords, the bounding box, and some Lucene filters (BitSets) (I'm probably overloading the word 'filter' a bit). So, the only hits from the collector that need to be
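A rough sketch of that kind of collector, assuming per-document latitude/longitude values are available through the FieldCache (this is not Peter's actual code; the field names and the box-only check are illustrative):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.HitCollector;

    // Sketch: only keep hits whose lat/lon fall inside the bounding box.
    public class BoundingBoxCollector extends HitCollector {
        private final float[] lat, lon;
        private final float minLat, maxLat, minLon, maxLon;

        public BoundingBoxCollector(IndexReader reader, float minLat, float maxLat,
                                    float minLon, float maxLon) throws IOException {
            this.lat = FieldCache.DEFAULT.getFloats(reader, "lat");
            this.lon = FieldCache.DEFAULT.getFloats(reader, "lon");
            this.minLat = minLat; this.maxLat = maxLat;
            this.minLon = minLon; this.maxLon = maxLon;
        }

        public void collect(int doc, float score) {
            if (lat[doc] >= minLat && lat[doc] <= maxLat
                && lon[doc] >= minLon && lon[doc] <= maxLon) {
                // Inside the box: keep it (a Euclidean or great-circle check could refine this).
                // ... add (doc, score) to your result set here ...
            }
        }
    }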

Re: Fast index traversal and update for stored field?

2007-03-16 Thread Erick Erickson
Yet another idea just occurred to me. Remember that documents in Lucene do not all have to have the same fields. So what if you had a *very special document* in your index that contained only the changing info? Perhaps in XML or even binary format? Then, updating your index would only involve deleting
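A hedged sketch of that 'special document' approach: delete the one document that carries the fast-changing info by a unique, untokenized key, then re-add it with fresh content (Lucene 2.x-era API; the field names and key value are illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class UpdateSpecialDocument {
        public static void update(String indexPath, String newPayload) throws Exception {
            // 1. Delete the old special document by its unique, untokenized key.
            IndexReader reader = IndexReader.open(indexPath);
            reader.deleteDocuments(new Term("docType", "changing-info"));
            reader.close();

            // 2. Re-add it with the updated payload.
            IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
            Document doc = new Document();
            doc.add(new Field("docType", "changing-info", Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("payload", newPayload, Field.Store.YES, Field.Index.NO));
            writer.addDocument(doc);
            writer.close();
        }
    }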

Re: Announcement: Lucene powering Monster job search index (Beta)

2007-03-16 Thread Peter Keegan
Note: this is a reply to a posting to java-dev --Peter Eric, Now that it is live, is performance pretty good? Performance is outstanding. Each server can easily handle well over 100 qps on an index of over 800K documents. There are several servers (4 dual core (8 CPU) Opteron) supporting

Issue while parsing XML files due to control characters, help appreciated.

2007-03-16 Thread Lokeya
Hi, I am trying to index the content from XML files, which are basically the metadata collected from a website that has a huge collection of documents. This metadata XML has control characters which cause errors when trying to parse it with the DOM parser. I tried to use encoding = UTF-8 but
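One common workaround, sketched below: strip the control characters that are illegal in XML 1.0 before handing the text to the DOM parser (the class and method names are illustrative):

    public class XmlCleaner {
        // Removes characters that are not legal in XML 1.0:
        // everything below 0x20 except tab (0x09), LF (0x0A) and CR (0x0D).
        public static String stripControlChars(String raw) {
            StringBuffer out = new StringBuffer(raw.length());
            for (int i = 0; i < raw.length(); i++) {
                char c = raw.charAt(i);
                if (c == '\t' || c == '\n' || c == '\r' || c >= 0x20) {
                    out.append(c);
                }
            }
            return out.toString();
        }
    }

The cleaned string can then be wrapped in a StringReader/InputSource and parsed as usual.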