several tokenizers in one field type

2008-06-24 Thread Norberto Meijome
hi all, ( I'm using 1.3 nightly build from 15th June 08.) Is there some documentation about how analysers + tokenizers are applied in fields ? In particular, my question : - If I define 2 tokenizers in a fieldtype, only the first one is applied, the other is ignored. Is that because the 2nd

Re: several tokenizers in one field type

2008-06-24 Thread Ryan McKinley
On Jun 24, 2008, at 12:07 AM, Norberto Meijome wrote: hi all, ( I'm using 1.3 nightly build from 15th June 08.) Is there some documentation about how analysers + tokenizers are applied in fields ? In particular, my question : best docs are here:

Re: several tokenizers in one field type

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 00:14:57 -0700 Ryan McKinley [EMAIL PROTECTED] wrote: best docs are here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters yes, I've been reading that already , thanks :) - If I define 2 tokenizers in a fieldtype, only the first one is applied, the

Re: SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 16:04:24 +0100 Dave Searle [EMAIL PROTECTED] wrote: At the moment I have an index of forum messages (each message being a separate doc). Results are displayed on a per message basis, however, I would like to group the results via their thread. Apart from using a facet on

Re: Accented search

2008-06-24 Thread Robert Haschart
climbingrose wrote: Here is how I did it (the code is from memory so it might not be correct 100%): private boolean hasAccents; private Token filteredToken; public final Token next() throws IOException { if (hasAccents) { hasAccents = false; return filteredToken; } Token t =

Re: SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 16:34:44 +0100 Dave Searle [EMAIL PROTECTED] wrote: I am currently storing the thread id within the message index, however, although this would allow me to sort, it doesn't help with the grouping of threads based on relevancy. See the idea is to index message data in the

Re: SOLR-469 - bad patch?

2008-06-24 Thread Shalin Shekhar Mangar
I've just uploaded a new patch which applies cleanly on the trunk. Thanks! On Tue, Jun 24, 2008 at 7:35 PM, Jon Baer [EMAIL PROTECTED] wrote: It seems the new patch @ https://issues.apache.org/jira/browse/SOLR-469 is x2 the size but turns out the patch itself might be bad? Ie, it dumps

Re: Attempting dataimport using FileListEntityProcessor

2008-06-24 Thread mike segv
I do want to import all documents. My understanding of the way things work, correct me if I'm wrong, is that there can be a certain number of documents included in a single atomic update. Instead of having all my 16 Million documents be part of a single update (that could more easily fail being

Re: Attempting dataimport using FileListEntityProcessor

2008-06-24 Thread Shalin Shekhar Mangar
Ok, I got your point. DataImportHandler currently creates documents and adds them one-by-one to Solr. A commit/optimize is called once after all documents are finished. If a document fails to add due to any exception then the import fails. You can still achieve the functionality you want by

Nutch - Solr latest?

2008-06-24 Thread Jon Baer
Hi, I'm curious: is there a spot / patch for the latest on Nutch / Solr integration? I've found a few pages (a few outdated, it seems). It would be nice (?) if it worked as a DataSource type for DataImportHandler, but I'm not sure if that fits with how it works. Either way a nice contrib patch

Re: SpellCheckComponent: No file-based suggestions + Location issue

2008-06-24 Thread Ronald K. Braun
Shalin: The index directory location is being created inside the current working directory. We should change that. I've opened SOLR-604 and attached a patch which fixes this. I updated from nightly build to incorporate your fix and it works perfectly, now building the spell indexes in

Re: Wildcard search question

2008-06-24 Thread Jon Drukman
Norberto Meijome wrote: ok well let's say that i can live without john/jon in the short term. what i really need today is a case insensitive wildcard search with literal matching (no fancy stemming. bobby is bobby, not bobbi.) what are my options?

Re: How to use SOLR1.2

2008-06-24 Thread Chris Hostetter
: I am new in SOLR 1.2, configured Admin GUI. Facing problem in using : this. could you pls help me out to configure the nex. The admin GUI isn't really a place where you configure Solr. It's a way to see the status of things -- configuration is done via config files. Have you gone through

Re: UnicodeNormalizationFilterFactory

2008-06-24 Thread Chris Hostetter
: I've seen mention of these filters: : : filter class=schema.UnicodeNormalizationFilterFactory/ : filter class=schema.DiacriticsFilterFactory/ Are you asking because you saw these in Robert Haschart's reply to your previous question? I think those are custom Filters that he has in his

Re: Can I specify the default operator at query time ?

2008-06-24 Thread Chris Hostetter
: Subject: Can I specify the default operator at query time ? : In-Reply-To: [EMAIL PROTECTED] http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh

DataImportHandler running out of memory

2008-06-24 Thread wojtekpia
I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc

Re: DataImportHandler running out of memory

2008-06-24 Thread Grant Ingersoll
This is a bug in MySQL. Try setting the fetch size of the Statement on the connection to Integer.MIN_VALUE. See http://forums.mysql.com/read.php?39,137457 amongst a host of other discussions on the subject. Basically, it tries to load all the rows into memory; the only alternative is to set

How to debug ?

2008-06-24 Thread Norberto Meijome
hi, I'm trying to understand why a search on a field tokenized with the nGram tokenizer, with minGramSize=n and maxGramSize=m doesn't find any matches for queries of length (in characters) of n+1..m (n works fine). analysis.jsp shows that it SHOULD match, but /select doesn't bring anything back.
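
When debugging a case like this, it can help to list by hand which grams a given minGramSize/maxGramSize pair would produce for the indexed value and for the query string. The following is a minimal sketch in plain Java, not Lucene's or Solr's nGram tokenizer: the class name, the method name and the emission order (shortest grams first) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
final class NGramSketch {

    // Returns every substring of `text` whose length lies between minGram and
    // maxGram (inclusive). Grams are emitted shortest first, and within one
    // length from left to right; a real tokenizer may order them differently.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        for (int size = minGram; size <= maxGram; size++) {
            for (int start = 0; start + size <= text.length(); start++) {
                grams.add(text.substring(start, start + size));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Grams for an indexed value with minGramSize=2, maxGramSize=3.
        System.out.println(ngrams("solr", 2, 3)); // prints [so, ol, lr, sol, olr]
    }
}
```

Running the same helper on the query string shows how many terms the query side contributes, which can then be compared with what analysis.jsp reports.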

Re: How to debug ?

2008-06-24 Thread Ryan McKinley
also, check the LukeRequestHandler if there is a document you think *should* match, you can see what tokens it has actually indexed... On Jun 24, 2008, at 7:12 PM, Norberto Meijome wrote: hi, I'm trying to understand why a search on a field tokenized with the nGram tokenizer, with

Re: DataImportHandler running out of memory

2008-06-24 Thread Shalin Shekhar Mangar
Setting the batchSize to 1 would mean that the Jdbc driver will keep 1 row in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the Sql Server driver implements this. Also keep in mind that Solr also needs memory to index

Re: How to debug ?

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 19:17:58 -0700 Ryan McKinley [EMAIL PROTECTED] wrote: also, check the LukeRequestHandler if there is a document you think *should* match, you can see what tokens it has actually indexed... right, I will look into that a bit more. I am actually using the lukeall.jar

RE: UnicodeNormalizationFilterFactory

2008-06-24 Thread Lance Norskog
ISOLatin1AccentFilterFactory works quite well for us. It solves our basic euro-text keyboard searching problem, where protege should find protégé. (protege with two accents.) -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 24, 2008 4:05 PM To:
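
The effect described above (protege matching protégé) can be approximated outside Solr for quick experiments. The following is a minimal sketch using the JDK's java.text.Normalizer; it is not how ISOLatin1AccentFilterFactory is implemented, and the class and method names are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Solr or Lucene.
final class AccentFoldingSketch {

    // Decomposes each character into base letter plus combining marks (NFD),
    // then removes the combining marks. Letters without a canonical
    // decomposition (for example the o with stroke) are left unchanged.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00e9 is e with acute accent, written as an escape to avoid
        // depending on the source file encoding.
        System.out.println(stripAccents("prot\u00e9g\u00e9")); // prints protege
    }
}
```

Applying the same folding to both the indexed text and the query is what lets an unaccented query find the accented term.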

Re: DataImportHandler running out of memory

2008-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH streams rows one by one. set the fetchSize=-1 this might help. It may make the indexing a bit slower but memory consumption would be low. The memory is consumed by the jdbc driver. try tuning the -Xmx value for the VM --Noble On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar [EMAIL

Re: DataImportHandler running out of memory

2008-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
it is batchSize=-1 not fetchSize. Or keep it to a very small value. --Noble On Wed, Jun 25, 2008 at 9:31 AM, Noble Paul നോബിള്‍ नोब्ळ् [EMAIL PROTECTED] wrote: DIH streams rows one by one. set the fetchSize=-1 this might help. It may make the indexing a bit slower but memory consumption would
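
For readers following the thread, the two pieces of advice (Integer.MIN_VALUE as the JDBC fetch size, and batchSize=-1 in DataImportHandler) appear to describe the same mechanism. The following is a minimal sketch under that assumption: it assumes that a batchSize of -1 is translated to Integer.MIN_VALUE before being passed to the driver, and that the driver (MySQL Connector/J in the linked discussion) treats that value as a request to stream rows. The class and method names are hypothetical and are not DataImportHandler's code.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Solr's DataImportHandler.
final class StreamingFetchSketch {

    // Assumption: -1 is a sentinel meaning "stream row by row", which is
    // expressed to the driver as Integer.MIN_VALUE. Other values pass through.
    static int effectiveFetchSize(int batchSize) {
        return batchSize == -1 ? Integer.MIN_VALUE : batchSize;
    }

    // Opens a forward-only, read-only statement and applies the fetch size.
    // Drivers that follow the JDBC contract strictly may throw SQLException
    // for a negative fetch size, so this is only suitable where the driver
    // documents Integer.MIN_VALUE as a streaming hint.
    static Statement openStreamingStatement(Connection conn, int batchSize) throws SQLException {
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                              ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(effectiveFetchSize(batchSize));
        return stmt;
    }
}
```

Whether the SQL Server driver from the original question honours any of this is not established in the thread, so a small positive fetch size plus a larger -Xmx may be the safer thing to try there.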