date:20080624

several tokenizers in one field type

2008-06-24 Thread Norberto Meijome

hi all, ( I'm using 1.3 nightly build from 15th June 08.) Is there some documentation about how analysers + tokenizers are applied in fields ? In particular, my question : - If I define 2 tokenizers in a fieldtype, only the first one is applied, the other is ignored. Is that because the 2nd

Re: several tokenizers in one field type

2008-06-24 Thread Ryan McKinley

On Jun 24, 2008, at 12:07 AM, Norberto Meijome wrote: hi all, ( I'm using 1.3 nightly build from 15th June 08.) Is there some documentation about how analysers + tokenizers are applied in fields ? In particular, my question : best docs are here:

Re: several tokenizers in one field type

2008-06-24 Thread Norberto Meijome

On Tue, 24 Jun 2008 00:14:57 -0700 Ryan McKinley [EMAIL PROTECTED] wrote: best docs are here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters yes, I've been reading that already , thanks :) - If I define 2 tokenizers in a fieldtype, only the first one is applied, the

Re: SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Norberto Meijome

On Tue, 24 Jun 2008 16:04:24 +0100 Dave Searle [EMAIL PROTECTED] wrote: At the moment I have an index of forum messages (each message being a separate doc). Results are displayed on a per message basis, however, I would like to group the results via their thread. Apart from using a facet on

Re: Accented search

2008-06-24 Thread Robert Haschart

climbingrose wrote: Here is how I did it (the code is from memory so it might not be correct 100%): private boolean hasAccents; private Token filteredToken; public final Token next() throws IOException { if (hasAccents) { hasAccents = false; return filteredToken; } Token t =

Re: SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Norberto Meijome

On Tue, 24 Jun 2008 16:34:44 +0100 Dave Searle [EMAIL PROTECTED] wrote: I am currently storing the thread id within the message index, however, although this would allow me to sort, it doesn't help with the grouping of threads based on relevancy. See the idea is to index message data in the

Re: SOLR-469 - bad patch?

2008-06-24 Thread Shalin Shekhar Mangar

I've just uploaded a new patch which applies cleanly on the trunk. Thanks! On Tue, Jun 24, 2008 at 7:35 PM, Jon Baer [EMAIL PROTECTED] wrote: It seems the new patch @ https://issues.apache.org/jira/browse/SOLR-469 is x2 the size but turns out the patch itself might be bad? Ie, it dumps

Re: Attempting dataimport using FileListEntityProcessor

2008-06-24 Thread mike segv

I do want to import all documents. My understanding of the way things work, correct me if I'm wrong, is that there can be a certain number of documents included in a single atomic update. Instead of having all my 16 Million documents be part of a single update (that could more easily fail being

Re: Attempting dataimport using FileListEntityProcessor

2008-06-24 Thread Shalin Shekhar Mangar

Ok, I got your point. DataImportHandler currently creates documents and adds them one-by-one to Solr. A commit/optimize is called once after all documents are finished. If a document fails to add due to any exception then the import fails. You can still achieve the functionality you want by

Nutch - Solr latest?

2008-06-24 Thread Jon Baer

Hi, I'm curious: is there a spot / patch for the latest on Nutch / Solr integration? I've found a few pages (a few outdated, it seems). It would be nice (?) if it worked as a DataSource type to DataImportHandler, but I'm not sure if that fits w/ how it works. Either way a nice contrib patch

Re: SpellCheckComponent: No file-based suggestions + Location issue

2008-06-24 Thread Ronald K. Braun

Shalin: The index directory location is being created inside the current working directory. We should change that. I've opened SOLR-604 and attached a patch which fixes this. I updated from nightly build to incorporate your fix and it works perfectly, now building the spell indexes in

Re: Wildcard search question

2008-06-24 Thread Jon Drukman

Norberto Meijome wrote: ok well let's say that i can live without john/jon in the short term. what i really need today is a case insensitive wildcard search with literal matching (no fancy stemming. bobby is bobby, not bobbi.) what are my options?

Re: How to use SOLR1.2

2008-06-24 Thread Chris Hostetter

: I am new in SOLR 1.2, configured Admin GUI. Facing problem in using : this. could you pls help me out to configure the nex. the admin GUI isn't really a place where you configure Solr. It's a way to see the status of things -- configuration is done via config files. have you gone through

Re: UnicodeNormalizationFilterFactory

2008-06-24 Thread Chris Hostetter

: I've seen mention of these filters: : : <filter class="schema.UnicodeNormalizationFilterFactory"/> : <filter class="schema.DiacriticsFilterFactory"/> Are you asking because you saw these in Robert Haschart's reply to your previous question? I think those are custom Filters that he has in his

Re: Can I specify the default operator at query time ?

2008-06-24 Thread Chris Hostetter

: Subject: Can I specify the default operator at query time ? : In-Reply-To: [EMAIL PROTECTED] http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh

DataImportHandler running out of memory

2008-06-24 Thread wojtekpia

I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc

Re: DataImportHandler running out of memory

2008-06-24 Thread Grant Ingersoll

This is a bug in MySQL. Try setting the fetch size of the Statement on the connection to Integer.MIN_VALUE. See http://forums.mysql.com/read.php?39,137457 amongst a host of other discussions on the subject. Basically, it tries to load all the rows into memory; the only alternative is to set
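The fetch-size suggestion can be sketched in plain JDBC as follows. This is a minimal illustration, not DataImportHandler's own code: the class name `StreamingStatements` and the method `open` are hypothetical, and the `Integer.MIN_VALUE` hint is a MySQL Connector/J convention. Whether the SQL Server driver from the original question honours it is not established in this thread.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingStatements {
    /**
     * Opens a forward-only, read-only statement and asks the driver to
     * stream rows instead of buffering the whole result set.
     * Integer.MIN_VALUE is the MySQL Connector/J streaming hint; other
     * drivers may ignore or reject a negative fetch size.
     */
    public static Statement open(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }
}
```

A caller would run its query on the returned statement and iterate the `ResultSet` once, closing it before issuing another query on the same connection.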

How to debug ?

2008-06-24 Thread Norberto Meijome

hi, I'm trying to understand why a search on a field tokenized with the nGram tokenizer, with minGramSize=n and maxGramSize=m doesn't find any matches for queries of length (in characters) of n+1..m (n works fine). analysis.jsp shows that it SHOULD match, but /select doesn't bring anything back.
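To make the expectation in this question concrete, here is a small standalone sketch of which grams a text yields for a given minGramSize and maxGramSize. It is not Lucene's nGram tokenizer: the class `NGrams`, the method `grams`, and the emission order (all grams of one size, then the next) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class NGrams {
    /** Returns every substring of text whose length lies in [minGram, maxGram]. */
    public static List<String> grams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        // Emit all grams of the smallest size first, then each larger size.
        for (int size = minGram; size <= maxGram; size++) {
            for (int start = 0; start + size <= text.length(); start++) {
                out.add(text.substring(start, start + size));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // With min=2 and max=3 the indexed grams include lengths 2 and 3,
        // so a 3-character query term such as "sol" is among them.
        System.out.println(grams("solr", 2, 3));
    }
}
```

Comparing a list like this with the tokens actually stored in the index is one way to check whether the mismatch comes from the index side or from how the query string is analyzed.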

Re: How to debug ?

2008-06-24 Thread Ryan McKinley

also, check the LukeRequestHandler if there is a document you think *should* match, you can see what tokens it has actually indexed... On Jun 24, 2008, at 7:12 PM, Norberto Meijome wrote: hi, I'm trying to understand why a search on a field tokenized with the nGram tokenizer, with

Re: DataImportHandler running out of memory

2008-06-24 Thread Shalin Shekhar Mangar

Setting the batchSize to 1 would mean that the Jdbc driver will keep 1 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the Sql Server driver implements this. Also keep in mind that Solr also needs memory to index

Re: How to debug ?

2008-06-24 Thread Norberto Meijome

On Tue, 24 Jun 2008 19:17:58 -0700 Ryan McKinley [EMAIL PROTECTED] wrote: also, check the LukeRequestHandler if there is a document you think *should* match, you can see what tokens it has actually indexed... right, I will look into that a bit more. I am actually using the lukeall.jar

RE: UnicodeNormalizationFilterFactory

2008-06-24 Thread Lance Norskog

ISOLatin1AccentFilterFactory works quite well for us. It solves our basic euro-text keyboard searching problem, where protege should find protégé. (protege with two accents.) -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 24, 2008 4:05 PM To:
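The kind of folding described here, where protege matches protégé, can be approximated outside Solr with the JDK alone. The sketch below is not the ISOLatin1AccentFilter implementation (which uses its own character mapping); the class `AccentFolder` and the method `fold` are hypothetical names, and the approach shown is Unicode decomposition followed by removal of combining marks.

```java
import java.text.Normalizer;

public class AccentFolder {
    /**
     * Decomposes the input to NFD so accented letters become a base letter
     * plus combining marks, then drops the combining marks.
     */
    public static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "prot\u00E9g\u00E9" is "protégé" written with Unicode escapes.
        System.out.println(fold("prot\u00E9g\u00E9"));
    }
}
```

Applying the same folding at index time and at query time is what lets the unaccented query find the accented text. Characters with no canonical decomposition (for example "ø") pass through unchanged under this approach.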

Re: DataImportHandler running out of memory

2008-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्

DIH streams rows one by one. set the fetchSize=-1 this might help. It may make the indexing a bit slower but memory consumption would be low. The memory is consumed by the jdbc driver. try tuning the -Xmx value for the VM --Noble On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar [EMAIL

Re: DataImportHandler running out of memory

2008-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्

it is batchSize=-1 not fetchSize. Or keep it to a very small value. --Noble On Wed, Jun 25, 2008 at 9:31 AM, Noble Paul നോബിള്‍ नोब्ळ् [EMAIL PROTECTED] wrote: DIH streams rows one by one. set the fetchSize=-1 this might help. It may make the indexing a bit slower but memory consumption would


24 matches
