hi all,
( I'm using 1.3 nightly build from 15th June 08.)
Is there some documentation about how analysers + tokenizers are applied to
fields? In particular, my question:
- If I define 2 tokenizers in a fieldtype, only the first one is applied; the
other is ignored. Is that because the 2nd
On Jun 24, 2008, at 12:07 AM, Norberto Meijome wrote:
hi all,
( I'm using 1.3 nightly build from 15th June 08.)
Is there some documentation about how analysers + tokenizers are
applied to fields? In particular, my question:
best docs are here:
On Tue, 24 Jun 2008 00:14:57 -0700
Ryan McKinley [EMAIL PROTECTED] wrote:
best docs are here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
yes, I've been reading that already , thanks :)
- If I define 2 tokenizers in a fieldtype, only the first one is
applied, the
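For context, a Solr analyzer accepts exactly one tokenizer followed by any number of token filters, so a second tokenizer element is silently ignored. A minimal sketch of such a fieldType (the name and filter choices here are just illustrative):

```xml
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <!-- exactly one tokenizer per analyzer -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- further token processing is done with filters, not a second tokenizer -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```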
On Tue, 24 Jun 2008 16:04:24 +0100
Dave Searle [EMAIL PROTECTED] wrote:
At the moment I have an index of forum messages (each message being a
separate doc). Results are displayed on a per message basis, however, I would
like to group the results via their thread. Apart from using a facet on
climbingrose wrote:
Here is how I did it (the code is from memory so it might not be 100%
correct):

    private boolean hasAccents;
    private Token filteredToken;

    public final Token next() throws IOException {
        if (hasAccents) {
            hasAccents = false;
            return filteredToken;
        }
        Token t =
On Tue, 24 Jun 2008 16:34:44 +0100
Dave Searle [EMAIL PROTECTED] wrote:
I am currently storing the thread id within the message index; however,
although this would allow me to sort, it doesn't help with the grouping of
threads based on relevancy. You see, the idea is to index message data in the
I've just uploaded a new patch which applies cleanly on the trunk. Thanks!
On Tue, Jun 24, 2008 at 7:35 PM, Jon Baer [EMAIL PROTECTED] wrote:
It seems the new patch @ https://issues.apache.org/jira/browse/SOLR-469 is
2x the size, but it turns out the patch itself might be bad?
I.e., it dumps
I do want to import all documents. My understanding of the way things work,
correct me if I'm wrong, is that there can be a certain number of documents
included in a single atomic update. Instead of having all my 16 Million
documents be part of a single update (that could more easily fail being
Ok, I got your point.
DataImportHandler currently creates documents and adds them one-by-one to
Solr. A commit/optimize is called once after all documents are finished. If
a document fails to add due to any exception then the import fails.
You can still achieve the functionality you want by
Hi,
I'm curious: is there a spot / patch for the latest on Nutch / Solr
integration? I've found a few pages (a few outdated, it seems); it would
be nice (?) if it worked as a DataSource type to DataImportHandler,
but I'm not sure if that fits w/ how it works. Either way, a nice contrib
patch
Shalin:
The index directory location is being created inside the current working
directory. We should change that. I've opened SOLR-604 and attached a patch
which fixes this.
I updated from nightly build to incorporate your fix and it works
perfectly, now building the spell indexes in
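For anyone hitting the same issue: the spellchecker index location is configured per spellchecker in solrconfig.xml. A sketch (the names and path here are just examples; with the SOLR-604 fix a relative path no longer resolves against the current working directory):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <!-- relative paths resolve against the Solr data directory -->
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
```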
Norberto Meijome wrote:
ok well let's say that i can live without john/jon in the short term.
what i really need today is a case insensitive wildcard search with
literal matching (no fancy stemming. bobby is bobby, not bobbi.)
what are my options?
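One common setup for this (a sketch, field and type names assumed): lowercase at index time with no stemming filter, and lowercase the query string yourself on the client, since wildcard terms are not run through the analyzer at query time.

```xml
<fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- no stemmer: bobby stays bobby, not bobbi -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```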
: I am new in SOLR 1.2, configured Admin GUI. Facing problem in using
: this. could you pls help me out to configure the nex.
the admin GUI isn't really a place where you configure Solr. It's a way
to see the status of things -- configuration is done via config files.
have you gone through
: I've seen mention of these filters:
:
: <filter class="schema.UnicodeNormalizationFilterFactory"/>
: <filter class="schema.DiacriticsFilterFactory"/>
Are you asking because you saw these in Robert Haschart's reply to your
previous question? I think those are custom Filters that he has in his
: Subject: Can I specify the default operator at query time ?
: In-Reply-To: [EMAIL PROTECTED]
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message; instead, start a fresh
I'm trying to load ~10 million records into Solr using the DataImportHandler.
I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as
soon as I try loading more than about 5 million records.
Here's my configuration:
I'm connecting to a SQL Server database using the sqljdbc
This is a bug in MySQL. Try setting the fetch size of the Statement on
the connection to Integer.MIN_VALUE.
See http://forums.mysql.com/read.php?39,137457 amongst a host of other
discussions on the subject. Basically, it tries to load all the rows
into memory; the only alternative is to set
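In raw JDBC terms the workaround looks roughly like this (a sketch; the class and method names are mine). MySQL's Connector/J treats a fetch size of exactly Integer.MIN_VALUE on a forward-only, read-only statement as "stream rows one at a time" instead of buffering the whole result set:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class MySqlStreaming {
    // The magic fetch size Connector/J interprets as "do not buffer
    // the whole result set in memory".
    static int streamingFetchSize() {
        return Integer.MIN_VALUE;
    }

    // Build a statement that streams rows instead of loading them all.
    static Statement streamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(streamingFetchSize());
        return stmt;
    }
}
```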
hi,
I'm trying to understand why a search on a field tokenized with the nGram
tokenizer, with minGramSize=n and maxGramSize=m, doesn't find any matches for
queries of length (in characters) n+1..m (n works fine).
analysis.jsp shows that it SHOULD match, but /select doesn't bring anything
back.
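One thing worth checking (analysis.jsp does not exercise the query parser): gram only at index time, and use a plain tokenizer at query time so whole query terms are matched against the indexed grams. A sketch, with the sizes assumed:

```xml
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="5"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- do not re-gram the query -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```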
also, check the LukeRequestHandler
if there is a document you think *should* match, you can see what
tokens it has actually indexed...
On Jun 24, 2008, at 7:12 PM, Norberto Meijome wrote:
hi,
I'm trying to understand why a search on a field tokenized with the
nGram
tokenizer, with
Setting the batchSize to 1 would mean that the JDBC driver will keep
1 row in memory *for each entity* which uses that data source (if
implemented correctly by the driver). I'm not sure how well the SQL Server
driver implements this. Also keep in mind that Solr itself needs memory to
index
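batchSize is set on the dataSource element in the DIH configuration; a sketch for SQL Server (the driver class, URL, and credentials here are illustrative):

```xml
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://localhost;databaseName=forum"
            user="solr" password="***"
            batchSize="500"/>
```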
On Tue, 24 Jun 2008 19:17:58 -0700
Ryan McKinley [EMAIL PROTECTED] wrote:
also, check the LukeRequestHandler
if there is a document you think *should* match, you can see what
tokens it has actually indexed...
right, I will look into that a bit more.
I am actually using the lukeall.jar
ISOLatin1AccentFilterFactory works quite well for us. It solves our basic
euro-text keyboard searching problem, where protege should find protégé.
(protege with two accents.)
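If you ever need the same behaviour outside an analyzer chain, the idea behind ISOLatin1AccentFilter can be sketched in plain Java with java.text.Normalizer (Java 6+); the class name here is mine:

```java
import java.text.Normalizer;

public class AccentStripper {
    // NFD decomposes "é" into "e" plus a combining acute accent;
    // stripping the combining marks (\p{M}) leaves plain ASCII letters.
    static String strip(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("protégé")); // prints "protege"
    }
}
```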
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 24, 2008 4:05 PM
To:
DIH streams rows one by one.
Set fetchSize=-1; this might help. It may make the indexing a bit
slower, but memory consumption would be low.
The memory is consumed by the JDBC driver. Try tuning the -Xmx value for the VM.
--Noble
On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar
[EMAIL
It is batchSize=-1, not fetchSize. Or keep it at a very small value.
--Noble
On Wed, Jun 25, 2008 at 9:31 AM, Noble Paul നോബിള് नोब्ळ्
[EMAIL PROTECTED] wrote:
DIH streams rows one by one.
Set fetchSize=-1; this might help. It may make the indexing a bit
slower but memory consumption would
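In data-config.xml that looks like this (a sketch; as I understand it, DIH passes -1 through to the driver as Integer.MIN_VALUE, which enables MySQL's row-by-row streaming):

```xml
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/db"
            batchSize="-1"/>
```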