It will most likely be smaller, but the new size is highly dependent on
the number of documents that you have deleted (because optimize actually
removes data instead of only flagging it as deleted).
-Original Message-
From: Karthik K [mailto:karthikkato...@gmail.com]
Sent: Tuesday, July 13, 2010
Hi Lukas,
Have you tried setting the debug mode (debugQuery=on)?
It provides very detailed info about the scoring, it might even be too
much for a regular user but for us it was very helpful at times.
Regards,
Tom
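The debugQuery suggestion above can be tried straight from a URL; a minimal sketch (the host, core, and example field name are placeholders, not from the original mail):

```python
from urllib.parse import urlencode

# Any select query plus debugQuery=on returns a "debug" section with the
# parsed query and a per-document score explanation.
base = "http://localhost:8983/solr/select"  # placeholder host/core
params = {"q": "title:solr", "debugQuery": "on"}
url = base + "?" + urlencode(params)
print(url)
```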
-Original Message-
From: Lukas Kahwe Smith [mailto:m...@pooteeweet.org
QParser.
Time to start reading and experimenting I guess.
Regards,
Tom
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Monday, June 14, 2010 22:29
To: solr-user@lucene.apache.org
Subject: Re: custom scorer in Solr
: Problem is that they want scores that make
and suggestions on this because this is the issue that will
make or break the acceptance of Solr for this client.
Thanks,
Tom
-Original Message-
From: Fornoville, Tom
Sent: Wednesday, June 9, 2010 15:35
To: solr-user@lucene.apache.org
Subject: custom scorer in Solr
Hi all,
We
have potentially multiple scores, for instance: 4
and 2 if your doc matches exact and partial on category.
- use the max function query to only return the highest score -- 4 in this case.
You have to find out for yourself if this performs though.
Hope that helps,
Geert-Jan
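A rough sketch of the max function-query idea above, assuming the `{!func}` local-params syntax; the parameter names q1/q2 and the category values are invented for illustration:

```python
from urllib.parse import urlencode

# Take the best of several sub-scores with max(): here an exact category
# match (boost 4) competes with a partial match (boost 2), and only the
# higher of the two contributes to the document's score.
params = {
    "q": "{!func}max(query($q1),query($q2))",
    "q1": "category:books^4",  # exact match, higher boost (invented)
    "q2": "category:book*^2",  # partial match, lower boost (invented)
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```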
2010/6/14 Fornoville, Tom
get 4.
I assume for this to work we would need to develop a custom Scorer class
but we have no clue on how to incorporate this in Solr.
Maybe there is even a simpler solution that we don't know about.
All suggestions welcome!
Thanks,
Tom
/contrib.. Have
other Solr users used Lucene Benchmark? Can anyone provide any hints for
adapting it to Solr? (Are there any common gotchas etc?).
Tom
Tom Burton-West
University of Michigan Libraries
http://www.hathitrust.org/blogs/large-scale-search
for facets and paging.
My apologies for the long mail but I would greatly appreciate feedback
and/or suggestions.
I'm aware that this is a very particular problem, but anything that
points me in the right direction is helpful.
Cheers,
Tom
Thanks Koji,
That was the information I was looking for. I'll be sure to post the test
results to the list. It may be a few weeks before we can schedule the tests
for our test server.
Tom
I've never tried it but NoMergePolicy and NoMergeScheduler
can be specified in solrconfig.xml
and writes.
Tom Burton-West
of management as we scale out. One of the reasons was to reduce contention
between indexing/optimizing and search instances for disk I/O. See
http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-50-volumes-5-million-volumes-and-beyond
for details.
Tom
-Original
specific pieces?
All feedback and suggestions are welcome.
Thanks in advance,
Tom
with debugQuery=on for the user's failing query?
Assuming I did this right, the next thing I'd look for is a copyField. Is the user's
query really being executed against this field?
Schema.xml could be useful, too.
Tom
On Tue, Apr 20, 2010 at 10:19 AM, Charlie Jackson
charlie.jack...@cision.com wrote:
I've got
(http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations) mentions a
limit of about 2 billion document ids. I assume this is the lucene internal
document id and would therefore be a per index/per shard limit. Is this
correct?
Tom Burton-West.
cases, “freeze the world” pauses of a
minute or more. As a practical matter, this can become a serious problem for
heap sizes that exceed about two gigabytes, even if far more physical memory is
available.”
http://www.lucidimagination.com/search/document/CDRG_ch08_8.4.1?q=memory%20caching
Tom
. You
also might want to take a look at the free memory when you start up Solr and
then watch as it fills up as you get more queries (or send cache-warming
queries).
Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
KaktuChakarabati wrote:
My question was mainly about
You need a query string with the standard request handler. (dismax has
q.alt)
Try q=*:*, if you are trying to get facets for all documents.
And yes, a friendlier error message would be a good thing.
Tom
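A minimal sketch of the match-all facet request described above (the field name "category" is an invented example):

```python
from urllib.parse import urlencode

# *:* matches every document, so the facet counts cover the whole index;
# rows=0 skips the document list since only the counts are wanted.
params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "category",  # placeholder field name
    "rows": "0",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```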
On Mon, Mar 15, 2010 at 9:03 AM, David Rühr d...@marketing-factory.de wrote:
Hi List
have enough memory, that's just going to move the problem.
You can watch memory usage with jconsole, or get more detail with something
like yourkit.
Tom
On Fri, Mar 12, 2010 at 10:17 AM, Oleg Burlaca o...@burlaca.com wrote:
Hello,
I've searched the list for this kind of error but never found one
and then I'll post a question to either the Solr or Lucene list. Can
you suggest which list I should post an index pruning question to?
Tom
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Tuesday, March 09, 2010 2:36 PM
To: solr-user@lucene.apache.org
Subject: Re
would have to
watch out for any Russian-Chinese dictionaries:)
Tom
There wasn't any completely satisfactory solution; there were a large
number
of two and three letter n-grams so we were able to use a dictionary
approach
to eliminate those (names tend to be longer). We also looked
Interesting. I wonder though if we have 4 million English documents and 250
in Urdu, if the Urdu words would score badly when compared to ngram
statistics for the entire corpus.
hossman wrote:
Since you are dealing with multiple languages, and multiple variant usages
of languages
mentioned above.
I'm not sure I understand your suggestion. Since real-word hapax legomena
are generally pretty common (maybe 40-60% of unique words), wouldn't using
them as the "no" set give the classifier mixed signals?
Tom
Walter Underwood-2 wrote:
Hmm, how about a classifier? Common
and the inclusion of proper names, place names, and technical terms.
We are considering using some heuristics, such as looking for strings over a
certain length or strings containing more than some number of punctuation
characters.
This paper has a few such heuristics:
Kazem Taghva, Tom Nartker
Hi Mani,
Mani EZZAT wrote:
I'm dynamically creating cores with a new index, using the same schema
and solrconfig.xml
Does the problem occur if you use the same configuration in a single, static
core?
Tom
--
View this message in context:
http://old.nabble.com/Re%3A-Warning-%3A-no-lockType
of the ramBuffer and how
much heap you need to give the JVM, or is there some reasonable method of
finding this out by experimentation?
We would rather not find out by decreasing the amount of memory allocated to
the JVM until we get an OOM.
Tom
I've run Lucene with heap sizes as large as 28GB
Thanks Otis,
I don't know enough about Hadoop to understand the advantage of using Hadoop
in this use case. How would using Hadoop differ from distributing the
indexing over 10 shards on 10 machines with Solr?
Tom
Otis Gospodnetic wrote:
Hi Tom,
32MB is very low, 320MB is medium
to 3200MB? What are people's experiences with very large
ramBufferSizeMB sizes?
Tom Burton-West
University of Michigan Library
www.hathitrust.org
/large-scale-search )
Tom
Hi Tom,
1600 warming queries, that's quite a lot. Do you run them every time a
document is added to the index? Do you have any tips on warming?
If the index size is more than you can have in RAM, do you recommend
to split the index to several servers so it can all
overview of the issues is the paper
by Baeza-Yates ( http://doi.acm.org/10.1145/1277741.125 The Impact of
Caching on Search Engines )
Tom Burton-West
Digital Library Production Service
University of Michigan Library
--
View this message in context:
http://old.nabble.com/persistent-cache
in the index by 2 or 3 times.
Tom
---
Solr Specification Version: 1.3.0.2009.09.03.11.14.39
Solr Implementation Version: 1.4-dev 793569 - root - 2009-09-03 11:14:39
Lucene Specification Version: 2.9-dev
Thanks Michael,
I'm not sure I understand. CheckIndex reported a negative number:
-16777214.
But in any case we can certainly try running CheckIndex from a patched
Lucene. We could also run a patched Lucene on our dev server.
Tom
Yes, the term count reported by CheckIndex is the total
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:246)
Any suggestions for troubleshooting would be appreciated.
Trace from tomcat logs appended below.
Tom Burton-West
---
Feb 5, 2010 8
+1
And thanks to you both for all your work on CommonGrams!
Tom Burton-West
Jason Rutherglen-2 wrote:
Robert, thanks for redoing all the Solr analyzers to the new API! It
helps to have many examples to work from, best practices so to speak.
--
View this message in context:
http
amount of memory.
Tom
On Wed, Jan 27, 2010 at 11:38 AM, Amit Nithian anith...@gmail.com wrote:
It sounds to me that multiple cores won't scale.. wouldn't you have to
create multiple configurations per each core and does the ranking function
change per user?
I would imagine that the filter method
I recommend getting familiar with the analysis tool included with solr. From
Solr's main admin screen, click on analysis, Check verbose, and enter your
text, and you can see the changes that happen during analysis.
It's really helpful, especially when getting started.
Tom
On Wed, Jan 27, 2010
://www.hathitrust.org/large_scale_search and our blog:
http://www.hathitrust.org/blogs/large-scale-search
http://www.hathitrust.org/blogs/large-scale-search (I'll be updating the
blog with details of current hardware and performance tests in the next week
or so)
Tom
Tom Burton-West
Digital Library
index, so optimizing the slave should be a no-op. Check to see
what files you have on the slave after snappulling.
Tom
Did you rebuild the index? Changing the analyzer for the index doesn't
affect already indexed documents.
Tom
On Tue, Dec 8, 2009 at 11:57 AM, insaneyogi3008 insaney...@gmail.comwrote:
Hello,
I tried to force case insensitive search by having the following setting in
my schema.xml file
Hi -
That's a common one to get bit by. The string
On Mon, Dec 7, 2009 at 7:44 PM, regany re...@newzealand.co.nz wrote:
hi all - newbie solr question - I've indexed some documents and can search
/
receive results using the following schema - BUT ONLY when searching on the
id field. If I
not the whole field.
The ID field probably works because it has one term in it. 1 finds 1
just fine.
Try solr.TextField instead.
Tom
On Mon, Dec 7, 2009 at 7:47 PM, Tom Hill solr-l...@worldware.com wrote:
Hi -
That's a common one to get bit by. The string
On Mon, Dec 7, 2009 at 7:44 PM, regany re
http://wiki.apache.org/solr/UpdateXmlMessages#A.22delete.22_by_ID_and_by_Query
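Per that wiki page, a delete is a small XML message POSTed to /update; a hedged sketch (the host, document id, and commit flag are placeholders):

```python
from urllib import request

# Delete by id; <delete><query>...</query></delete> works the same way.
body = b"<delete><id>12345</id></delete>"  # placeholder id
req = request.Request(
    "http://localhost:8983/solr/update?commit=true",  # placeholder host
    data=body,
    headers={"Content-Type": "text/xml"},
)
# request.urlopen(req)  # uncomment against a running Solr instance
```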
On Thu, Dec 3, 2009 at 11:57 AM, Joel Nylund jnyl...@yahoo.com wrote:
Is there a url based approach to delete a document?
thanks
Joel
to the Analyzer, so if a person searches for the words **sea biscit** the
analyzer will be given the words sea and biscit separately, and will not
know that they match a synonym.*
...
Tom
On Tue, Nov 24, 2009 at 10:47 AM, brad anderson solrinter...@gmail.comwrote:
Hi Folks,
I
In this introductory technical presentation, renowned search expert Mark
Bennett, CTO of Search Consultancy New Idea Engineering,
will present practical tips and examples to help you quickly get productive
with Solr, including:
* Working with the web command line and controlling your inputs and
for
problems. How about starting with 1000 docs, and five minutes for maxTime
(5*60*1000), or about 3 lakh (300,000) milliseconds.
That should help performance a lot. Try that, and see how it works.
Tom
On Mon, Nov 16, 2009 at 2:43 PM, Shashi Kant sk...@sloan.mit.edu wrote:
I think it would be useful for members
acquisition and more) . Download (reg required) at
http://www.lucidimagination.com/whitepaper/whats-new-in-solr-1-4?sc=AP
Tom
www.lucidimagination.com
query from the Admin tool interface in Solr and then
in Lucene to see if the query is being parsed or otherwise interpreted
differently.
Tom
DHast wrote:
Hello,
I have recently installed Solr as an alternative to our home made lucene
search servers, and while in most respects the performance
/SOLR-1145,
but did not see a unit test that I might try to run in our system.
Do others have this logging working successfully ?
Is there something else that needs to be set up?
Tom
Hello all,
When I start up Solr from the example directory using start.jar, it seems to
start up, but when I go to the localhost admin url
(http://localhost:8983/solr/admin) I get a 404 (See message appended below).
Has the url for the Solr admin changed?
Tom
Tom Burton-West
index size. We plan to distribute the index across 5
machines.
More information on our setup and results is available
at:http://www.hathitrust.org/blogs/large-scale-search
Tom
The expected processed log file size per day: 100 GB
We are expecting to retain these indexes for 30 days
(100*30 ~ 3 TB
://www.lucidimagination.com/search/document/bc643c39f0b6e423/queryparser_and_ngrams#629b39ea39aa9cd4
Thoughts? Thanks...Tom
listing aggregator. But I can see that they must have
solved some of the problems I am thinking of such as scheduled indexing
of remote resources, and writing a parser to get data fields from some
other sites templates.
Any advice would be welcome...
Many Thanks,
Tom
catenateWords=0
catenateNumbers=0
catenateAll=0
/
filter class=solr.LowerCaseFilterFactory/
/analyzer
/fieldtype
Tom
IndexWriter.setTermIndexInterval?
Tom
Tom Burton-West
Digital Library Production Services
University of Michigan Library
?
Tom
i haven't used Nutch's implementation, but used the current implementation
(1.3) of ngrams and shingles to address exactly the same issue ( database of
music albums and tracks).
We didn't notice any severe performance hit but :
- data set isn't huge ( ca 1 MM docs).
- reindexed nightly via
Thanks Yonik,
-The next nightly build (Dec-01-2008) should have the changes.
The latest nightly build seems to be 30-Nov-2008 08:20,
http://people.apache.org/builds/lucene/solr/nightly/
has the version with the NIO fix been built? Are we looking in the
wrong place?
Tom
Tom Burton-West
distributor/response
aggregator?
Tom
Tom Burton-West
Information Retrieval Programmer
Digital Library Production Services
University of Michigan
terms are
still indexed too, with n-grams overlaid.
http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/C
ommonGrams.html
Tom
Tom Burton-West
Information Retrieval Programmer
Digital Library Production Services
University of Michigan Library
the documentation for the .prx file states that Positions entries are
ordered by increasing document number (the document number is implicit
from the .frq file)
Tom
help. What confuses me is why
multiple searchers are locking the prx index file. I would think that
searching is a read-only operation.
Perhaps we need to change something to tell Solr we aren't updating the
index?
Tom
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL
queries
containing common terms since our index is very large and we suspect one
or more very large segments of the position index need to be read into
memory.
Can someone point us to either the possible cause of this problem or
what we might change to reduce/eliminate it?
Tom
Tom Burton-West
. This is pretty ugly and could lead to some race conditions but will
probably work.
Any better recommendations on how to achieve this functionality?
Thanks...Tom
behaviour, what is the approved way of avoiding this
issue? Should we index on a per-page basis rather than trying to do 130
pages as a single document?
thanks in advance,
Tom.
--
Tom Lord | ([EMAIL PROTECTED])
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre
enough go home and start
playing with Solr.
Tom
--
View this message in context:
http://www.nabble.com/Talk-on-Solr---Oakland%2C-CA-June-18%2C-2008-tp17880636p17880636.html
Sent from the Solr - User mailing list archive at Nabble.com.
: 1.2.2008.06.02.15.21.48
Solr Implementation Version: 1.3-dev 662524M - tsmorton - 2008-06-02
15:21:48
Is this feature now broken or does it look like my config is wrong?
Thanks...Tom
Hi,
Thanks Yonik. That fixed that. It would be useful to change one of the
existing dismax query types in the default solrconfig.xml to use this new
syntax (especially since DisMaxRequestHandler is being deprecated).
Thanks again...Tom
On Wed, Jun 4, 2008 at 11:19 AM, Yonik Seeley [EMAIL
for the seeding query. Has this feature gotten any
attention?
Thanks...Tom
I certainly have seen memory problems when I just drop a new war file in
place. So now I usually stop tomcat and restart.
I used to see problems (pre-1.0) when I just redeployed repeatedly, without
even accessing the app, but I've got a little script running in the
background that has done that
Hi -
We're definitely not seeing that. What do your logs show? What do your
schema/solrconfig look like?
Tom
On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:
Hi All.
I'm seeing all these threads about availability and I'm
wondering why my situation is so different than others'.
We're
Nice!
And there seem to be some improvements. For example, Gamers and Gamera
no longer stem to the same word :-)
Tom
On 10/2/07, Walter Underwood [EMAIL PROTECTED] wrote:
Here at Netflix, we switched over our site search to Solr two weeks ago.
We've seen zero problems with the server. We
this
would need to do?
Is anyone else working on this?
Tom
On 9/18/07, Jon Pierce [EMAIL PROTECTED] wrote:
I see Yonik recently opened an issue in JIRA to track the addition of
pluggable functions (https://issues.apache.org/jira/browse/SOLR-356).
Any chance this will be implemented soon? It would
Hi Marc,
Are you using the same stemmer on your queries that you use when indexing?
Try the analysis function in the admin UI, to see how things are stemmed for
indexing vs. querying. If they don't match for really and fünny, and do
match for kraßen, then that's your problem.
Tom
On 9/14/07
. :-)
Tom
fieldType name=text class=solr.TextField
positionIncrementGap=100
analyzer type=index
tokenizer class=solr.WhitespaceTokenizerFactory/
!-- in this example, we will only use synonyms at query time
filter class=solr.SynonymFilterFactory
synonyms
? Or is
this something I have to actively configure?
And is approach 2 better than 1? Or vice versa? Or is the answer it
depends? :-)
If, as I suspect, the answer was it depends, are there any general
guidelines on when to use or approach or the other?
Thanks,
Tom
On 9/6/07, Mike Klaas [EMAIL
=2 disableUploadTimeout=true
URIEncoding=UTF-8 /
use the analysis page of the admin interface to check to see what's
happening to your queries, too.
http://localhost:8080/solr/admin/analysis.jsp?highlight=on (your port # may
vary)
Tom
On 9/13/07, Marc Bechler [EMAIL PROTECTED] wrote:
Hi
I don't use the java client, but when I switched to 1.2, I'd get that
message when I forget to add the content type header, as described in
CHANGES.txt
9. The example solrconfig.xml maps /update to XmlUpdateRequestHandler using
the new request dispatcher (SOLR-104). This requires posted
multiple values, I add them as separate occurrences of the field I
am faceting on.
If you still need them all in one field for other reasons, use copyField to
assemble them.
Tom
On 8/30/07, Giri [EMAIL PROTECTED] wrote:
Hi,
I am trying to get the facet values from a field that contains multiple
). And then to query only that index space, add,
for example
fq=space:product
to your query URL. Assuming you named the field 'space', and wanted the
'product' space.
There's a related example in the example solrconfig, look at
requestHandler name=partitioned class=solr.DisMaxRequestHandler
Tom
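The fq suggestion above as a sketch (the field name and value follow the example in the mail; the host is a placeholder):

```python
from urllib.parse import urlencode

# fq filters the result set without affecting scoring, so the query is
# confined to the 'product' space while q is scored normally.
params = {"q": "ipod", "fq": "space:product"}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```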
Go here
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28ste
mming%29#head-88cc86e4432b359030cffdb32d095062b843d4f5
Look for this
solr.PorterStemFilterFactory
On 8/13/07 1:50 PM, David Whalen [EMAIL PROTECTED] wrote:
Thanks, guys. I'm sure that by the time I get
?
Thanks,
Tom
Diffing vs. my local copy of 1.2, but it appears to be the same in HEAD.
--- src/java/org/apache/solr/analysis/SynonymFilterFactory.java
+++ src/java/org/apache/solr/analysis/SynonymFilterFactory.java (working
copy)
@@ -37,6 +37,7 @@
ignoreCase = getBoolean(ignoreCase,false
, or is that
just how it is?
Tom
Hi -
Perhaps I'm missing something obvious, but it there a way to get values from
the user's request as arguments to boost functions in dismax?
I'm thinking about distance based weighting for search results, which
requires the user's x,y.
Tom
replicate (at least not on the master).
I was wondering if it make sense to replicate to a slave instance, optimize
it there, and then distribute the optimized index from the first level
slave?
Any thoughts?
Thanks,
Tom
Hi -
What happens if updates occur during the optimize?
Thanks,
Tom
unfounded?
I would expect this to be true.
Also, each of the
snapshots taken by snapshooter are viable full indexes, correct? If so,
that means I'd have a backup of the index each and every time a commit
(or optimize for that matter) is done, which would be awesome.
That's my understanding.
Tom
SolrIndexSearcher, and replacing
FieldSortedHitQueue, and ScorePriorityQueue with versions that remove dups
based in a particular field.
Tom
On 5/2/07, Matthew Runo [EMAIL PROTECTED] wrote:
Hello!
I was wondering - is it possible to search and group the results by a
given field?
For example, I have
a
similar amount of work. I don't know how well *:* performs, but if it is
similar to state:[* TO *], I would benchmark it before using.
For us, facet queries are a high percentage, so the time was critical. It
might even be worth adding a field, if you don't already have an appropriate
one.
Tom
Hi Yonik,
Thanks for the quick response.
At 07:45 AM 12/28/2006, you wrote:
On 12/27/06, Tom [EMAIL PROTECTED] wrote:
I'm having a problem getting boosts to work the way I think they are
supposed to.
Do you have a specific relevance problem you are trying to solve, or
just testing things
At 12:03 PM 12/28/2006, you wrote:
On 12/28/06, Tom [EMAIL PROTECTED] wrote:
Could you index your documents in the desired order? This is the
default sort order.
I don't think I can control document order, as documents may get
edited after creation.
If not, you can add a field
JConsole also welcome).
Thanks,
Tom
,
Tom
it be faster to do that on a field with
fewer values (but the same number of documents)
e.g. type:[* TO *] where the type field has a small number of values.
Or does that not matter?
Thanks,
Tom
At 02:49 PM 11/21/2006, you wrote:
: I mean is there something I can put in a solr URL
using the results generated by the value of trim
on the initial query.
Any suggestions for where to go poking around to fix this vs. caching?
Thanks,
Tom
At 11:10 AM 11/8/2006, you wrote:
On 11/8/06, Tom [EMAIL PROTECTED] wrote:
On 10/30/06, Yonik Seeley [EMAIL PROTECTED] wrote:
Yes
At 01:35 PM 11/15/2006, you wrote:
On 11/15/06, Tom [EMAIL PROTECTED] wrote:
It looks like that for trimming, the places I want to modify are in
ScorePriorityQueue and FieldSortedHitQueue. When trimming, I want to
return the top item in the group that matches, whether by score or
sort, not just
to configure/code this in Solr.
Thanks,
Tom
On 10/30/06, Yonik Seeley [EMAIL PROTECTED] wrote:
Hi Tom, I moderated your email in... you need to subscribe to prevent
your emails being blocked in the future.
Thanks. That's fixed, I hope. I was using the wrong address.
http://incubator.apache.org/solr
.
which would seem to be necessary to get the group id.
Does Solr add anything to Lucene in this regard?
Thanks,
Tom
advice how to
set this up, and how to be sure that both indexes do not interact ?
Many thanks for any help,
Best Greetings,
Tom
in it as another one with less factor
shall be before the others.
Something I do might be completely wrong, perhaps You have an idea ?
Thanks,
Tom