Re: Strategy for handling large (and growing) index: horizontal partitioning?

2008-03-03 Thread Kevin Lewandowski
How many documents are in the index?

If you haven't already done this I'd take a really close look at your
schema and make sure you're only storing the things that should really
be stored, same with the indexed fields. I drastically reduced my
index size just by changing some indexed/stored options on a few
fields.
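
For illustration, here's the kind of change that made the difference
for me (these field names are made up, not my actual schema):

  <!-- before: indexed and stored whether we needed it or not -->
  <field name="body"     type="text"   indexed="true" stored="true"/>
  <field name="raw_html" type="string" indexed="true" stored="true"/>

  <!-- after: index only what is searched, store only what is displayed -->
  <field name="body"     type="text"   indexed="true" stored="false"/>
  <!-- raw_html dropped entirely: it was neither searched nor displayed -->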

On Thu, Feb 28, 2008 at 10:54 PM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
 James,

  I can't comment more on the SN's arch choices.

  Here are answers to your questions:
  - 1 Solr instance can hold 1+ indices, either via JNDI (see Wiki) or via the 
 new multi-core support which works, but is still being hacked on
  - See SOLR-303 in JIRA for distributed search.  Yonik committed it just the 
 other day, so now that's in nightly builds if you want to try it.  There are 
 2 Wiki pages about that, too, see Recent changes log on the Wiki to quickly 
 find them.
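
  For example, a distributed query with the new shards support looks
  roughly like this (hosts are placeholders; check the Wiki pages for
  the exact syntax):

   http://localhost:8983/solr/select?shards=host1:8983/solr,host2:8983/solr&q=ipod

  Each shard runs the query locally and the Solr you queried merges the
  results.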


  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

  - Original Message 
   From: James Brady [EMAIL PROTECTED]
   To: solr-user@lucene.apache.org


  Sent: Friday, February 29, 2008 1:11:07 AM
   Subject: Re: Strategy for handling large (and growing) index: horizontal 
 partitioning?
  
   Hi Otis,
   Thanks for your comments -- I didn't realise the wiki is open to
   editing; my apologies. I've put in a few words to try and clear
   things up a bit.
  
   So determining n will probably be a best guess followed by trial and
   error; that's fine. I'm still not clear, however, on whether a single
   Solr server can operate across several indices. Can anyone give me
   some pointers here?
   An alternative would be to have 1 index per instance, and n instances
   per server, where n is small. This might actually be a practical
   solution -- I'm spending ~20% of my time committing, so I should
   probably only have 3 or 4 indices in total per server to avoid two
   committing at the same time.
  
   Your mention of The Large Social Network was interesting! A social
   network's data is by definition poorly partitioned by user id, so
   unless they've done something extremely clever, like co-locating
   social cliques in the same indices, I would have thought it would be
   a sub-optimal architecture. If my friends and I are scattered across
   different indices, each search would have to be federated massively.
  
   James
  
  
   On 28 Feb 2008, at 20:49, Otis Gospodnetic wrote:
  
James,
   
Regarding your questions about n users per index - this is a fine
approach.  The largest Social Network that you know of uses the
same approach for various things, including full-text indices (not
Solr, but close).  You'd have to maintain a user-to-shard/index mapping
somewhere, of course.  What should n be, you ask?  Look at the
overall index size, I'd say, against server capabilities (RAM,
disk, CPU), and increase n up to the point where you're maximizing
your hardware at some target query rate.
   
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
   
- Original Message 
From: James Brady


   To: solr-user@lucene.apache.org
Sent: Wednesday, February 27, 2008 10:08:02 PM
Subject: Strategy for handling large (and growing) index:
horizontal partitioning?
   
Hi all,
Our current setup is a master and slave pair on a single machine,
with an index size of ~50GB.
   
Query and update times are still respectable, but commits are taking
~20% of the time on the master, while our daily index optimise can take
up to 4 hours...
Here's the most relevant part of solrconfig.xml:
 true
 10
 1000
 1
 1
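
(For reference, those settings live in the mainIndex section of
solrconfig.xml; a stock config of this era looks roughly like the
following - illustrative defaults, not necessarily our exact values:)

 <useCompoundFile>false</useCompoundFile>
 <mergeFactor>10</mergeFactor>
 <maxBufferedDocs>1000</maxBufferedDocs>
 <maxMergeDocs>2147483647</maxMergeDocs>
 <maxFieldLength>10000</maxFieldLength>

A lower mergeFactor means fewer segments on disk, so less to rewrite at
optimise time, at the cost of more merge work during indexing.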
   
I've given both master and slave 2.5GB of RAM.
   
Does an index optimise read and re-write the whole thing? If so,
taking about 4 hours is pretty good! However, the documentation at
http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten+minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b
states that "Optimizations can take nearly ten minutes to run", which
leads me to think that we've grossly misconfigured something...
   
Firstly, we would obviously love any way to reduce this optimise time
- I have yet to experiment extensively with the settings above, and
optimise frequency, but some general guidance would be great.
   
Secondly, this index size is increasing monotonically over time as
we acquire new users. We need to take action to ensure we can scale
in the future. The approach we're favouring at the moment is
horizontal partitioning of indices by user id as our data suits this
scheme well. A given index would hold the indexed data for n users,
where n would probably be between 1 and 100 users, and we will have
multiple indices per search server.
   
Running one server per index is impractical, especially for a small n,
so is a single Solr instance capable of managing 

solr not finding all results

2007-10-12 Thread Kevin Lewandowski
I've found an odd situation where solr is not returning all of the
documents that I think it should. A search for Geckoplp4-M returns 3
documents but I know that there are at least 100 documents with that
string.

Here is an example query for that phrase and the result set:
http://localhost:9020/solr/select/?q=Geckoplp4-M&version=2.2&start=0&rows=10&indent=on&fl=comments,id
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">0</int>
 <lst name="params">
  <str name="rows">10</str>
  <str name="start">0</str>
  <str name="indent">on</str>
  <str name="fl">comments,id</str>
  <str name="q">Geckoplp4-M</str>
  <str name="version">2.2</str>
 </lst>
</lst>
<result name="response" numFound="3" start="0">
 <doc>
  <str name="comments">Geckoplp4-M</str>
  <str name="id">m2816500</str>
 </doc>
 <doc>
  <str name="comments">toptrax recordings. Same tracks.
Geckoplp4-M</str>
  <str name="id">m2816544</str>
 </doc>
 <doc>
  <str name="comments">Geckoplp4-M</str>
  <str name="id">m2815903</str>
 </doc>
</result>
</response>

Now here's an example of a search for two documents that I know have
that string, but were not returned in the previous search:
http://localhost:9020/solr/select/?q=id%3Am2816615+OR+id%3Am2816611&version=2.2&start=0&rows=10&indent=on&fl=id,comments
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">1</int>
 <lst name="params">
  <str name="rows">10</str>
  <str name="start">0</str>
  <str name="indent">on</str>
  <str name="fl">id,comments</str>
  <str name="q">id:m2816615 OR id:m2816611</str>
  <str name="version">2.2</str>
 </lst>
</lst>
<result name="response" numFound="2" start="0">
 <doc>
  <str name="comments">Geckoplp4-M</str>
  <str name="id">m2816611</str>
 </doc>
 <doc>
  <str name="comments">Geckoplp4-M</str>
  <str name="id">m2816615</str>
 </doc>
</result>
</response>

Here is the definition for the comments field:
<field name="comments" type="text" indexed="true" stored="true"/>

And here is the definition for a text field:
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- in this example, we will only use synonyms at query time
  <filter class="solr.SynonymFilterFactory"
    synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
  -->
  <!--<filter class="solr.StopFilterFactory" ignoreCase="true"/>-->
  <filter class="solr.WordDelimiterFilterFactory"
    generateWordParts="1" generateNumberParts="1" catenateWords="1"
    catenateNumbers="1" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!--<filter class="solr.EnglishPorterFilterFactory"
    protected="protwords.txt"/>-->
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
  <analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory"
    synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <!--<filter class="solr.StopFilterFactory" ignoreCase="true"/>-->
  <filter class="solr.WordDelimiterFilterFactory"
    generateWordParts="1" generateNumberParts="1" catenateWords="0"
    catenateNumbers="0" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!--<filter class="solr.EnglishPorterFilterFactory"
    protected="protwords.txt"/>-->
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldtype>

Any ideas? Am I doing something wrong?

thanks,
Kevin


Re: solr not finding all results

2007-10-12 Thread Kevin Lewandowski
Sorry, I've figured out my own problem. There was a problem with the
way I created the XML documents for indexing that caused some of the
comments fields not to be listed correctly in the default search
field, content.
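
For anyone hitting the same thing: the update XML has to be well-formed,
with reserved characters escaped. A minimal sketch of what Solr expects
(the ids are from this thread; the escaping problem is my own diagnosis
of my indexing code):

  <add>
    <doc>
      <field name="id">m2816615</field>
      <field name="comments">Geckoplp4-M &amp; other notes</field>
    </doc>
  </add>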

On 10/12/07, Kevin Lewandowski [EMAIL PROTECTED] wrote:
 I've found an odd situation where solr is not returning all of the
 documents that I think it should. A search for Geckoplp4-M returns 3
 documents but I know that there are at least 100 documents with that
 string.
 [...]



Re: index size

2007-10-11 Thread Kevin Lewandowski
> To achieve this I have to keep the document field stored, right?

Yes, the field needs to be stored to return snippets.


> When I do this my index becomes huge (a 10 GB index), because I have
> 10K docs but each is very lengthy HTML.  Is there any better solution?
> Why is the index created by nutch so small in comparison (about 27 MB),
> but it still returns snippets?

Are you storing the complete html? If so I think you should strip out
the html then index the document.
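
One way to do the stripping inside Solr (a sketch; check that your
build has this tokenizer) is an HTML-stripping field type in
schema.xml:

  <fieldtype name="html_text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>

Note this only affects the indexed tokens; the stored value is whatever
you post, so to shrink the stored field you still need to strip the
html client-side before sending the document.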





 On 10/9/07, Kevin Lewandowski [EMAIL PROTECTED] wrote:
  Late reply on this but I just wanted to say thanks for the
  suggestions. [...]



Re: index size

2007-10-09 Thread Kevin Lewandowski
Late reply on this but I just wanted to say thanks for the
suggestions. I went through my whole schema and was storing things
that didn't need to be stored and indexing a lot of things that didn't
need to be indexed. Just completed a full reindex and it's a much more
reasonable size now.

Kevin

On 8/20/07, Mike Klaas [EMAIL PROTECTED] wrote:

 On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote:

  Are there any tips on reducing the index size or what factors most
  impact index size?
 
  My index has 2.7 million documents and is 200 gigabytes and growing.
  Most documents are around 2-3kb and there are about 30 indexed fields.

 An ls -sh will tell you roughly where the space is being
 occupied.  There is something strange going on: 2.5kB * 2.7m is only
 ~7GB, and I have trouble imagining where the 30-fold index size
 expansion is coming from.

 -Mike
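
 As a rough guide when reading that listing (these are the standard
 Lucene file extensions; the path is a placeholder):

   $ ls -sh /path/to/solr/data/index
   # .fdt/.fdx       stored field data - large if you store big text
   # .tis/.tii       term dictionary
   # .frq/.prx       term frequencies and positions
   # .nrm            norms
   # .tvx/.tvd/.tvf  term vectors, if enabled

 If the .fdt files dominate, the space is going to stored rather than
 indexed content.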



index size

2007-08-17 Thread Kevin Lewandowski
Are there any tips on reducing the index size or what factors most
impact index size?

My index has 2.7 million documents and is 200 gigabytes and growing.
Most documents are around 2-3kb and there are about 30 indexed fields.

thanks,
Kevin


Re: Snapshooting or replicating recently indexed data

2007-04-21 Thread Kevin Lewandowski

snapshooter does create incremental snapshots of the index. It may not
look that way from the directory contents, because existing segment
files appear there in full, but those entries are hard links, so it is
incremental.
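
You can check this with inode numbers (paths and numbers here are just
illustrative):

  $ cp -lr data/index data/snapshot.20070421
  $ ls -i data/index/_a.fdt data/snapshot.20070421/_a.fdt
  1049032 data/index/_a.fdt
  1049032 data/snapshot.20070421/_a.fdt

The inode is the same on both sides: the snapshot references the
existing segment instead of duplicating it, and only segments written
after the snapshot take new space.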

On 4/20/07, Doss [EMAIL PROTECTED] wrote:

Hi Yonik,

Thanks for your quick response, my question is this, can we take incremental
backup/replication in SOLR?

Regards,
Doss.


M. MOHANDOSS Software Engineer Ext: 507 (A BharatMatrimony Enterprise)
- Original Message -
From: Yonik Seeley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Thursday, April 19, 2007 7:42 PM
Subject: Re: Snapshooting or replicating recently indexed data


 On 4/19/07, Doss [EMAIL PROTECTED] wrote:
 It seems the snapshooter takes an exact copy of the indexed data, that
 is, all the contents inside the index directory. How can we take only
 the recently added ones?
 ...
 cp -lr ${data_dir}/index ${temp}
 mv ${temp} ${name} ...


 I don't quite understand your question, but since hard links are used,
 it's more like pointing to the index files instead of copying them.
 Rsync is used as a transport to only move the files that were changed
 from the master to slaves.

 -Yonik




Re: Facet Browsing

2007-04-19 Thread Kevin Lewandowski

I recommend you build your query with facet options in raw format and
make sure you're getting back the data you want. Then build it into
your app.
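
Something like this is a reasonable smoke test (the field name and
params are illustrative, and facet.mincount/facet.offset need a recent
enough build):

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=style&facet.mincount=1&facet.limit=20&facet.offset=0&indent=on

Once the facet counts come back the way you expect, paging the facet
values is facet.limit plus facet.offset, and paging the documents
themselves is the usual start/rows.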

On 4/18/07, Jennifer Seaman [EMAIL PROTECTED] wrote:

Does anyone have any sample code (php, perl, etc) how to setup facet
browsing with paging? I can't seem to get things like facet.mincount
to work. Thank you.

Jennifer Seaman





Re: Incremental replication...

2007-02-14 Thread Kevin Lewandowski

snapshooter appears to copy all files, but most files in the snapshot
directories are hard links pointing to segments in the main index
directory, so only new segments actually end up getting copied.

We've been running replication on discogs.com for several months and
it works great.

On 2/13/07, escher2k [EMAIL PROTECTED] wrote:


I was wondering if the scripts provided in Solr do incremental replication.
Looking at the script for snapshooter, it seems like the whole index
directory is copied over. Is that correct? If so, isn't performance a
problem over the long run? Thanks for the clarification in advance (I hope
I am wrong!).
--
View this message in context: 
http://www.nabble.com/Incremental-replication...-tf3222946.html#a8951862
Sent from the Solr - User mailing list archive at Nabble.com.




Re: replication

2007-01-23 Thread Kevin Lewandowski

This should explain most everything:
http://wiki.apache.org/solr/CollectionDistribution

I've been running solr replication on discogs.com for a few months and
it works great!

Kevin

On 1/23/07, S Edirisinghe [EMAIL PROTECTED] wrote:

Hi,

I just started looking into solr. I like the features that have been listed.
I'm interested in how the replication feature is implemented, since I have
built my own replication for lucene using unix rsync scripts.

Where would the best starting point be to find out how replication of the
search indices is done?

thanks




Re: solr/tomcat stops responding

2006-12-03 Thread Kevin Lewandowski

> Hmmm, on most Linux/UNIX systems, sending the QUIT signal does nothing
> else but generate a stack trace to the console or a log file.  If you
> don't start tomcat by hand, the stack trace may go somewhere else I
> suppose.  This would be useful to learn how to do on your particular
> system (and we should add it to a debugging/troubleshooting wiki too).


Okay, I figured out how to get the thread dump. It was in the tomcat
logfile. I'm attaching it here.



> Are you load-balancing at all, or is this your only search server?
> FYI, I'm looking into something that will help.


I'm load balancing two solr servers.

thanks,
Kevin


thread_dump.txt.gz
Description: GNU Zip compressed data


Re: solr/tomcat stops responding

2006-12-02 Thread Kevin Lewandowski

> accept connections for 3 or 4 hours ... did you try taking some thread
> dumps like yonik suggested to see what all the threads were doing?


A kill -3 will not kill the process. It does nothing and there's no
thread dump on the console. kill -9 does kill it though.

btw, this has been a bigger problem for me because there's a separate
hardware issue and the system freezes about every 12 hours. So I have
to reboot it. After that I noticed solr not responding.

I've done a temporary fix for this by running a proxy in front of
tomcat. Then I updated my system startup to start solr, wait 20
seconds, run a few queries, wait 20 seconds, then start the proxy.
This is working fine now. But I'd still like to fix the real problem.
Let me know if there's anything else I can test or information I can
provide.
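
(For reference, the same warm-up can be done inside Solr itself with a
firstSearcher listener in solrconfig.xml. This is a sketch modeled on
the example config; the query values are placeholders:)

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">some common query</str>
        <str name="start">0</str>
        <str name="rows">10</str>
      </lst>
    </arr>
  </listener>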

thanks,
Kevin


Re: solr/tomcat stops responding

2006-12-01 Thread Kevin Lewandowski

>  My solr installation has been running fine for a few weeks but now
>  after a server reboot it starts and runs for a few seconds, then stops
>  responding. I don't see any errors in the logfiles, apart from
>  snapinstaller not being able to issue a commit. Also, the process is
>  using 100% cpu and stops responding to http requests (admin interface
>  and queries).


Okay, some more things happened after I sent this email. About 3 hours after
the reboot solr started running normally again. Then I rebooted it to
see if I could reproduce it. This time solr remained in the
not-responding state for about 4 hours but I did not wait longer to
see if it would come back.



> - check what got changed after the server reboot... anything?


Nothing had been changed on the server.



> Part of the fix for this has recently been committed into Lucene
> (multiple threads won't generate the same FieldCache entry).


Has that been added to solr yet? I'm running solr-2006-11-20.



> To see if this is your problem, restart the server and make sure no
> traffic goes to it.
> Then run some queries of the same type that will be hitting it to warm
> it up, then turn on normal traffic.


Okay, I did that. Shut off traffic to the server, restarted solr, ran
a few queries against it, then turned traffic back on, and it's
running fine now. So maybe the initial flood of requests has something
to do with it?

thanks,
Kevin


Re: Cache stats

2006-11-29 Thread Kevin Lewandowski

In the admin interface, if you click statistics, there's a cache section.

On 11/29/06, Tom [EMAIL PROTECTED] wrote:

Hi -

I'm starting to try to tune my installation a bit, and I'm looking
for cache statistics. Is there a way to peek into a running
installation, and see what my cache stats are?

I'm looking for the usual cache hits/cache misses sort of things.

Also, on a related note, I was looking for solr info via mbeans. I
fired up jconsole, and I can see all sort of tomcat mbeans, but
nothing for solr. Is there something extra I have to do to turn this
on? I see things implementing SolrInfoMBean, so I'm assuming there is
something there.

(Off topic, but suggestions for anything better than JConsole also welcome).

Thanks,

Tom




Minimum time between distributions

2006-11-21 Thread Kevin Lewandowski

On Discogs I'm running Solr with two slaves and one master, using the
distribution scripts. The slaves pull and install a new snapshot every
five minutes and this is working very well so far.

Are there any risks with reducing this window to every one or two
minutes? With large caches could the autowarming take longer than one
or two minutes? Reducing the window isn't a business need; I'm
just curious about the feasibility and risks.

How often do other people run snappuller and snapinstaller?

thanks,
Kevin


Re: Spellchecker in Solr?

2006-10-30 Thread Kevin Lewandowski

I have not done one but have been planning to do it based on this article:
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html

With Solr it would be much simpler than the java examples they give.

On 10/30/06, Michael Imbeault [EMAIL PROTECTED] wrote:

Hello everyone,

Has anybody successfully implemented a Lucene spellchecker within Solr?
If so, could you give details on how one would achieve this?

If not, is it planned to make it standard within Solr? It's a feature
almost every Solr application would want to use, so I think it would be
a nice idea. Sadly, I'm no Java developer, so I fear I won't be the one
coding that :(

Thanks,

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212




Re: Spellchecker in Solr?

2006-10-30 Thread Kevin Lewandowski

> I had the very same article in mind - how would it be simpler in Solr
> than in Lucene? A spellchecker is pretty much standard in every major


I meant it would be a simpler implementation in Solr because you don't
have to deal with Java or any Lucene APIs. You just create a document
for each correct word. For example, the word lettuce would have a
document like this:

<doc>
  <field name="word">lettuce</field>
  <field name="start3">let</field>
  <field name="gram3">let ett ttu tuc uce</field>
  <field name="end3">uce</field>
  <field name="start4">lett</field>
  <field name="gram4">lett ettu ttuc tuce</field>
  <field name="end4">tuce</field>
</doc>

Then you query Solr using the same syntax they describe in the article.
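
Roughly, for a misspelling like lettuse you'd break the input into the
same grams and send a query such as (boosts and weighting are
illustrative, adapted from the article's approach):

  q=start3:let^2.0 gram3:(let ett ttu tus use) end3:use^2.0

with OR as the default operator, then take the top-scoring word as the
suggestion.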

Anyway I haven't done this or tested it, but when reading that article
I thought it would be much easier to implement using Solr, at least
for me since I already have a database of correct words in Solr.

Kevin


Re: Solr use case

2006-10-11 Thread Kevin Lewandowski

No, after you add new documents you simply issue a <commit/> command
and the new docs are searchable.

On Discogs.com we have just over 1 million docs in the index and do
about 20,000 updates per day. Every 15 minutes we read a queue and add
new documents, then commit. And we optimize once per day. I've had no
problems with that.
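
The whole cycle is just HTTP posts to the update handler (URL and
filename are illustrative):

  curl http://localhost:8983/solr/update -H 'Content-type:text/xml' \
       --data-binary @queued_docs.xml
  curl http://localhost:8983/solr/update -H 'Content-type:text/xml' \
       --data-binary '<commit/>'
  # and once a day:
  curl http://localhost:8983/solr/update -H 'Content-type:text/xml' \
       --data-binary '<optimize/>'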

Kevin

On 10/11/06, climbingrose [EMAIL PROTECTED] wrote:

Hi all,

Is it true that Solr is mainly used for applications that rarely change
the underlying data? As I understand it, if you submit new data or
modify existing data on the Solr server, you have to refresh the cache
somehow to display the updated data. If my application frequently gets
new data/updates from users, should I use Solr? I love faceted browsing
and dynamic properties so much but I need to justify the choice of
Solr. Thanks. By the way, does anyone have any performance measures
that can be shared (apart from the one on the Wiki)? As I estimate, my
application will probably have half a million docs, each of which has
around 15 properties. Does anyone know what type of hardware I would
need for reasonable performance?

Thanks.

--
Regards,

Cuong Hoang




Re: Couple of problems

2006-10-11 Thread Kevin Lewandowski

I've had a problem similar to this and it was because of the
schema.xml. It was valid XML but there were some incorrect field
definitions and/or the default field listed was not a defined field.

I'd suggest you start with the default schema and build on it piece by
piece, each time testing for the error with a ping operation in the
admin page.
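
In particular, make sure the defaultSearchField element names a field
you actually define, e.g. (a sketch, assuming a schema like mark's
below):

  <defaultSearchField>content</defaultSearchField>

and after each change hit the ping URL (something like
http://localhost:8080/solr/admin/ping) to see whether the core still
responds.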

Kevin

On 10/11/06, mark [EMAIL PROTECTED] wrote:

Hi,

I have installed solr under a stand alone tomcat5.5 installation. I
can see the admin screens etc.

When I submit documents I get this error

Oct 11, 2006 10:05:44 AM org.apache.solr.core.SolrException log
SEVERE: java.lang.NullPointerException
        at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:78)
        at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:74)
        at org.apache.solr.core.SolrCore.readDoc(SolrCore.java:917)
        at org.apache.solr.core.SolrCore.update(SolrCore.java:685)
        at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        ...


My docs follow this schema:

  <fields>
    <field name="id" type="string" indexed="false" stored="true"/>
    <field name="timestamp" type="string" indexed="true" stored="true"/>
    <field name="url" type="string" indexed="false" stored="true"/>
    <field name="collection" type="text_ws" indexed="true" stored="true"/>
    <field name="mimetype" type="string" indexed="true" stored="true"/>
    <field name="content" type="text" indexed="true" stored="false"/>
  </fields>

Also - since getting this error I can no longer see part of the
solr/admin/stats.jsp screen - the boxes for core, update, cache and
other are now empty. I deleted and reinstalled solr (including the
unpacked webapps dir) but not tomcat, and the problem is still there.

cheers

mark



Re: Can't get q.op working

2006-09-27 Thread Kevin Lewandowski

Now I feel dumb! I hadn't deployed the latest build properly. The new
.war file was there but for some reason restarting tomcat didn't
reload it. Anyway, q.op is working fine now.

On 9/27/06, Erik Hatcher [EMAIL PROTECTED] wrote:

Kevin,

I've just tried this locally using the tutorial example data, using
both a default (in schema.xml) of AND and OR.  (I use the Ruby
response writer because it's easier to read than XML ;)

Use the default operator from schema.xml:
http://localhost:8983/solr/select?wt=ruby&indent=2&q=ipod%20belkin

Override with AND:
http://localhost:8983/solr/select?wt=ruby&indent=2&q=ipod%20belkin&q.op=AND

Override with OR:
http://localhost:8983/solr/select?wt=ruby&indent=2&q=ipod%20belkin&q.op=OR

All worked as expected in all cases.  There is one result with AND
and three results with OR.

I recommend you try this same scenario out with the tutorial example
data and ensure things work as I've stated here.  Let us know more
details if the problem persists.

Erik


On Sep 26, 2006, at 11:02 PM, Kevin Lewandowski wrote:

 I'm running the latest nightly build (2006-09-27) and cannot seem to
 get the q.op parameter working. I have the default operator set to AND
 and am testing with a two word query that returns no results. If I add
 OR to the query I get results. But if I remove the OR and add
 q.op=OR to the Solr query I still get no results.

 Is there anything I could be doing wrong?

 thanks
 Kevin




How much ram can Solr use?

2006-09-27 Thread Kevin Lewandowski

On the performance wiki page it mentions a test box with 16GB ram. Did
anything special need to be done to use that much ram (with the OS or
java)? Would Solr on a system with Linux x86_64 and Tomcat be able to
use that much ram? (sorry, I don't know Java so I don't know if there
are any limitations there).
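
(For what it's worth, the usual recipe is a 64-bit JVM plus explicit
heap flags, e.g. in Tomcat's environment - the values here are
illustrative, not a recommendation:

  export JAVA_OPTS="-server -Xms2g -Xmx12g"

Whatever RAM you don't give the JVM heap, Linux will still use as
filesystem cache for the index files.)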

thanks,
Kevin


Solr now used on Discogs.com

2006-09-06 Thread Kevin Lewandowski

I just wanted to say thanks to the Solr developers.

I'm now using Solr for the main search engine on Discogs.com. I've
been through five revisions of the search engine and this was
definitely the least painful. Solr gives me the power of Lucene
without having to deal with the guts. It made for a much faster
implementation than all other search packages I've worked with.

Some stats: there are now 1.1 million documents in the index and it
handles 200,000 searches per day (on a single-cpu P4 server with 1 gig
ram).

Kevin


Re: acts_as_solr

2006-08-30 Thread Kevin Lewandowski

You might want to look at acts_as_searchable for Ruby:
http://rubyforge.org/projects/ar-searchable

That's a similar plugin for the Hyperestraier search engine using its
REST interface.

On 8/28/06, Erik Hatcher [EMAIL PROTECTED] wrote:

I've spent a few hours tinkering with a Ruby ActiveRecord plugin to
index, delete, and search models fronted by a database into Solr.