Re: Index size vs. number of documents

2008-08-15 Thread Chris Hostetter
: I'm surprised, as you are, by the non-linearity. Out of curiosity, what is
Unless the data in stored fields is significantly greater than the indexed fields, the index size almost never grows linearly with the number of documents -- it's the number of unique terms that tends to primarily

Can I copy an index built on a Windows system to a Unix/Linux system?

2008-08-15 Thread johnwarde
Hi, Can I copy an index built on a Windows system to a Unix/Linux system and have it still work? Reason for my question: I have been working with Solr for the last month on a Windows system and I have determined that we need to have a replication solution for our future needs (volume of documents to be

Re: IndexOutOfBoundsException

2008-08-15 Thread Doug Steigerwald
We actually have this same exact issue on 5 of our cores. We're just going to wipe the index and reindex soon, but it isn't actually causing any problems for us. We can update the index just fine, there's just no merging going on. Ours happened when I reloaded all of our cores for a

Re: IndexOutOfBoundsException

2008-08-15 Thread Ian Connor
I tried it again (rm -rf /solr/index and post all the docs again) but this time, I get the error (I also switched to the Sun JVM to see if that helped): 15-Aug-08 4:57:08 PM org.apache.solr.core.SolrCore execute INFO: webapp=/solr path=/update params={} status=500 QTime=4576 15-Aug-08 4:57:08 PM

Re: IndexOutOfBoundsException

2008-08-15 Thread Ian Connor
Ignore that error - I think I installed the Sun JVM incorrectly - this seems unrelated to the error. On Fri, Aug 15, 2008 at 9:01 AM, Ian Connor [EMAIL PROTECTED] wrote: I tried it again (rm -rf /solr/index and post all the docs again) but this time, I get the error (I also switched to the Sun

Re: Can I copy an index built on a Windows system to a Unix/Linux system?

2008-08-15 Thread Erick Erickson
I've done exactly this many times in straight Lucene. Since Solr is built on Lucene, I wouldn't anticipate any problems. Make sure your transfer is binary mode... Best Erick On Fri, Aug 15, 2008 at 8:02 AM, johnwarde [EMAIL PROTECTED] wrote: Hi, Can I copy an index built on a Windows

Re: Can I copy an index built on a Windows system to a Unix/Linux system?

2008-08-15 Thread johnwarde
Excellent! Many thanks for your help, Erick! John Erick Erickson wrote: I've done exactly this many times in straight Lucene. Since Solr is built on Lucene, I wouldn't anticipate any problems. Make sure your transfer is binary mode... Best Erick On Fri, Aug 15, 2008 at 8:02 AM,

Re: Indexing Only Parts of HTML Pages

2008-08-15 Thread Otis Gospodnetic
Hi Nick, Yes, sounds like either custom Nutch parsing code or custom HTML parser that has the logic you described and feeds Solr with docs constructed based on this logic. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nick Tkach [EMAIL

Re: Index size vs. number of documents

2008-08-15 Thread Phillip Farber
By "Index size almost never grows linearly with the number of documents" are you saying it increases more slowly than the number of documents, i.e. sub-linearly, or more rapidly? With dirty OCR the number of unique terms is always increasing due to the garbage words -Phil Chris Hostetter

Re: Shard searching clarifications

2008-08-15 Thread Yonik Seeley
On Fri, Aug 15, 2008 at 12:34 PM, Phillip Farber [EMAIL PROTECTED] wrote: If I have 2 solr instances (solr1 and solr2) each serving a shard is it correct I only need to send my query to one of the shards, e.g. solr1:8080/select?shards=solr1,solr2 ... and that I'll get merged results over
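The request shape described in the question can be sketched as follows. This is only an illustration: the helper name `build_sharded_query_url`, the `/solr` path, the port, and the sample query are assumptions, not details taken from the thread. The only thing borrowed from the message is the idea of sending one request to a single instance with a comma-separated `shards` parameter.

```python
from urllib.parse import urlencode


def build_sharded_query_url(entry_host, shard_hosts, query):
    """Build a select URL aimed at one instance that lists every shard.

    entry_host and shard_hosts are hypothetical "host:port/path" strings.
    """
    params = {
        "q": query,
        # Comma-separated shard list, as in the shards=solr1,solr2 example.
        "shards": ",".join(shard_hosts),
    }
    # Keep ':', '/' and ',' unescaped so the URL stays readable.
    return "http://" + entry_host + "/select?" + urlencode(params, safe=":/,")


if __name__ == "__main__":
    shards = ["solr1:8080/solr", "solr2:8080/solr"]
    print(build_sharded_query_url(shards[0], shards, "title:lucene"))
```

The request goes to the first shard only; the other shards appear solely in the `shards` parameter.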

Re: Index size vs. number of documents

2008-08-15 Thread Otis Gospodnetic
Here's an example. Consider 2 docs with terms: doc1: term1, term2, term3 doc2: term4, term5, term6 vs. doc1: term1, term2, term3 doc2: term1, term1, term6 All other things constant, the former will make index grow faster because it has more unique terms. Even if your OCR has garbage that
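The comparison above can be checked with a small sketch that counts distinct terms in each pair of documents. It is not how Lucene measures index size; it only counts unique terms, which is the quantity the example turns on. The function name `unique_terms` and the dictionary layout are illustrative assumptions.

```python
def unique_terms(docs):
    """Return the set of distinct terms across all documents."""
    terms = set()
    for doc_terms in docs.values():
        terms.update(doc_terms)
    return terms


# First case from the example: no term is shared between the documents.
first = {
    "doc1": ["term1", "term2", "term3"],
    "doc2": ["term4", "term5", "term6"],
}

# Second case: doc2 mostly repeats a term already seen in doc1.
second = {
    "doc1": ["term1", "term2", "term3"],
    "doc2": ["term1", "term1", "term6"],
}

if __name__ == "__main__":
    print(len(unique_terms(first)))   # 6 distinct terms
    print(len(unique_terms(second)))  # 4 distinct terms
```

Both cases hold the same number of documents and term occurrences, but the first has six distinct terms and the second only four.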

Re: Highlighting returns incorrect text on some results?

2008-08-15 Thread pdovyda2
Thanks Otis. I downloaded the nightly today and reindexed, and it seems that it was a bug that you've worked out since 1.2 as I don't see the issue anymore. Paul Otis Gospodnetic wrote: Paul, we had many highlighter-related changes since 1.2, so I suggest you try the nightly. Otis --

partialResults, distributed search SOLR-502

2008-08-15 Thread Brian Whitman
I was going to file a ticket like this: A SOLR-303 query with shards=host1,host2,host3 when host3 is down returns an error. One of the advantages of a shard implementation is that data can be stored redundantly across different shards, either as direct copies (e.g. when host1 and host3 are

Auto commit error and java.io.FileNotFoundException

2008-08-15 Thread Chris Harris
I have an index (different from the ones mentioned yesterday) that was working fine with 3M docs or so, but when I added a bunch more docs, bringing it closer to 4M docs, the index seemed to get corrupted. In particular, now when I start Solr up, or when my indexing process tries to add a

Re: Can I copy an index built on a Windows system to a Unix/Linux system?

2008-08-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is a feature (SOLR-561) being built for doing replication on any platform. The patch works and it is tested. Do not expect it to work with the current trunk because a lot has changed in trunk since the last patch. We will be updating it soon once the dust settles down. - On Fri, Aug

Re: Auto commit error and java.io.FileNotFoundException

2008-08-15 Thread Chris Harris
I've done some more sniffing on the Lucene list, and noticed that Otis made the following comment about a FileNotFoundException problem in late 2005: Are you using Windows and a compound index format (look at your index dir - does it have .cfs file(s))? This may be a bad combination,

Re: Administrative questions

2008-08-15 Thread Jon Drukman
Jason Rennie wrote: On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman [EMAIL PROTECTED] wrote: Duh. I should have thought of that. I'm a big fan of djbdns so I'm quite familiar with daemontools. Thanks! :) My pleasure. Was nice to hear recently that DJB is moving toward more flexible

failover sharding

2008-08-15 Thread Ian Connor
Hi, Is there a way to put a timeout or have some way of ignoring shards that are not there? For instance, I have 4 shards, and they have overlap with the documents for redundancy. shard 1 = 0-200 shard 2 = 100-400 shard 3 = 300-600 shard 4 = 500-600 0-100 This means if one of my shards goes

Solr Cache

2008-08-15 Thread Tim Christensen
We have two servers, with the same index load balanced. The indexes are updated at the same time every day. Occasionally, a search on one server will return different results from the other server, even though the data used to create the index is exactly the same. Is this possibly due to