: I'm surprised, as you are, by the non-linearity. Out of curiosity, what is
Unless the data in stored fields is significantly greater than in the indexed
fields, the index size almost never grows linearly with the number of
documents -- it's the number of unique terms that tends to primarily
Hi,
Can I copy an index built on a Windows system to a Unix/Linux system and
have it still work?
Reason for my question:
I have been working with Solr for the last month on a Windows system and I
have determined that we need to have a replication solution for our future
needs (volume of documents to be
We actually have this same exact issue on 5 of our cores. We're just
going to wipe the index and reindex soon, but it isn't actually
causing any problems for us. We can update the index just fine,
there's just no merging going on.
Ours happened when I reloaded all of our cores for a
I tried it again (rm -rf /solr/index and post all the docs again) but
this time, I get the error (I also switched to the Sun JVM to see if
that helped):
15-Aug-08 4:57:08 PM org.apache.solr.core.SolrCore execute
INFO: webapp=/solr path=/update params={} status=500 QTime=4576
15-Aug-08 4:57:08 PM
Ignore that error - I think I installed the Sun JVM incorrectly - this
seems unrelated to the error.
On Fri, Aug 15, 2008 at 9:01 AM, Ian Connor [EMAIL PROTECTED] wrote:
I tried it again (rm -rf /solr/index and post all the docs again) but
this time, I get the error (I also switched to the Sun
I've done exactly this many times in straight Lucene. Since Solr is built
on Lucene, I wouldn't anticipate any problems.
Make sure your transfer is binary mode...
Best
Erick
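One way to make sure nothing was mangled in transit is to compare the copied
files byte-for-byte. The sketch below simulates the copy with local temp paths
purely for illustration; in practice scp/rsync are binary-safe by default,
while classic ftp needs an explicit `binary` command before transferring:

```shell
# Illustrative only: simulate a binary-safe copy and verify it byte-for-byte.
# Real transfers would use scp/rsync; with ftp, run "binary" first.
mkdir -p /tmp/idx_src /tmp/idx_copy
printf '\000\001\002 segment bytes' > /tmp/idx_src/_0.cfs
cp /tmp/idx_src/_0.cfs /tmp/idx_copy/
cmp -s /tmp/idx_src/_0.cfs /tmp/idx_copy/_0.cfs && echo "copy is identical"
```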
On Fri, Aug 15, 2008 at 8:02 AM, johnwarde [EMAIL PROTECTED] wrote:
Hi,
Can I copy an index built on a Windows
Excellent! Many thanks for your help, Erick!
John
Erick Erickson wrote:
I've done exactly this many times in straight Lucene. Since Solr is built
on Lucene, I wouldn't anticipate any problems.
Make sure your transfer is binary mode...
Best
Erick
On Fri, Aug 15, 2008 at 8:02 AM,
Hi Nick,
Yes, it sounds like either custom Nutch parsing code or a custom HTML parser
that has the logic you described and feeds Solr with docs constructed based on
that logic.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Nick Tkach [EMAIL
By "Index size almost never grows linearly with the number of
documents" are you saying it increases more slowly than the number of
documents, i.e. sub-linearly, or more rapidly?
With dirty OCR the number of unique terms is always increasing due to
the garbage words.
-Phil
Chris Hostetter
On Fri, Aug 15, 2008 at 12:34 PM, Phillip Farber [EMAIL PROTECTED] wrote:
If I have 2 Solr instances (solr1 and solr2), each serving a shard,
is it correct that I only need to send my query to one of the shards, e.g.
solr1:8080/select?shards=solr1,solr2 ...
and that I'll get merged results over
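For what it's worth, a distributed request along those lines might look like
the sketch below (the host names come from the question; the port, core path,
and query are assumptions). The node that receives the request fans it out to
every shard in the `shards` parameter and merges the results, so either node
can serve as the entry point:

```shell
# Sketch: build a distributed query URL; any one shard can receive it.
SHARDS="solr1:8080/solr,solr2:8080/solr"
URL="http://solr1:8080/solr/select?q=*:*&shards=${SHARDS}"
echo "$URL"
# curl "$URL"   # run this against live shards to get the merged results
```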
Here's an example.
Consider 2 docs with terms:
doc1: term1, term2, term3
doc2: term4, term5, term6
vs.
doc1: term1, term2, term3
doc2: term1, term1, term6
All other things constant, the former will make index grow faster because it
has more unique terms. Even if your OCR has garbage that
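The two layouts above can be compared directly; a minimal sketch using the
same placeholder term names as the example:

```shell
# Count distinct terms in each two-doc layout from the example above.
layout_a="term1 term2 term3 term4 term5 term6"   # doc1 + doc2, no overlap
layout_b="term1 term2 term3 term1 term1 term6"   # doc2 reuses doc1's terms
count_unique() { echo "$1" | tr ' ' '\n' | sort -u | wc -l; }
count_unique "$layout_a"   # 6 distinct terms -> bigger term dictionary
count_unique "$layout_b"   # 4 distinct terms -> smaller index
```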
Thanks Otis. I downloaded the nightly today and reindexed, and it seems that
it was a bug that you've worked out since 1.2 as I don't see the issue
anymore.
Paul
Otis Gospodnetic wrote:
Paul, we had many highlighter-related changes since 1.2, so I suggest you
try the nightly.
Otis
--
I was going to file a ticket like this:
A SOLR-303 query with shards=host1,host2,host3 when host3 is down
returns an error. One of the advantages of a shard implementation is
that data can be stored redundantly across different shards, either as
direct copies (e.g. when host1 and host3 are
I have an index (different from the ones mentioned yesterday) that was
working fine with 3M docs or so, but when I added a bunch more docs,
bringing it closer to 4M docs, the index seemed to get corrupted. In
particular, now when I start Solr up, or when my indexing process
tries to add a
There is a feature (SOLR-561) being built for doing replication on
any platform. The patch works and it has been tested. Do not expect it to
work with the current trunk because a lot has changed in trunk since
the last patch. We will be updating it soon once the dust settles
down.
-
On Fri, Aug
I've done some more sniffing on the Lucene list, and noticed that Otis
made the following comment about a FileNotFoundException problem in
late 2005:
Are you using Windows and a compound index format (look at your index
dir - does it have .cfs file(s))?
This may be a bad combination,
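Checking for the compound format is just a matter of looking for `.cfs`
segment files in the index directory; the sketch below uses a throwaway path
in place of a real index dir:

```shell
# Illustrative: detect the compound file format by looking for *.cfs files.
IDX=/tmp/demo_index            # stand-in for your real index dir
mkdir -p "$IDX"; touch "$IDX/_0.cfs"
if ls "$IDX"/*.cfs >/dev/null 2>&1; then
  echo "compound format in use"
fi
```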
Jason Rennie wrote:
On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman [EMAIL PROTECTED] wrote:
Duh. I should have thought of that. I'm a big fan of djbdns so I'm quite
familiar with daemontools.
Thanks!
:) My pleasure. Was nice to hear recently that DJB is moving toward more
flexible
Hi,
Is there a way to put a timeout on, or some way of ignoring, shards
that are not there? For instance, I have 4 shards, and they have
overlapping documents for redundancy.
shard 1 = 0-200
shard 2 = 100-400
shard 3 = 300-600
shard 4 = 500-600 0-100
This means if one of my shards goes
We have two servers, with the same index load balanced. The indexes
are updated at the same time every day. Occasionally, a search on one
server will return different results from the other server, even
though the data used to create the index is exactly the same.
Is this possibly due to