We are running SOLR 4.7.2
SolrCloud with 2 shards
one Leader and one replica per shard.
The version of the replica and the leader differ, displayed here as...
curl http://ps01:8983/solr/admin/cores?action=STATUS | sed 's/></>\n</g'
<long name="version">7753045</long>
However the commitTimeMSec lastModified
Hi everybody,
I am trying to set up SolrCloud on Ubuntu and somehow the graph on the admin
interface does not show up. It is simply blank. The tree is available.
This is a test installation on one machine.
There are 3 zookeepers running.
I start two solr nodes like this:
solr-5.2.1$ bin/solr
On Tue, Aug 18, 2015 at 10:58 PM, Jamie Johnson jej2...@gmail.com wrote:
Hmm...so I think I have things setup correctly, I have a custom
QParserPlugin building a custom query that wraps the query built from the
base parser and stores the user who is executing the query. I've added the
This was my original thought. We already have the thread local so should
be straight fwd to just wrap the Field name and use that as the key. Again
thanks, I really appreciate the feedback
On Aug 19, 2015 8:12 AM, Yonik Seeley ysee...@gmail.com wrote:
On Tue, Aug 18, 2015 at 10:58 PM, Jamie
Hi Erick,
All my queries are based on fq (filter query). I have to send randomly
generated queries to warm up the low-level Lucene cache.
I took the more tedious route of warming the low-level cache without utilizing
the three Solr caches, by turning them off (setting their sizes to zero). Then,
I
What I am trying to determine is a way to validate, for instance if a leader
dies (as in completely unrecoverable), that the data on the replica is an
exact match to what the leader had.
I need to be able to monitor it and have confidence that it is working as
expected.
I had assumed the version
On 8/19/2015 7:52 AM, Jeff Courtade wrote:
We are running SOLR 4.7.2
SolrCloud with 2 shards
one Leader and one replica per shard.
The version of the replica and the leader differ, displayed here as...
curl http://ps01:8983/solr/admin/cores?action=STATUS | sed 's/></>\n</g'
<long
Jetty includes a QoSFilter, https://wiki.eclipse.org/Jetty/Reference/QoSFilter,
with some changes I think it might be able to throttle the requests coming into
Solr from truly outside, e.g. not SolrCloud replication, ZooKeeper etc., so as
to make sure that Solr's own work could still get done.
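A minimal sketch of what wiring QoSFilter into Solr's web.xml might look like (the maxRequests value and the url-pattern are assumptions to adapt, not tested settings):

```xml
<!-- Limit concurrent search requests; excess requests are suspended, not rejected. -->
<filter>
  <filter-name>QoSFilter</filter-name>
  <filter-class>org.eclipse.jetty.servlets.QoSFilter</filter-class>
  <init-param>
    <!-- hypothetical cap; tune to your hardware -->
    <param-name>maxRequests</param-name>
    <param-value>50</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>QoSFilter</filter-name>
  <!-- pattern chosen so internal replication/ZooKeeper traffic is hopefully not throttled -->
  <url-pattern>/select/*</url-pattern>
</filter-mapping>
```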
On 8/18/2015 11:50 PM, William Bell wrote:
We sometimes get a spike in Solr, and we get like 3K of threads and then
timeouts...
In Solr 5.2.1 the default Jetty settings are kinda crazy for threads - since
the value is HIGH!
What do others recommend?
The setting of 1 is so that there is
Is it possible to specify facet.method with a JSON nested faceting query?
I would like to see if there would be a performance improvement using methods
Use a command like the one below to create a collection:
http://IP:PORT/solr/admin/collections?action=CREATE&name=Name&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=configname_usedduring_ZKUpload
Susheel
On Wed, Aug 19, 2015 at 11:03 AM, Kevin Lee kgle...@yahoo.com.invalid
wrote:
Hi,
Have you created a collection yet? If not, then there won’t be a graph yet.
It doesn’t show up until there is at least one collection.
- Kevin
On Aug 19, 2015, at 5:48 AM, Merlin Morgenstern
merlin.morgenst...@gmail.com wrote:
Hi everybody,
I am trying to set up SolrCloud on
Hi Upayavira,
Thank you very much for pointing out the potential design issue
The queries will be determined through a configuration by business users.
There will be a limited number of queries every day, and they will get executed
by customers repeatedly. However, business users will change the
bq: can I limit the size of the three
caches so that the RAM usage will be under control
That's exactly what the size parameter is for.
As Upayavira says, the rough size of each entry in
the filterCache is maxDocs/8 + (sizeof query string).
queryResultCache is much smaller per entry, it's
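Erick's sizing rule above can be turned into a back-of-envelope calculation. This is a sketch, not Solr code: the function names and the assumed 64-byte query string are mine.

```python
def filtercache_entry_bytes(max_docs, query_len=64):
    """Rough filterCache cost per entry: one bit per document plus the fq string."""
    return max_docs // 8 + query_len

def filtercache_mb(max_docs, entries, query_len=64):
    """Approximate total filterCache size in MB for `entries` cached fq clauses."""
    return entries * filtercache_entry_bytes(max_docs, query_len) / (1024 * 1024)

# e.g. an 8M-document index: each cached fq costs roughly 1 MB
print(round(filtercache_mb(8_000_000, 1), 2))  # → 0.95
```

With a filterCache size of, say, 512 entries on that index, the worst case is around half a gigabyte for the filterCache alone, which is why the cardinality of your fq values matters so much.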
Hi,
I'm implementing a sort search by distance with a PointVectorStrategy.
In the index process I used createIndexableFields from the strategy
and makePoint from the context GEO.
But when I'm sorting the search I get the error:
Java::JavaLang::IllegalStateException: unexpected docvalues type
You say all of my queries are based upon fq? Why? How unique are they?
Remember, for each fq value, it could end up storing one bit per
document in your index. If you have 8M documents, you could end up with
a cache usage of 1MB, for that query alone!
Filter queries are primarily designed for
Yes, you can limit the size of the filter cache, as Erick says, but
then, you could just end up with cache churn, where you are constantly
re-populating your cache as stuff gets pushed out, only to have to
regenerate it again for the next query.
Is it possible to decompose these queries into
Troy Edwards tedwards415...@gmail.com wrote:
My average document size is 400 bytes
Number of documents that need to be inserted 25/second
(for a total of about 3.6 Billion documents)
Any ideas/suggestions on how that can be done? (use a client
or uploadcsv or stream or data import
Hello all,
The last time I worked with changing Similarities was with Solr 4.1 and at
that time, it was possible to simply change the schema to specify the use
of a different Similarity without re-indexing. This allowed me to
experiment with several different ranking algorithms without having to
On 8/19/2015 11:09 AM, Troy Edwards wrote:
I have a requirement where I have to bulk insert a lot of documents in
SolrCloud.
My average document size is 400 bytes
Number of documents that need to be inserted 25/second (for a total of
about 3.6 Billion documents)
Any ideas/suggestions
warning: I'm no expert on other similarities.
Having said that, I'm not aware of similarities being used in the
indexing process - during indexing term frequency, document frequency,
field norms, and so on are all recorded. These are things that the
default similarity (TF/IDF) uses to calculate
Hi Upayavira,
I happened to compose individual fq for each field, such as:
fq=Gatewaycode:(...)&fq=DestCode:(...)&fq=DateDep:(...)&fq=Duration:(...)
It is nice to know that I am not creating unnecessary cache entries, since
the above method results in minimal cardinality, as you pointed out.
Thank
I have been using the solrj client and get speeds of 1000 objects per
second. The size of my object is around 4 kb.
On Aug 19, 2015 12:09 PM, Troy Edwards tedwards415...@gmail.com wrote:
I have a requirement where I have to bulk insert a lot of documents in
SolrCloud.
My average document size
I have a requirement where I have to bulk insert a lot of documents in
SolrCloud.
My average document size is 400 bytes
Number of documents that need to be inserted 25/second (for a total of
about 3.6 Billion documents)
Any ideas/suggestions on how that can be done? (use a client or
Hi Jamie,
Your understanding is inverted. The predicates can be read as:
indexed shape <predicate> query shape.
For indexed point data, there is almost no semantic difference between the
Within and Intersects predicates. There is if the field is multi-valued
and you want to ensure that all of
I'm trying to upgrade my custom post filter from Solr 4.9 to 5.2. This filter
collapses documents based on a user chosen field set. The key to the whole
thing is determining document uniqueness based on a fixed int array of field
value ordinals. In 4.9 this worked regardless of the field type. In
If you're sitting on HDFS anyway, you could use MapReduceIndexerTool. I'm not
sure that'll hit your rate, it spends some time copying things around.
If you're not on
HDFS, though, it's not an option.
Best,
Erick
On Wed, Aug 19, 2015 at 11:36 AM, Upayavira u...@odoko.co.uk wrote:
On Wed, Aug
One error (others perhaps?) in my statement ... the code
searcher.getLeafReader().getSortedDocValues(field)
just returns null for numeric and date fields. That is why they appear to be
ignored, not that the ordinals are all absent or equivalent. But my question
is still valid I think!
When you are adding a node, what exactly are you looking for that node to
do? Are you adding a node to create a new replica? In that case you will
call the ADDREPLICA Collections API.
Thanks,
Susheel
On Wed, Aug 19, 2015 at 3:42 PM, Merlin Morgenstern
merlin.morgenst...@gmail.com wrote:
I have a
No, nothing. The graphical view shows collections and the associated replicas.
This new node has no replicas that are part of any collection, so it won't
show in the graphical view.
If you create a new collection that happens to put a replica on the new node,
it'll then show up as part of that
For indexing 3.5 billion documents, you will not only run into a bottleneck
with Solr but also at different places (data acquisition, Solr document
object creation, submitting in bulk/batches to Solr).
This will require parallelizing the operations at each of the above
steps, which can get you
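The batch-and-parallelize idea above can be sketched like this. It is a hypothetical outline: the batch size, worker count, and the `send` callable are all assumptions; in practice `send` would wrap a SolrJ or HTTP POST to /update.

```python
from concurrent.futures import ThreadPoolExecutor

def batches(docs, size=1000):
    """Group documents into fixed-size batches for bulk /update requests."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # trailing partial batch

def index_all(docs, send, size=1000, workers=4):
    """Send batches concurrently; returns the number of documents submitted."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send, batches(docs, size)))
```

Here `send` is whatever posts one batch to Solr and returns the batch size; the same shape works whether the batch goes through pysolr, SolrJ via Jython, or a raw JSON POST.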
On Wed, Aug 19, 2015, at 07:13 PM, Toke Eskildsen wrote:
Troy Edwards tedwards415...@gmail.com wrote:
My average document size is 400 bytes
Number of documents that need to be inserted 25/second
(for a total of about 3.6 Billion documents)
Any ideas/suggestions on how that can be
what's happening on the system when you see this? If you're heavily
indexing and NOT
using SolrJ.cloudSolrSever/Client, then a lot of threads can be
occupied forwarding
documents to the other shards.
Best,
Erick
On Wed, Aug 19, 2015 at 6:55 AM, Davis, Daniel (NIH/NLM) [C]
daniel.da...@nih.gov
If you're committing that rapidly then you're correct, filter caching
may not be a good fit. The entire _point_ of
filter caching is to increase performance of subsequent executions of
the exact same fq clause. But if you're
throwing them away every second there's little/no benefit.
You really
I have a SolrCloud cluster running with 2 nodes, configured with 1 shard
and 2 replicas. Now I have added a node on a new server, registered with the
same three ZooKeepers. The node shows up inside the tree of the SolrCloud
admin GUI under live nodes.
Unfortunately the new node is not inside the
Thank you for the quick answer. I learned now how to use the Collections
API.
Is there a better way to issue the commands than to enter them into the
browser as a URL and get back JSON?
2015-08-19 22:23 GMT+02:00 Erick Erickson erickerick...@gmail.com:
No, nothing. The graphical view shows
Hi again,
Here is a relevant/past discussion :
http://search-lucene.com/m/eHNlTDHKb17MW532
Ahmet
On Thursday, August 20, 2015 2:28 AM, Ahmet Arslan iori...@yahoo.com.INVALID
wrote:
Hi Tom,
The computeNorm(FieldInvertState) method is the only place where similarity is
tied to the indexing process.
Hi Tom,
The computeNorm(FieldInvertState) method is the only place where similarity is
tied to the indexing process.
If you want to switch between different similarities, they should share the
same implementation for the method. For example, subclasses of SimilarityBase
can be used without
Reload will get the new schema definitions. But all the indexed
content will stay as is and will probably start causing problems if
you changed analyzer definitions seriously.
You probably will have to reindex from scratch/external source.
Sorry.
Solr Analyzers, Tokenizers, Filters, URPs
Toke Eskildsen t...@statsbiblioteket.dk wrote
Use more than one cloud. Make them fully independent.
As I suggested when you asked 4 days ago. That would
also make it easy to scale: Just measure how much a
single setup can take and do the math.
The goal is 250K documents/second.
I tried
Hi,
We have an over-engineered index that we would like to rework. It's already
holding 150M documents with 94GB of index size. We have a high-index/high-query
system running Solr 4.5.
My question - if we update the schema, can we reindex by using the Reload
action in the CoreAdmin UI? Will that
Trying to evaluate the performance of queries with and without cache
On 18-Aug-2015, at 11:30 am, Yonik Seeley ysee...@gmail.com wrote:
On Tue, Aug 18, 2015 at 12:23 PM, naga sharathrayapati
sharathrayap...@gmail.com wrote:
Is it possible to clear the cache through query?
I need this
On Wed, Aug 19, 2015 at 8:00 PM, Nagasharath sharathrayap...@gmail.com wrote:
Trying to evaluate the performance of queries with and without cache
Yeah, so to try and see how much a specific type of query costs, you can use
{!cache=false}
But I've seen some people trying to benchmark the
I will go with {!cache=false}.
Can we specify facet method in json nested faceting query?
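For reference, the shape being asked about would look something like the request body below. The field names are made up, and whether a per-facet `method` key is honored in this Solr version is exactly what's worth verifying:

```json
{
  "query": "*:*",
  "facet": {
    "top_categories": {
      "type": "terms",
      "field": "cat",
      "method": "dv",
      "facet": {
        "avg_price": "avg(price)"
      }
    }
  }
}
```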
On 19-Aug-2015, at 7:07 pm, Yonik Seeley ysee...@gmail.com wrote:
On Wed, Aug 19, 2015 at 8:00 PM, Nagasharath sharathrayap...@gmail.com
wrote:
Trying to evaluate the performance of queries with
Opened a JIRA - https://issues.apache.org/jira/browse/SOLR-7947
A SolrCloud with 2 Solr nodes in Tomcat on 2 VM servers. After restarting
one Solr node, the cores on it turn to the down state, with the logs showing the
errors below.
Logs are in the attachment: solr.zip
Why? Do you evaluate Unix performance with and without file buffers?
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Aug 19, 2015, at 5:00 PM, Nagasharath sharathrayap...@gmail.com wrote:
Trying to evaluate the performance of queries with and
Well, you can use curl instead ;).
But at present there's no real collections admin UI akin to the core
admin UI, although that's in the works with the new AngularJS-based
admin UI. The ETA is not defined quite yet, although it shouldn't
be all that far away.
On Wed, Aug 19, 2015 at 2:48
Are you suggesting that requests come into a service layer that identifies
which client is on which solrcloud and passes the request to that cloud?
Thank you
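A thin routing layer along those lines could be as simple as the sketch below (the client ids and URLs are entirely hypothetical):

```python
# Each client id is pinned to exactly one independent SolrCloud.
CLIENT_CLOUDS = {
    "client-a": "http://cloud1:8983/solr",
    "client-b": "http://cloud2:8983/solr",
}

def route(client_id):
    """Return the base Solr URL for the cloud this client lives on."""
    try:
        return CLIENT_CLOUDS[client_id]
    except KeyError:
        raise ValueError("unknown client: " + client_id)
```

Each incoming request is then forwarded to its client's cloud; scaling means measuring what one cloud can take and adding clouds, as Toke suggested.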
On Wed, Aug 19, 2015 at 1:13 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:
Troy Edwards tedwards415...@gmail.com wrote:
My
Thank you for taking the time to do the test.
I have been doing similar tests using the post Tool (SimplePostTool) with
the real data and was able to get to about 10K documents/second.
I am considering using multiple files (one per client) ftp'd into a solr
node and then use a scheduled job to
tedsolr tsm...@sciquest.com wrote:
I'm sure there is a good reason why SortedDocValues exposes
the backing dictionary and [Sorted]NumericDocValues does not.
There is: Numerics does not have a backing dictionary. Instead of storing the
values via the intermediate ordinals-map (aka by
On Tue, 2015-08-18 at 14:36 +0530, Modassar Ather wrote:
So Toke/Daniel is the node showing *gone* on Solr cloud dashboard is
because of GC pause and it is actually not gone but the ZK is not able to
get the correct state?
That would be my guess.
The issue is caused by a huge query with many
Hello,
try to experiment with fq={!cache=false}... or fq={!cache=false cost=100}...
see https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters
On Wed, Aug 19, 2015 at 8:55 AM, Maulin Rathod mrat...@asite.com wrote:
Hi,
Maulin,
Did you check performance with segmented filters which I advised recently?
On Wed, Aug 19, 2015 at 10:24 AM, Maulin Rathod mrat...@asite.com wrote:
As per my understanding, caches are flushed every time a new
document is added to the collection (we do a soft commit every 1 sec to make newly
On Wed, 2015-08-19 at 05:55 +, Maulin Rathod wrote:
SLOW WITH FILTER QUERY (takes more than 1 second)
q=+recipient_id:(4042) AND project_id:(332) AND resource_id:(13332247
13332245 13332243 13332241 13332239) AND entity_type:(2) AND
As per my understanding, caches are flushed every time a new document is added to
the collection (we do a soft commit every 1 sec to make newly added documents
available for search). Because of this the cache is not used effectively, and hence
it is slow every time in our case.
-Original Message-
I have a table with an Id column that is auto-incrementing.
I want to do a delta import, adding new categories to Solr, with something like:
select * from my_table where Id > '${latest_id}'
where latest_id is the max Id from the last import.
How do I configure data-config.xml for this?
Or how do I get the max Id from Solr?
Thanks!
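For what it's worth, DIH's stock delta mechanism keys off ${dataimporter.last_index_time} rather than a stored max Id, so the usual data-config.xml shape looks roughly like this (driver, URL, and column names below are placeholders; if you truly need max-Id tracking you would have to manage that value outside DIH yourself):

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb"/>
  <document>
    <entity name="my_table"
            query="select * from my_table"
            deltaQuery="select Id from my_table
                        where last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="select * from my_table
                              where Id = '${dataimporter.delta.Id}'"/>
  </document>
</dataConfig>
```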
--
Hi,
I'm using the Jieba analyzer to index Chinese characters in Solr. It works
fine with the segmentation when using the Analysis screen in the Solr Admin UI.
However, when I try highlighting in Solr, it does not highlight in
the correct place. For example, when I search for 自然环境与企业本身, it
Dear Jack,
Thank you very much,
Roshan Agarwal
On Mon, Aug 10, 2015 at 8:38 PM, Jack Krupansky jack.krupan...@gmail.com
wrote:
The simplest and maybe best approach is to use the edismax query parser,
query all terms using the OR operator, and use the pf, pf2, and pf3
parameters to boost