Solr leader and replica version mismatch 4.7.2

2015-08-19 Thread Jeff Courtade
We are running SOLR 4.7.2 SolrCloud with 2 shards, one leader and one replica per shard. The version of the replica and the leader differ, displayed here as... curl http://ps01:8983/solr/admin/cores?action=STATUS |sed 's//\n/g' <long name="version">7753045</long> However the commitTimeMSec lastModified

Difficulties in getting Solrcloud running

2015-08-19 Thread Merlin Morgenstern
Hi everybody, I am trying to set up SolrCloud on Ubuntu and somehow the graph on the admin interface does not show up. It is simply blank. The tree is available. This is a test installation on one machine. There are 3 ZooKeepers running. I start two Solr nodes like this: solr-5.2.1$ bin/solr

Re: Disable caching

2015-08-19 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 10:58 PM, Jamie Johnson jej2...@gmail.com wrote: Hmm...so I think I have things setup correctly, I have a custom QParserPlugin building a custom query that wraps the query built from the base parser and stores the user who is executing the query. I've added the

Re: Disable caching

2015-08-19 Thread Jamie Johnson
This was my original thought. We already have the thread local so should be straight fwd to just wrap the Field name and use that as the key. Again thanks, I really appreciate the feedback On Aug 19, 2015 8:12 AM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 10:58 PM, Jamie

Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Erick, All my queries are based on fq (filter query). I have to send the randomly generated queries to warm up low level lucene cache. I went to the more tedious way to warm up low level cache without utilizing the three caches by turning off the three caches (set values to zero). Then, I

Re: Solr leader and replica version mismatch 4.7.2

2015-08-19 Thread Jeff Courtade
What I am trying to determine is a way to validate, for instance if a leader dies (as in completely unrecoverable), that the data on the replica is an exact match to what the leader had. I need to be able to monitor it and have confidence that it is working as expected. I had assumed the version

Re: Solr leader and replica version mismatch 4.7.2

2015-08-19 Thread Shawn Heisey
On 8/19/2015 7:52 AM, Jeff Courtade wrote: We are running SOLR 4.7.2 SolrCloud with 2 shards one Leader and one replica per shard. the Version of the replica and leader differ displayed here as... curl http://ps01:8983/solr/admin/cores?action=STATUS |sed 's//\n/g' long

RE: jetty.xml

2015-08-19 Thread Davis, Daniel (NIH/NLM) [C]
Jetty includes a QoSFilter, https://wiki.eclipse.org/Jetty/Reference/QoSFilter, with some changes I think it might be able to throttle the requests coming into Solr from truly outside, e.g. not SolrCloud replication, ZooKeeper etc., so as to make sure that Solr's own work could still get done.

Re: jetty.xml

2015-08-19 Thread Shawn Heisey
On 8/18/2015 11:50 PM, William Bell wrote: We sometimes get a spike in Solr, and we get like 3K of threads and then timeouts... In Solr 5.2.1 the default Jetty setting is kinda crazy for threads - since the value is HIGH! What do others recommend? The setting of 1 is so that there is

json facet

2015-08-19 Thread naga sharathrayapati
is it possible to specify facet.method with json nested faceting query? would like to see if there would be a performance improvement using methods

Re: Difficulties in getting Solrcloud running

2015-08-19 Thread Susheel Kumar
Use a command like the one below to create a collection: http://IP:PORT/solr/admin/collections?action=CREATE&name=Name&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=configname_usedduring_ZKUpload Susheel On Wed, Aug 19, 2015 at 11:03 AM, Kevin Lee kgle...@yahoo.com.invalid wrote:
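A minimal sketch of how the CREATE call above could be assembled from a script instead of being typed by hand. The host, port, collection name, and config name are placeholder assumptions, not values from this thread; only the parameter names come from the message.

```python
from urllib.parse import urlencode


def create_collection_url(host, port, name, config_name,
                          num_shards=2, replication_factor=2,
                          max_shards_per_node=2):
    """Build a Collections API CREATE URL (all values are illustrative)."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
        ("maxShardsPerNode", max_shards_per_node),
        # Name of the config set previously uploaded to ZooKeeper.
        ("collection.configName", config_name),
    ]
    return "http://%s:%s/solr/admin/collections?%s" % (
        host, port, urlencode(params))


# Hypothetical host, collection, and config names.
url = create_collection_url("localhost", 8983,
                            "test_collection", "test_config")
print(url)
```

Building the query string with urlencode keeps the `&` separators and any escaping correct, which is easy to get wrong when pasting a long URL.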

Re: Difficulties in getting Solrcloud running

2015-08-19 Thread Kevin Lee
Hi, Have you created a collection yet? If not, then there won’t be a graph yet. It doesn’t show up until there is at least one collection. - Kevin On Aug 19, 2015, at 5:48 AM, Merlin Morgenstern merlin.morgenst...@gmail.com wrote: HI everybody, I am trying to setup solrcloud on

Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Upayavira, Thank you very much for pointing out the potential design issue. The queries will be determined through a configuration by business users. There will be a limited number of queries every day, and they will get executed by customers repeatedly. However, business users will change the

Re: Is it a good query performance with this data size ?

2015-08-19 Thread Erick Erickson
bq: can I limit the size of the three caches so that the RAM usage will be under control That's exactly what the size parameter is for. As Upayavira says, the rough size of each entry in the filterCache is maxDocs/8 + (sizeof query string). queryResultCache is much smaller per entry, it's
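The rule of thumb quoted above (roughly maxDocs/8 bytes per filterCache entry, plus the query string) can be turned into a quick estimate. This is a rough sketch only: the 8 million document count and the cache size of 512 are assumed example values, and the result should be read as an approximate upper bound rather than a measurement.

```python
def filter_cache_bytes(max_doc, cache_size, avg_query_len=0):
    """Approximate filterCache memory: one bit per document per entry,
    plus the (usually negligible) length of the cached query string."""
    per_entry = max_doc // 8 + avg_query_len
    return per_entry * cache_size


# Assumed example values: 8 million documents, filterCache size=512.
per_entry_bytes = filter_cache_bytes(8000000, 1)
full_cache_bytes = filter_cache_bytes(8000000, 512)

print(per_entry_bytes)                       # bytes for one cached fq
print(full_cache_bytes / (1024.0 * 1024.0))  # MiB if every slot is filled
```

Multiplying the per-entry cost by the configured size gives the figure to compare against the available heap when choosing the size parameter.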

Lucene 5.2.1 Spatial Strategy PointVectorStrategy

2015-08-19 Thread Pablo Mincz
Hi, I'm implementing a sort search by distance with a PointVectorStrategy. In the index process I used createIndexableFields from the strategy and makePoint from the context GEO. But when I'm sorting the search I get the error: Java::JavaLang::IllegalStateException: unexpected docvalues type

Re: Is it a good query performance with this data size ?

2015-08-19 Thread Upayavira
You say all of my queries are based upon fq? Why? How unique are they? Remember, for each fq value, it could end up storing one bit per document in your index. If you have 8m documents, you could end up with a cache usage of 1MB, for that query alone! Filter queries are primarily designed for

Re: Is it a good query performance with this data size ?

2015-08-19 Thread Upayavira
Yes, you can limit the size of the filter cache, as Erick says, but then, you could just end up with cache churn, where you are constantly re-populating your cache as stuff gets pushed out, only to have to regenerate it again for the next query. Is it possible to decompose these queries into

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Toke Eskildsen
Troy Edwards tedwards415...@gmail.com wrote: My average document size is 400 bytes Number of documents that need to be inserted 25/second (for a total of about 3.6 Billion documents) Any ideas/suggestions on how that can be done? (use a client or uploadcsv or stream or data import

Changing Similarity without re-indexing (for example from default to BM25)

2015-08-19 Thread Tom Burton-West
Hello all, The last time I worked with changing Similarities was with Solr 4.1 and at that time, it was possible to simply change the schema to specify the use of a different Similarity without re-indexing. This allowed me to experiment with several different ranking algorithms without having to

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Shawn Heisey
On 8/19/2015 11:09 AM, Troy Edwards wrote: I have a requirement where I have to bulk insert a lot of documents in SolrCloud. My average document size is 400 bytes Number of documents that need to be inserted 25/second (for a total of about 3.6 Billion documents) Any ideas/suggestions

Re: Changing Similarity without re-indexing (for example from default to BM25)

2015-08-19 Thread Upayavira
warning: I'm no expert on other similarities. Having said that, I'm not aware of similarities being used in the indexing process - during indexing term frequency, document frequency, field norms, and so on are all recorded. These are things that the default similarity (TF/IDF) uses to calculate

Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Upayavira, I happened to compose an individual fq for each field, such as: fq=Gatewaycode:(...)&fq=DestCode:(...)&fq=DateDep:(...)&fq=Duration:(...) It is nice to know that I am not creating unnecessary cache entries, since the above method results in minimal cardinality as you pointed out. Thank

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Vineeth Dasaraju
I have been using the solrj client and get speeds of 1000 objects per second. The size of my object is around 4 kb. On Aug 19, 2015 12:09 PM, Troy Edwards tedwards415...@gmail.com wrote: I have a requirement where I have to bulk insert a lot of documents in SolrCloud. My average document size

How to Fast Bulk Inserting documents

2015-08-19 Thread Troy Edwards
I have a requirement where I have to bulk insert a lot of documents in SolrCloud. My average document size is 400 bytes Number of documents that need to be inserted 25/second (for a total of about 3.6 Billion documents) Any ideas/suggestions on how that can be done? (use a client or

Re: Geospatial Predicate Question

2015-08-19 Thread david.w.smi...@gmail.com
Hi Jamie, Your understanding is inverted. The predicates can be read as: indexed shape predicate query shape. For indexed point data, there is almost no semantic difference between the Within and Intersects predicates. There is if the field is multi-valued and you want to ensure that all of

How to find the ordinal for a numeric doc value

2015-08-19 Thread tedsolr
I'm trying to upgrade my custom post filter from Solr 4.9 to 5.2. This filter collapses documents based on a user chosen field set. The key to the whole thing is determining document uniqueness based on a fixed int array of field value ordinals. In 4.9 this worked regardless of the field type. In

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Erick Erickson
If you're sitting on HDFS anyway, you could use MapReduceIndexerTool. I'm not sure that'll hit your rate, it spends some time copying things around. If you're not on HDFS, though, it's not an option. Best, Erick On Wed, Aug 19, 2015 at 11:36 AM, Upayavira u...@odoko.co.uk wrote: On Wed, Aug

Re: How to find the ordinal for a numeric doc value

2015-08-19 Thread tedsolr
One error (others perhaps?) in my statement ... the code searcher.getLeafReader().getSortedDocValues(field) just returns null for numeric and date fields. That is why they appear to be ignored, not that the ordinals are all absent or equivalent. But my question is still valid I think! -- View

Re: Solrcloud node is not comming up

2015-08-19 Thread Susheel Kumar
When you are adding a node, what exactly are you looking for that node to do? Are you adding the node to create a new replica? In that case you would call the ADDREPLICA Collections API. Thanks, Susheel On Wed, Aug 19, 2015 at 3:42 PM, Merlin Morgenstern merlin.morgenst...@gmail.com wrote: I have a

Re: Solrcloud node is not comming up

2015-08-19 Thread Erick Erickson
No, nothing. The graphical view shows collections and the associated replicas. This new node has no replicas that are part of any collection, so it won't show in the graphical view. If you create a new collection that happens to put a replica on the new node, it'll then show up as part of that

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Susheel Kumar
For Indexing 3.5 billion documents, you will not only run into bottleneck with Solr but also at different places (data acquisition, solr document object creation, submitting in bulk/batches to Solr). This will require parallelizing the above operations at each of the above steps which can get you

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Upayavira
On Wed, Aug 19, 2015, at 07:13 PM, Toke Eskildsen wrote: Troy Edwards tedwards415...@gmail.com wrote: My average document size is 400 bytes Number of documents that need to be inserted 25/second (for a total of about 3.6 Billion documents) Any ideas/suggestions on how that can be

Re: jetty.xml

2015-08-19 Thread Erick Erickson
What's happening on the system when you see this? If you're heavily indexing and NOT using SolrJ's CloudSolrServer/CloudSolrClient, then a lot of threads can be occupied forwarding documents to the other shards. Best, Erick On Wed, Aug 19, 2015 at 6:55 AM, Davis, Daniel (NIH/NLM) [C] daniel.da...@nih.gov

Re: Performance issue with FILTER QUERY

2015-08-19 Thread Erick Erickson
If you're committing that rapidly then you're correct, filter caching may not be a good fit. The entire _point_ of filter caching is to increase performance of subsequent executions of the exact same fq clause. But if you're throwing them away every second there's little/no benefit. You really

Solrcloud node is not comming up

2015-08-19 Thread Merlin Morgenstern
I have a SolrCloud cluster running with 2 nodes, configured with 1 shard and 2 replicas. Now I have added a node on a new server, registered with the same three ZooKeepers. The node shows up inside the tree of the SolrCloud admin GUI under live nodes. Unfortunately the new node is not inside the

Re: Solrcloud node is not comming up

2015-08-19 Thread Merlin Morgenstern
Thank you for the quick answer. I have now learned how to use the Collections API. Is there a better way to issue the commands than to enter them into the browser as a URL and get back JSON? 2015-08-19 22:23 GMT+02:00 Erick Erickson erickerick...@gmail.com: No, nothing. The graphical view shows

Re: Changing Similarity without re-indexing (for example from default to BM25)

2015-08-19 Thread Ahmet Arslan
Hi again, Here is a relevant/past discussion : http://search-lucene.com/m/eHNlTDHKb17MW532 Ahmet On Thursday, August 20, 2015 2:28 AM, Ahmet Arslan iori...@yahoo.com.INVALID wrote: Hi Tom, computeNorm(FieldInvertState) method is the only place where similarity is tied to indexing process.

Re: Changing Similarity without re-indexing (for example from default to BM25)

2015-08-19 Thread Ahmet Arslan
Hi Tom, computeNorm(FieldInvertState) method is the only place where similarity is tied to indexing process. If you want to switch between different similarities, they should share the same implementation for the method. For example, subclasses of SimilarityBase can be used without

Re: Reindexing

2015-08-19 Thread Alexandre Rafalovitch
Reload will get the new schema definitions. But all the indexed content will stay as is and will probably start causing problems if you changed analyzer definitions seriously. You probably will have to reindex from scratch/external source. Sorry. Solr Analyzers, Tokenizers, Filters, URPs

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Toke Eskildsen
Toke Eskildsen t...@statsbiblioteket.dk wrote Use more than one cloud. Make them fully independent. As I suggested when you asked 4 days ago. That would also make it easy to scale: Just measure how much a single setup can take and do the math. The goal is 250K documents/second. I tried
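The "measure a single setup and do the math" advice above amounts to a short calculation. The sketch below uses the 3.6 billion total and the 250K documents/second goal from the thread; the 10,000 documents/second figure for one setup is purely an assumed placeholder to be replaced by a real measurement.

```python
import math

TOTAL_DOCS = 3600000000      # total documents, from the thread
TARGET_RATE = 250000         # documents/second, the stated goal
SINGLE_SETUP_RATE = 10000    # ASSUMPTION: measured rate of one setup

# Independent setups needed to reach the target rate.
setups_needed = math.ceil(TARGET_RATE / SINGLE_SETUP_RATE)

# Time to index everything if the target rate is sustained.
hours_at_target = TOTAL_DOCS / TARGET_RATE / 3600

print(setups_needed)
print(hours_at_target)
```

Because the setups are fully independent, the required count scales linearly with the target rate, which is what makes this kind of back-of-the-envelope sizing reasonable.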

Reindexing

2015-08-19 Thread Azazel K
Hi, We have an over-engineered index that we would like to rework. It's already holding 150M documents with 94GB of index size. We have a high-index/high-query system running Solr 4.5. My question: if we update the schema, can we reindex by using the Reload action in the CoreAdmin UI? Will that

Re: Cache

2015-08-19 Thread Nagasharath
Trying to evaluate the performance of queries with and without cache On 18-Aug-2015, at 11:30 am, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 12:23 PM, naga sharathrayapati sharathrayap...@gmail.com wrote: Is it possible to clear the cache through query? I need this

Re: Cache

2015-08-19 Thread Yonik Seeley
On Wed, Aug 19, 2015 at 8:00 PM, Nagasharath sharathrayap...@gmail.com wrote: Trying to evaluate the performance of queries with and without cache Yeah, so to try and see how much a specific type of query costs, you can use {!cache=false} But I've seen some people trying to benchmark the

Re: Cache

2015-08-19 Thread Nagasharath
I will go with {!cache=false}. Can we specify facet method in json nested faceting query? On 19-Aug-2015, at 7:07 pm, Yonik Seeley ysee...@gmail.com wrote: On Wed, Aug 19, 2015 at 8:00 PM, Nagasharath sharathrayap...@gmail.com wrote: Trying to evaluate the performance of queries with

SolrCloud: /live_nodes in ZK shows the server is there, but all cores are down in /clusterstate.json.

2015-08-19 Thread forest_soup
Opened a JIRA - https://issues.apache.org/jira/browse/SOLR-7947 A SolrCloud with 2 Solr nodes in Tomcat servers on 2 VM servers. After restarting one Solr node, the cores on it turn to the down state and the logs show the errors below. Logs are in the attachment. solr.zip

Re: Cache

2015-08-19 Thread Walter Underwood
Why? Do you evaluate Unix performance with and without file buffers? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Aug 19, 2015, at 5:00 PM, Nagasharath sharathrayap...@gmail.com wrote: Trying to evaluate the performance of queries with and

Re: Solrcloud node is not comming up

2015-08-19 Thread Erick Erickson
Well, you can use curl instead ;). But at present there's no real collections admin UI akin to the core admin UI, although that's in the works with the new Angular JS based admin UI, but the ETA is not defined quite yet although it shouldn't be all that far away. On Wed, Aug 19, 2015 at 2:48
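As an alternative to the browser or curl, Collections API calls can also be issued from a small script. This is a sketch under assumptions: the base URL, collection name, and shard name are hypothetical, and the helper names are made up for illustration. ADDREPLICA with the collection and shard parameters is used only as an example action.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL that asks for a JSON response."""
    query = [("action", action), ("wt", "json")] + sorted(params.items())
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(query)


def call_collections_api(base_url, action, **params):
    """Send the request and decode the JSON body (needs a running Solr)."""
    with urlopen(collections_api_url(base_url, action, **params)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical base URL, collection, and shard; only the URL is built here.
url = collections_api_url("http://localhost:8983/solr", "ADDREPLICA",
                          collection="test_collection", shard="shard1")
print(url)
```

Calling call_collections_api with the same arguments would send the request and return the parsed response as a dictionary, which is easier to check in a script than JSON read in a browser window.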

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Troy Edwards
Are you suggesting that requests come into a service layer that identifies which client is on which solrcloud and passes the request to that cloud? Thank you On Wed, Aug 19, 2015 at 1:13 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Troy Edwards tedwards415...@gmail.com wrote: My

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Troy Edwards
Thank you for taking the time to do the test. I have been doing similar tests using the post Tool (SimplePostTool) with the real data and was able to get to about 10K documents/second. I am considering using multiple files (one per client) ftp'd into a solr node and then use a scheduled job to

Re: How to find the ordinal for a numeric doc value

2015-08-19 Thread Toke Eskildsen
tedsolr tsm...@sciquest.com wrote: I'm sure there is a good reason why SortedDocValues exposes the backing dictionary and [Sorted]NumericDocValues does not. There is: Numerics do not have a backing dictionary. Instead of storing the values via the intermediate ordinals-map (aka by
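The distinction can be illustrated with a toy model. This is plain Python, not the Lucene API: it only mimics the idea that sorted doc values keep a dictionary of unique terms plus one ordinal per document, while numeric doc values keep the number itself per document. The field contents are made up, and using the raw numeric value as part of a uniqueness key is one possible workaround, not something the thread confirms.

```python
def sorted_doc_values(per_doc_terms):
    """Toy model: sorted dictionary of unique terms plus per-doc ordinals."""
    dictionary = sorted(set(per_doc_terms))
    ord_of = {term: i for i, term in enumerate(dictionary)}
    return dictionary, [ord_of[term] for term in per_doc_terms]


# String field: each document points into the dictionary by ordinal.
dictionary, ords = sorted_doc_values(["red", "blue", "red", "green"])

# Numeric field: the value is stored per document, with no dictionary.
numeric_values = [42, 7, 42, 19]

# A uniqueness key can mix ordinals (strings) and raw values (numerics).
doc_keys = list(zip(ords, numeric_values))
unique_docs = len(set(doc_keys))

print(dictionary, ords, doc_keys, unique_docs)
```

In this model documents 0 and 2 collapse together because both the ordinal and the numeric value match.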

Re: Query time out. Solr node goes down.

2015-08-19 Thread Toke Eskildsen
On Tue, 2015-08-18 at 14:36 +0530, Modassar Ather wrote: So Toke/Daniel is the node showing *gone* on Solr cloud dashboard is because of GC pause and it is actually not gone but the ZK is not able to get the correct state? That would be my guess. The issue is caused by a huge query with many

Re: Performance issue with FILTER QUERY

2015-08-19 Thread Mikhail Khludnev
Hello, try to experiment with fq={!cache=false}... or fq={!cache=false cost=100}... see https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters On Wed, Aug 19, 2015 at 8:55 AM, Maulin Rathod mrat...@asite.com wrote: Hi,
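A small sketch of how the suggested local params could be attached to each filter when building the request. The helper name and the example field values are assumptions for illustration (the field names are borrowed from the query quoted elsewhere in this thread); only the {!cache=false} and cost syntax comes from the message above.

```python
from urllib.parse import parse_qs, urlencode


def build_query(q, filters, cache=True, cost=None):
    """Return a query string; optionally prefix each fq with local params."""
    params = [("q", q)]
    for fq in filters:
        if not cache:
            local = "{!cache=false"
            if cost is not None:
                local += " cost=%d" % cost
            fq = local + "}" + fq
        params.append(("fq", fq))
    return urlencode(params)


# Illustrative values only.
qs = build_query("recipient_id:4042",
                 ["project_id:332", "entity_type:2"],
                 cache=False, cost=100)
decoded_filters = parse_qs(qs)["fq"]
print(decoded_filters)
```

Toggling the cache argument makes it easy to time the same request with and without filter caching.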

Re: Performance issue with FILTER QUERY

2015-08-19 Thread Mikhail Khludnev
Maulin, Did you check performance with segmented filters which I advised recently? On Wed, Aug 19, 2015 at 10:24 AM, Maulin Rathod mrat...@asite.com wrote: As per my understanding caches are flushed every time when add new document to collection (we do soft commit at every 1 sec to make newly

Re: Performance issue with FILTER QUERY

2015-08-19 Thread Toke Eskildsen
On Wed, 2015-08-19 at 05:55 +, Maulin Rathod wrote: SLOW WITH FILTER QUERY (takes more than 1 second) q=+recipient_id:(4042) AND project_id:(332) AND resource_id:(13332247 13332245 13332243 13332241 13332239) AND entity_type:(2) AND

RE: Performance issue with FILTER QUERY

2015-08-19 Thread Maulin Rathod
As per my understanding, caches are flushed every time a new document is added to the collection (we do a soft commit every 1 sec to make newly added documents available for search). Because of this, the cache is not used effectively, and hence the query is slow every time in our case. -Original Message-

How to Delta-Import to solr by Id(key word)

2015-08-19 Thread fent
I have a table with an Id column, which is an auto-increment attribute, so I want to delta-add new categories to Solr with something like select * from my_table where Id > '${latest_id}', where latest_id is the max Id that was added last time. How do I configure data-config.xml, or how do I get the max Id from Solr? Thanks! --

Solr having problems with highlighting when using Jieba anaylzer

2015-08-19 Thread Zheng Lin Edwin Yeo
Hi, I'm using the Jieba analyser to index Chinese characters in Solr. It works fine with the segmentation when using the Analysis page on the Solr Admin UI. However, when I tried to do highlighting in Solr, it is not highlighting in the correct place. For example, when I search for 自然环境与企业本身, it

Re: plagiarism Checker with solr

2015-08-19 Thread Roshan Agarwal
Dear Jack, Thank you very much, Roshan Agarwal On Mon, Aug 10, 2015 at 8:38 PM, Jack Krupansky jack.krupan...@gmail.com wrote: The simplest and maybe best approach is to use the edismax query parser and query all terms using the OR operator and use the PF1, PF2, and PF3 parameters to boost