Re: NullPointerException in PeerSync.handleUpdates

2017-11-21 Thread Erick Erickson
Right, if there's no "fixed version" mentioned and the resolution is "unresolved", it's not in the code base at all. But that JIRA is apparently not reproducible, especially on more recent versions than 6.2. Is it possible to test a more recent version (6.6.2 would be my recommendation)? Erick

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Erick Erickson
Well, you can always manually change the ZK nodes, but whether just setting a node's state to "leader" in ZK and then starting the Solr instance hosting that node would work... I don't know. Do consider running CheckIndex on one of the replicas in question first, though. Best, Erick On Tue, Nov 21,
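For reference, CheckIndex can be run directly against a replica's index directory. A minimal sketch follows; the jar version and index path are illustrative, not taken from this thread, and for an HDFS-backed index you would first need a local copy of the directory:

```shell
# Verify index integrity before any manual leader surgery in ZK.
# -ea enables Lucene's internal assertions, as the CheckIndex usage message recommends.
java -cp lucene-core-6.6.1.jar -ea:org.apache.lucene... \
  org.apache.lucene.index.CheckIndex /var/solr/data/mycoll_shard1_replica1/data/index
```

CheckIndex reports per-segment status and, with `-exorcise`, can drop corrupt segments (losing the documents in them), which is why it is worth running read-only first.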

Re: NullPointerException in PeerSync.handleUpdates

2017-11-21 Thread S G
My bad. I found it at https://issues.apache.org/jira/browse/SOLR-9453 But I could not find it in CHANGES.txt, perhaps because it's not yet resolved. On Tue, Nov 21, 2017 at 9:15 AM, Erick Erickson wrote: > Did you check the JIRA list? Or CHANGES.txt in more recent

Re: tokenstream reusable

2017-11-21 Thread Mikhail Khludnev
Hello, Roxana. You're probably looking for TeeSinkTokenFilter, but I believe the idea is cumbersome to implement in Solr. There is also a PreAnalyzed field type, which can keep a token stream in external form.

Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
Hi, I have encountered this error during the merging of the 3.5TB of index. What could be the cause of this? Exception in thread "main" Exception in thread "Lucene Merge Thread #8" java.io.IOException: background merge hit exception: _6f(6.5.1):C7256757 _6e(6.5.1):C6462072

FORCELEADER not working - solr 6.6.1

2017-11-21 Thread Joe Obernberger
Hi All - sorry for the repeat, but I'm at a complete loss on this.  I have a collection with 100 shards and 3 replicas each.  6 of the shards will not elect a leader.  I've tried the FORCELEADER command, but nothing changes. The log shows 'Force leader attempt 1.  Waiting 5 secs for an active

Re: Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet

2017-11-21 Thread Yonik Seeley
I opened https://issues.apache.org/jira/browse/SOLR-11664 to track this. I should be able to look into this shortly if no one else does. -Yonik On Tue, Nov 21, 2017 at 6:02 PM, Yonik Seeley wrote: > Thanks for the complete info that allowed me to easily reproduce this! > The

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
One other data point I just saw on one of the nodes.  It has the following error: 2017-11-21 22:59:48.886 ERROR (coreZkRegister-1-thread-1-processing-n:leda:9100_solr) [c:UNCLASS s:shard14 r:core_node175 x:UNCLASS_shard14_replica3] o.a.s.c.ShardLeaderElectionContext There was a problem trying

Re: Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet

2017-11-21 Thread Yonik Seeley
Thanks for the complete info that allowed me to easily reproduce this! The bug seems to extend beyond hll/unique... I tried min(string_s) and got wonky results as well. -Yonik On Tue, Nov 21, 2017 at 7:47 AM, Volodymyr Rudniev wrote: > Hello, > > I've encountered 2 issues

Re: Data inconsistencies and updates in solrcloud

2017-11-21 Thread Tom Barber
Thanks Erick! As I said, user error! ;) Tom On 21/11/17 22:41, Erick Erickson wrote: I think you're confusing shards with replicas. numShards is 2, each with one replica. Therefore half of your docs will wind up on one replica and half on the other. If you're adding a single doc, by

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
Hi Hendrik - the shards in question have three replicas.  I tried restarting each one (one by one) - no luck.  No leader is found.  I deleted one of the replicas and added a new one, and the new one also shows as 'down'.  I also tried the FORCELEADER call, but that had no effect.  I checked

Re: Data inconsistencies and updates in solrcloud

2017-11-21 Thread Erick Erickson
I think you're confusing shards with replicas. numShards is 2, each with one replica. Therefore half of your docs will wind up on one replica and half on the other. If you're adding a single doc, by definition it'll be placed on only one of the two shards. If your shards had multiple replicas,
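To illustrate the distinction described above, a collection where each document lands on one of two shards, with a redundant copy of each shard, could be created like this (collection and config names are hypothetical, not from this thread):

```shell
# numShards=2: docs are split between two shards (hash routing).
# replicationFactor=2: each shard gets two replicas, i.e. redundant copies.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=2&collection.configName=myconf"
```

With replicationFactor=1, as in the original question, each shard has exactly one copy, so a single added doc appears on only one of the two shards.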

Re: Possible to disable SynonymQuery and get legacy behavior?

2017-11-21 Thread Doug Turnbull
I have submitted a patch to make the query generated for overlapping query terms somewhat configurable (w/ default being SynonymQuery), based on practices I've seen in the field. I'd love to hear feedback https://issues.apache.org/jira/browse/SOLR-11662 On Tue, Nov 21, 2017 at 12:37 PM Doug

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
Thank you Erick.  I've set the RamBufferSize to 1G; perhaps higher would be beneficial.  One more data point is that if I restart a node, more often than not, it goes into recovery, beats up the network for a while, and then goes green.  This happens even if I do no indexing between restarts. 

Data inconsistencies and updates in solrcloud

2017-11-21 Thread Tom Barber
Hi folks, I can't find an answer to this, and it's clearly user error. We have a collection in SolrCloud that is started with numShards=2 and replicationFactor=1. Solr seems happy and the collection seems happy. Yet when we post an update to it and then look at the record again, it seems to only affect one

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Erick Erickson
bq: We are doing lots of soft commits for NRT search... It's not surprising that this is slower than local storage, especially if you have any autowarming going on. Opening new searchers will need to read data from disk for the new segments, and HDFS may be slower here. As far as the commit

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp
We sometimes also have replicas not recovering. If one replica is left active, the easiest fix is to delete the broken replica and create a new one. When all replicas are down, it helps most of the time to restart one of the nodes that contains a replica in the down state. If that also doesn't get the

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
We've never run an index this size in anything but HDFS, so I have no comparison.  What we've been doing is keeping two main collections - all data, and the last 30 days of data.  Then we handle queries based on date range.  The 30 day index is significantly faster. My main concern right now

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp
We actually also have some performance issues with HDFS at the moment. We are doing lots of soft commits for NRT search. Those seem to be slower than with local storage. The investigation is, however, not really far yet. We have a setup with 2000 collections, with one shard each and a

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp
Unfortunately I cannot upload my cleanup code, but the steps I'm doing are quite simple. I wrote it in Java using the HDFS API and Curator for ZooKeeper. The steps are:     - read out the children of /collections in ZK so you know all the collection names     - read /collections//state.json to get
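A rough sketch of those steps from the shell; the HDFS data path and ZooKeeper host are assumptions, and the actual code described above uses the HDFS API and Curator from Java rather than these commands:

```shell
# 1. List the collection names known to ZooKeeper.
bin/solr zk ls /collections -z zkhost:2181
# 2. Pull a collection's state.json to see which index directories are in use.
bin/solr zk cp zk:/collections/mycoll/state.json ./state.json -z zkhost:2181
# 3. Compare with what actually exists on HDFS and remove orphaned directories.
hdfs dfs -ls /solr
hdfs dfs -rm -r /solr/orphaned_core_dir
```

The point of the comparison is to delete only index directories that no live replica in any state.json still references.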

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
We set the hard commit time long because we were having performance issues with HDFS, and thought that since the block size is 128M, having a longer hard commit interval made sense.  That was our hypothesis anyway.  Happy to switch it back and see what happens. I don't know what caused the cluster to
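For context, the hard commit interval being discussed is the `autoCommit` setting in solrconfig.xml. A common NRT-style configuration (the value here is illustrative, not taken from this cluster) keeps it fairly short but avoids opening a searcher, leaving visibility to soft commits:

```xml
<!-- Flush and fsync index segments every 60s, but don't open a new
     searcher; soft commits control when changes become visible. -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```

A long hard-commit interval also means a long transaction log, which makes recovery and peer sync after a restart correspondingly more expensive.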

NPE in modifyCollection

2017-11-21 Thread Nate Dire
Hi, I'm trying to set a replica placement rule on an existing collection and getting an NPE. It looks like the update code is assuming there's a current value. Collection: highspot_test operation: modifycollection failed:java.lang.NullPointerException at

tokenstream reusable

2017-11-21 Thread Roxana Danger
Hello all, I would like to reuse the token stream generated for one field to create a new token stream for another field, without executing the whole analysis again. The particular application is: - I have a field *tokens* with an analyzer that generates the tokens (and maintains the token type

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Erick Erickson
Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)... I need to back up a bit. Once nodes are in this state it's not surprising that they need to be forcefully killed. I was more thinking about how they got in this situation in the first place. _Before_ you get into the nasty

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
A clever idea.  Normally what we do when we need to do a restart is to halt indexing and then wait about 30 minutes.  If we do not wait and stop the cluster, the default script's 180-second timeout is not enough and we'll have lock files to clean up.  We use puppet to start and stop the

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp
Hi, the write.lock issue I see as well when Solr has not been stopped gracefully. The write.lock files are then left in HDFS, as they do not get removed automatically when the client disconnects, like an ephemeral node in ZooKeeper does. Unfortunately, Solr also does not realize that it should be

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
Erick - thank you very much for the reply.  I'm still working through restarting the nodes one by one. I'm stopping the nodes with the script, but yes - they are being killed forcefully because they are in this recovery, failed, retry loop.  I could increase the timeout, but they never seem

Possible to disable SynonymQuery and get legacy behavior?

2017-11-21 Thread Doug Turnbull
We help clients that perform index-time semantic expansion to hypernyms. For example, they will have a synonyms file that does the following: wing_tips => wing_tips, dress_shoes, shoes dress_shoes => dress_shoes, shoes oxfords => oxfords, dress_shoes, shoes Then at query time, we
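Index-time expansion like the example above is typically wired up with a synonyms filter on the index-side analyzer. A sketch using the synonyms file quoted in the mail (the surrounding field-type definition is assumed, not quoted from the thread):

```xml
<!-- synonyms.txt contains the hypernym expansions from the example:
       wing_tips => wing_tips, dress_shoes, shoes
       dress_shoes => dress_shoes, shoes
       oxfords => oxfords, dress_shoes, shoes -->
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- expand="true" emits the mapped terms at the same position as the original -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" expand="true"/>
</analyzer>
```

Because the expanded terms are stacked at the same position, the query side then sees overlapping tokens, which is where SynonymQuery behavior comes into play.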

Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Shawn Heisey
On 11/21/2017 9:17 AM, Walter Underwood wrote: > All our customizations are in solr.in.sh. We’re using the one we configured > for 6.3.0. I’ll check for any differences between that and the 6.5.1 script. The order looks correct to me -- the arguments for the OOM killer are listed *before* the

Re: NullPointerException in PeerSync.handleUpdates

2017-11-21 Thread Erick Erickson
Did you check the JIRA list? Or CHANGES.txt in more recent versions? On Tue, Nov 21, 2017 at 1:13 AM, S G wrote: > Hi, > > We are running 6.2 version of Solr and hitting this error frequently. > > Error while trying to recover.

Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Erick Erickson
Walter: Yeah, I've seen this on occasion. IIRC, the OOM exception will be specific to running out of stack space, or at least slightly different from the "standard" OOM error. That would be the "smoking gun" for too many threads. Erick On Tue, Nov 21, 2017 at 9:00 AM, Walter Underwood

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Erick Erickson
How are you stopping Solr? Nodes should not go into recovery on startup unless Solr was killed un-gracefully (i.e. kill -9 or the like). If you use the bin/solr script to stop Solr and see a message about "killing XXX forcefully" then you can lengthen out the time the script waits for shutdown
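If the script does report killing Solr forcefully, the shutdown wait can be lengthened in solr.in.sh; the value below is an example, not a recommendation from the thread:

```shell
# solr.in.sh: give Solr up to 6 minutes to shut down gracefully
# before bin/solr resorts to a forceful kill (default is 180 seconds).
SOLR_STOP_WAIT=360
```

A graceful stop lets cores close their index writers cleanly, which also avoids the leftover write.lock files discussed elsewhere in this thread.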

Re: Custom analyzer & frequency

2017-11-21 Thread Erick Erickson
One thing you might do is use the termfreq function to see what it looks like in the index. Also, the schema/analysis page will put terms in "buckets" by power-of-2, so that might help too. Best, Erick On Tue, Nov 21, 2017 at 7:55 AM, Barbet Alain wrote: > You rock,
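The termfreq function can be returned as a pseudo-field in the field list. A minimal example; the core and field names are borrowed from elsewhere in this thread and may not match your setup:

```shell
# termfreq(field,'term') returns the raw term frequency stored in the
# index for each matching document, bypassing any doc-freq confusion.
curl "http://localhost:8983/solr/alian_test/select?q=*:*&fl=id,termfreq(titi_txt_fr,'titi')&wt=json"
```

If this returns 1 for a document where the term occurs several times, term frequencies really are being omitted at index time rather than misreported by Luke.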

Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Walter Underwood
I do have one theory about the OOM. The server is running out of memory because there are too many threads. Instead of queueing up the overload in the load balancer, it is queued in new threads waiting to run. Setting solr.jetty.threads.max to 10,000 guarantees this will happen under overload. New

Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Erick Erickson
bq: but those use analyzing infix, so they are search indexes, not in-memory Sure, but they still can consume heap. Most of the index is MMapped of course, but there are some control structures, indexes and the like still kept on the heap. I suppose not using the suggester would nail it though.

Re: Solr cloud in kubernetes

2017-11-21 Thread Upayavira
We hopefully will switch to Kubernetes/Rancher 2.0 from Rancher 1.x/Docker, soon. Here are some utilities that we've used as run-once containers to start everything up: https://github.com/odoko-devops/solr-utils Using a single image, run with many different configurations, we have been able to

Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Walter Underwood
All our customizations are in solr.in.sh. We’re using the one we configured for 6.3.0. I’ll check for any differences between that and the 6.5.1 script. I don’t see any arguments at all in the dashboard. I do see them in a ps listing, right at the end. java -server -Xms8g -Xmx8g -XX:+UseG1GC

Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
I am using the IndexMergeTool from Solr, from the command below: java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar org.apache.lucene.misc.IndexMergeTool The heap size is 32GB. There are more than 20 million documents in the two cores. Regards, Edwin On 21 November 2017 at 21:54,

Re: Custom analyzer & frequency

2017-11-21 Thread Barbet Alain
You rock, thank you so much for this clear answer. I lost 2 days for nothing, as I already have the term freq, but now I have an answer :-) (And yes, I checked: it's the doc freq, not the term freq). Thank you again! 2017-11-21 16:34 GMT+01:00 Emir Arnautović : > Hi Alain,

Re: Custom analyzer & frequency

2017-11-21 Thread Emir Arnautović
Hi Alain, As explained in the previous mail, that is doc frequency, and each doc is counted once. I am not sure if Luke can provide you with information about overall term frequency - the sum of the term frequency across all docs. Regards, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr &

Re: Custom analyzer & frequency

2017-11-21 Thread Emir Arnautović
Hi Alain, I haven’t been using Luke UI in a while, but if you are talking about top terms for some field, that might be doc freq, not term freq and every doc is counted once - that is equivalent to “Load Term Info” in “Schema” in Solr Admin console. HTH, Emir -- Monitoring - Log Management -

Re: Custom analyzer & frequency

2017-11-21 Thread Barbet Alain
$ cat add_test.sh DATA=' 666 toto titi tata toto tutu titi ' $ sh add_test.sh 0484 $ curl 'http://localhost:8983/solr/alian_test/terms?terms.fl=titi_txt_fr=index' 00 So it's not only on the Luke side; it comes from Solr. Does that sound normal? 2017-11-21 11:43

Re: Custom analyzer & frequency

2017-11-21 Thread Barbet Alain
Thank you very much for your answer. It was an error in the copy/paste in my mail, sorry about that! So it was already a text field, and omitTermFreqAndPositions was already "false". So I dropped my custom analyzer and tried to test with an already defined field type (text_fr), and see the same

Re: Merging of index in Solr

2017-11-21 Thread Shawn Heisey
On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote: Does anyone know how long the merging in Solr will usually take? I am currently merging about 3.5TB of data, and it has been running for more than 28 hours and it is not completed yet. The merging is running on an SSD disk. The following will

Re: Merging of index in Solr

2017-11-21 Thread Emir Arnautović
Hi Edwin, I’ll let somebody with more knowledge about merging comment on the merge aspects. What do you use to merge those cores - the merge tool, or do you run it using Solr’s core API? What is the heap size? How many documents are in those two cores? Regards, Emir -- Monitoring - Log Management - Alerting -

Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Shawn Heisey
On 11/20/2017 6:17 PM, Walter Underwood wrote: When I ran load benchmarks with 6.3.0, an overloaded cluster would get super slow but keep functioning. With 6.5.1, we hit 100% CPU, then start getting OOMs. That is really bad, because it means we need to reboot every node in the cluster. Also,

Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
Hi Emir, Thanks for your reply. There is only 1 host, 1 node and 1 shard for these 3.5TB. The merging has already written an additional 3.5TB to another segment. However, it is still not a single segment, and the size of the folder where the merged index is supposed to be is now 4.6TB. This

Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
Hi All - we have a system with 45 physical boxes running solr 6.6.1 using HDFS as the index.  The current index size is about 31TBytes.  With 3x replication that takes up 93TBytes of disk. Our main collection is split across 100 shards with 3 replicas each.  The issue that we're running into

Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet

2017-11-21 Thread Volodymyr Rudniev
Hello, I've encountered 2 issues while trying to apply unique()/hll() function to a string field inside a range facet: 1. Results are incorrect for a single-valued string field. 2. I’m getting ArrayIndexOutOfBoundsException for a multi-valued string field. How to reproduce: 1.
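For readers who want to reproduce, the failing shape is a unique()/hll() sub-facet nested under a range facet via the JSON Facet API. A sketch with assumed collection and field names (`string_s` matches the single-valued string field mentioned later in the thread):

```shell
# Range facet over a numeric field with a unique() sub-facet on a string
# field; per the report, the counts come back wrong on Solr 7.x.
curl http://localhost:8983/solr/mycoll/query -d '
{
  "query": "*:*",
  "facet": {
    "prices": {
      "type": "range", "field": "price_i",
      "start": 0, "end": 100, "gap": 10,
      "facet": { "uniq": "unique(string_s)" }
    }
  }
}'
```

Swapping `unique(string_s)` for `hll(string_s)`, or using a multi-valued string field, exercises the other failure mode (ArrayIndexOutOfBoundsException) described in the report.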

Re: Issue facing with spell text field containing hyphen

2017-11-21 Thread Atita Arora
I was about to suggest the same. The Analysis panel is the savior in such cases of doubt. -Atita On Tue, Nov 21, 2017 at 7:26 AM, Rick Leir wrote: > Chirag > Look in Solr Admin, the Analysis panel. Put spider-man in the left and > right text inputs, and see how it gets

Re: Issue facing with spell text field containing hyphen

2017-11-21 Thread Rick Leir
Chirag Look in Solr Admin, the Analysis panel. Put spider-man in the left and right text inputs, and see how it gets analysed. Cheers -- Rick On November 20, 2017 10:00:49 PM EST, Chirag garg wrote: >Hi Rick, > >Actually my spell field also contains text with hyphen i.e. it

Re: Please help me with solr plugin

2017-11-21 Thread Binoy Dalal
Zara, If you're looking for custom search components, request handlers or update processors, you can check out my github repo with examples here: https://github.com/bdalal/SolrPluginsExamples/ On Tue, Nov 21, 2017 at 3:58 PM Emir Arnautović < emir.arnauto...@sematext.com> wrote: > Hi Zara, >

Re: Custom analyzer & frequency

2017-11-21 Thread Emir Arnautović
Hi Alain, You did not provide the definition of the field type used - you use the “nametext” type but pasted the “text_ami” field type. It is possible that you have omitTermFreqAndPositions=“true” on the nametext field type. The default value for text fields should be false. HTH, Emir -- Monitoring - Log

Custom analyzer & frequency

2017-11-21 Thread Barbet Alain
Hi, I built a custom analyzer & set it up in Solr, but it doesn't work as I expect. I always get 1 as the frequency for each word, even if it's present multiple times in the text. So I tried with the default analyzer & found the same behavior: My schema alian@yoda:~/solr> cat add_test.sh DATA='

Re: Please help me with solr plugin

2017-11-21 Thread Emir Arnautović
Hi Zara, What sort of plugins are you trying to build? What sort of issues did you run into? Maybe you are not too far from having a running custom plugin. I would recommend you try running some of the existing plugins as your own - just to make sure that you are able to build and configure custom

Please help me with solr plugin

2017-11-21 Thread Zara Parst
Hi, I have spent too much time learning plugin development for Solr. I am about to give up. If someone has experience writing plugins, please contact me. I am open to all options. I want to learn it at any cost. Thanks Zara

NullPointerException in PeerSync.handleUpdates

2017-11-21 Thread S G
Hi, We are running 6.2 version of Solr and hitting this error frequently. Error while trying to recover. core=my_core:java.lang.NullPointerException at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:605) at

Re: Merging of index in Solr

2017-11-21 Thread Emir Arnautović
Hi Edwin, How many hosts/nodes/shards are those 3.5TB? I am not familiar with the merge code, but I am trying to think what it might include, so don’t take any of the following as ground truth. Merging for sure will include a segment rewrite, so you better have an additional 3.5TB if you are merging it to a