RE: CDCR - how to deal with the transaction log files

2017-07-20 Thread Patrick Hoeffel
I'm working on my first setup of CDCR, and I'm seeing the same "The log reader for target collection {collection name} is not initialised" as you saw. It looks like you're creating collections on a regular basis, but for me, I create it one time and never again. I've been creating the

Re: Getting IO Exception while Indexing

2017-07-20 Thread mesenthil1
While debugging following are the findings. When we send the same document as json, it is getting indexed without an issue. When the same document is converted as SolrInputDocument and sent to solr using SolrServer, it fails. -- View this message in context:

Re: Solr Issue While indexing Data

2017-07-20 Thread rajat rastogi
Hi Shawn, I have two instances of Solr running, and my indexing process is in Java as well. PID 15958 is my indexing process. PID 4499 is my Solr instance which has stuck commits. PID 9299 is another Solr instance which is working fine. Regards, Rajat On 20-Jul-2017, at 16:40, Shawn Heisey-2

Re: The unified highlighter html escaping. Seems rather extreme...

2017-07-20 Thread David Smiley
The escaping does appear excessive. Please file a bug to the Lucene project in Apache JIRA. On Fri, May 26, 2017 at 11:26 AM Michael Joyner wrote: > Isn't the unified html escaper a bit extreme in its escaping? > > It makes it hard to deal with for simple

Re: Highlighting words with special characters

2017-07-20 Thread Lasitha Wattaladeniya
Hi Shawn, Yes, I can confirm it works without any errors with multiple tokenizers. Following is my analysis chain: StandardTokenizerFactory (only in index) StopFilterFactory LowerCaseFilterFactory ASCIIFoldingFilterFactory EnglishPossessiveFilterFactory StemmerOverrideFilterFactory (only in

RE: Issues trying to boost phrase containing stop word

2017-07-20 Thread Phil Scadden
The simplest suggestion is get rid of the stop word filter. I've seen people here comment that it is not worth it for the amount of space it saves. -Original Message- From: shamik [mailto:sham...@gmail.com] Sent: Friday, 21 July 2017 9:49 a.m. To: solr-user@lucene.apache.org Subject: Re:

Re: Issues trying to boost phrase containing stop word

2017-07-20 Thread shamik
Any suggestion? -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-trying-to-boost-phrase-containing-stop-word-tp4346860p4347068.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: finds all documents without a value for field

2017-07-20 Thread Hendrik Haddorp
If the range query is so much better shouldn't the Solr query parser create a range query for a token query that only contains the wildcard? For the *:* case it does already contain a special path. On 20.07.2017 21:00, Shawn Heisey wrote: On 7/20/2017 7:20 AM, Hendrik Haddorp wrote: the Solr

Re: DateRangeField and Timezone

2017-07-20 Thread Ulul
Hi, Got it, thanks to the debug option. TZ applies only to date computations, so you have to compute a date :) The document {"date" : "2016-12-31T04:15:00Z", "desc" : "winter time day before" } is retrieved with query date:[2016-12-31T12:15:00Z/DAY TO 2017-01-03T12:15:00Z/DAY] and

Re: Solr 6.6 test failure: TestSolrCloudWithKerberosAlt.testBasics

2017-07-20 Thread Nawab Zada Asad Iqbal
Mine is actually very different:- -test: [junit4] says ᐊᐃ! Master seed: C3B77541FB9DE693 [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(37742@mbp-9009). [junit4] Suite: org.apache.solr.cloud.TestSolrCloudWithKerberosAlt [junit4] 2> NOTE:

Re: Solr 6.6 test failure: TestSolrCloudWithKerberosAlt.testBasics

2017-07-20 Thread Steve Rowe
Does it look like this?: I see failures like that on my Jenkins once or twice a week. -- Steve www.lucidworks.com > On Jul 20, 2017, at 3:53 PM, Nawab Zada Asad Iqbal

Solr 6.6 test failure: TestSolrCloudWithKerberosAlt.testBasics

2017-07-20 Thread Nawab Zada Asad Iqbal
Hi, I cloned solr 6.6 branch today and I see this failure consistently. TestSolrCloudWithKerberosAlt.testBasics I had done some script changes but after seeing this failure I reverted them and ran: `ant -Dtestcase=TestSolrCloudWithKerberosAlt clean test` but this test still fails with this

Re: Copy field a source of copy field

2017-07-20 Thread Erick Erickson
Yep, we're not communicating ;) Use the original source field for the genus, as: The difficulty here is that there might be false hits if the genera names happen to match words in the input that are not part of a genus/species pair. On Thu, Jul 20, 2017 at 9:55 AM, tstusr

Re: finds all documents without a value for field

2017-07-20 Thread Erick Erickson
One other possibility is to create a second boolean field "has_terms" or something and just add an fq clause like "fq=has_terms:false" On Thu, Jul 20, 2017 at 12:00 PM, Shawn Heisey wrote: > On 7/20/2017 7:20 AM, Hendrik Haddorp wrote: >> the Solr 6.6. ref guide states
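
A minimal sketch of the flag-field idea above, assuming the flag is computed on the client before documents are sent to Solr. The source field name `terms`, the helper `with_has_terms`, and the sample documents are hypothetical; only the `has_terms` field name and the filter on `has_terms:false` come from the message.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def with_has_terms(doc: Dict[str, Any], source_field: str = "terms") -> Dict[str, Any]:
    """Return a copy of `doc` with a boolean `has_terms` flag.

    The flag is False when `source_field` is absent or empty, True otherwise.
    `source_field` is a hypothetical name chosen for this illustration.
    """
    value = doc.get(source_field)
    flagged = dict(doc)
    flagged["has_terms"] = value not in (None, "", [])
    return flagged


# Documents as they might look just before indexing.
docs = [with_has_terms(d) for d in ({"id": "1"}, {"id": "2", "terms": ["solr"]})]

# Query parameters that restrict results to documents without a value.
params = urlencode({"q": "*:*", "fq": "has_terms:false"})
print(docs)
print(params)
```

The filter is then a plain term lookup on a boolean field, at the cost of maintaining one extra field at index time.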

Re: finds all documents without a value for field

2017-07-20 Thread Shawn Heisey
On 7/20/2017 7:20 AM, Hendrik Haddorp wrote: > the Solr 6.6. ref guide states that to "finds all documents without a > value for field" you can use: > -field:[* TO *] > > While this is true I'm wondering why it is recommended to use a range > query instead of simply: > -field:* Performance. A

RE: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Davis, Daniel (NIH/NLM) [C]
Muhammad, This sounds like it might be handled better by multiple collections rather than multiple "sub collections". If you create a new collection for each date, all using the same common config set, and then create an alias that contains all of these collections. Then, the alias will
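
The collection-per-date approach above could be sketched roughly as follows. This only builds Collections API URLs (`CREATE` and `CREATEALIAS`) and does not send them; the host, the collection names, the alias name `logs_all`, and the config set name `logs_conf` are all assumptions made for illustration.

```python
from typing import Iterable
from urllib.parse import urlencode

# Assumed location of the Collections API on a local Solr node.
ADMIN = "http://localhost:8983/solr/admin/collections"


def create_collection_url(name: str, config_name: str, num_shards: int = 1) -> str:
    """URL that creates one collection using a shared config set."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "collection.configName": config_name,
    }
    return f"{ADMIN}?{urlencode(params)}"


def create_alias_url(alias: str, collections: Iterable[str]) -> str:
    """URL that points one alias at several collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{ADMIN}?{urlencode(params)}"


daily = ["logs_2017_07_19", "logs_2017_07_20"]  # hypothetical daily collections
for name in daily:
    print(create_collection_url(name, "logs_conf"))
print(create_alias_url("logs_all", daily))
```

Re-issuing `CREATEALIAS` with an updated list is one way to add a new day or drop an expired one without touching the documents themselves.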

Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Walter Underwood
Well, you have a bad problem. You have a requirement that forces you to build an expensive, unreliable search system. You need to do specific shard creation at specific times every day. What happens if that fails? Does search go down until it is fixed because all searches are going to a shard

Re: Copy field a source of copy field

2017-07-20 Thread tstusr
Well, correct me if I'm wrong. Your suggestion is to use species field as a source of genus field. We try with this Where species work as described and genus just use a KWF, like this: But now, the problem now is different. When we try

Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Erick Erickson
bq: that is our requirement to load data into specific shard and later after retention time we will delete that shard Why is it necessary to delete a shard when deleting the old data by query removes it? This sounds like an XY problem. Someone has "required" that you enforce data retention by

Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread rehman kahloon
Hi Eric, Thank you very much for your guidance. No sir, that is our requirement to load data into specific shard and later after

Re: Debug Queries field explaination

2017-07-20 Thread Charlie Hull
On 20/07/2017 11:41, Swapnil Pande wrote: Hi , Being an amateur in solr i wanted to learn how solr queries internally and how score is calculated. So setting debug=true. I get a json with fields like 'fromSetSize' , 'toSetSize'.. etc. Can I get a reference link to understand what these exactly

Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Susheel Kumar
Agree. One should first try to measure the performance with standard/common approach. On Thu, Jul 20, 2017 at 11:00 AM, Walter Underwood wrote: > I agree. Use the standard shard distribution and delete by query to remove > older documents. > > Much, much simpler and

Re: Getting IO Exception while Indexing

2017-07-20 Thread Susheel Kumar
You can try to submit only the failed documents directly one by one/all and see if you get any error etc. On Thu, Jul 20, 2017 at 11:01 AM, Walter Underwood wrote: > If Apache is returning 400, then it really is a bad request. Debug the > request and fix it. > > wunder >

Re: Getting IO Exception while Indexing

2017-07-20 Thread Walter Underwood
If Apache is returning 400, then it really is a bad request. Debug the request and fix it. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 19, 2017, at 11:27 PM, mesenthil1 > wrote: > > Hi, > This

Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Walter Underwood
I agree. Use the standard shard distribution and delete by query to remove older documents. Much, much simpler and probably faster at query time. I’m seeing a lot of e-mails about people trying to do fancy things with sharding before they’ve even tried and measured the performance. wunder
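
As a rough illustration of the delete-by-query retention suggested above, the sketch below builds a JSON update body that removes documents older than a given number of days. The date field name `event_date` and the ten-day window are assumptions; the body still has to be POSTed to the collection's `/update` handler and committed.

```python
import json


def retention_delete_body(date_field: str, keep_days: int) -> str:
    """JSON update command deleting documents older than `keep_days` days.

    Uses Solr date math rounded to the start of the current day.
    """
    if keep_days < 1:
        raise ValueError("keep_days must be at least 1")
    query = f"{date_field}:[* TO NOW/DAY-{keep_days}DAYS]"
    return json.dumps({"delete": {"query": query}})


# `event_date` is a hypothetical field name used only for this example.
body = retention_delete_body("event_date", 10)
print(body)
```

Run once a day, a command like this keeps a rolling window without any per-date shard management.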

Re: Getting IO Exception while Indexing

2017-07-20 Thread mesenthil1
Hi, This is happening repeatedly for a few documents. When we compared with other similar documents, we could not find any difference. As we are seeing 400 on Apache, the request is not submitted to Solr. So we are unable to find out the cause. Senthil

Debug Queries field explaination

2017-07-20 Thread Swapnil Pande
Hi , Being an amateur in solr i wanted to learn how solr queries internally and how score is calculated. So setting debug=true. I get a json with fields like 'fromSetSize' , 'toSetSize'.. etc. Can I get a reference link to understand what these exactly mean. Thanks.

Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Erick Erickson
Use the "implicit" router (being renamed "manual") that takes the value of a particular field (_route_ by default) and sends docs to that exact shard. But I also question whether sharding on this schema is a good idea. If you have an access pattern where most queries are for, say, the last two
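
A minimal sketch of what using the "implicit" router might look like, assuming a local Solr node and hypothetical names (`events`, `events_conf`, and date-named shards). It only builds the URLs: a Collections API `CREATE` call with `router.name=implicit` and an explicit shard list, plus an update URL that names the target shard through the `_route_` parameter.

```python
from typing import Iterable
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL


def create_implicit_collection_url(name: str, shard_names: Iterable[str], config_name: str) -> str:
    """URL that creates a collection whose shards are named explicitly."""
    params = {
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",
        "shards": ",".join(shard_names),
        "collection.configName": config_name,
    }
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def routed_update_url(collection: str, shard_name: str) -> str:
    """Update URL that sends the posted documents to one named shard."""
    query = urlencode({"_route_": shard_name, "commit": "true"})
    return f"{SOLR}/{collection}/update?{query}"


shards = ["d2017_07_19", "d2017_07_20"]  # hypothetical shard names
print(create_implicit_collection_url("events", shards, "events_conf"))
print(routed_update_url("events", "d2017_07_20"))
```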

Re: Apache Solr 4.10.x - Collection Reload times out

2017-07-20 Thread Erick Erickson
1> are you replaying the tlog? If you have a large tlog for some reason you may be replaying it. Although a reload should do a commit first. 2> What do the Solr logs show the node in question to be doing? 3> Sorry to mislead you, async is not a 4.10 option for the RELOAD command so that was

Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread Erick Erickson
The key is removing the entire data directory as in "rm -rf solr_core/data" with Solr down then restarting Solr. Or create a new core. It's most probably working on Windows because the schema was set with multiValued=false when you indexed your first document. Best, Erick On Thu, Jul 20, 2017

Re: default values for numRecordsToKeep and maxNumLogsToKeep

2017-07-20 Thread Erick Erickson
bq: I am pretty sure that anytime a core starts for *any* reason, all the transaction logs that are present will get replayed. This isn't quite true. If Solr is shut down gracefully, or a hard commit happened before shutdown (with no new docs added) then the tlog will _not_ be replayed on

Re: finds all documents without a value for field

2017-07-20 Thread Hendrik Haddorp
forgot the link with the statement: https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html On 20.07.2017 15:20, Hendrik Haddorp wrote: Hi, the Solr 6.6. ref guide states that to "finds all documents without a value for field" you can use: -field:[* TO *] While this is true

finds all documents without a value for field

2017-07-20 Thread Hendrik Haddorp
Hi, the Solr 6.6. ref guide states that to "finds all documents without a value for field" you can use: -field:[* TO *] While this is true I'm wondering why it is recommended to use a range query instead of simply: -field:* regards, Hendrik
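
For reference, the two query forms being compared can be written out as request parameters. This is only a sketch that builds the URLs; the base URL, the collection name `mycollection`, and the helper names are assumptions, and `field` stands in for a real field name as in the message.

```python
from urllib.parse import urlencode


def missing_value_query(field: str, use_range: bool = True) -> str:
    """Pure-negative query matching documents that have no value in `field`.

    use_range=True gives the ref guide form, False gives the wildcard form.
    """
    clause = f"{field}:[* TO *]" if use_range else f"{field}:*"
    return f"-{clause}"


def select_url(base_url: str, collection: str, field: str, use_range: bool = True) -> str:
    """Build a /select URL for the chosen query form."""
    params = {"q": missing_value_query(field, use_range), "rows": 10, "wt": "json"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = select_url("http://localhost:8983/solr", "mycollection", "field")
print(missing_value_query("field"))
print(missing_value_query("field", use_range=False))
print(url)
```

Adding `debug=true` to either request is one way to compare how the two forms are parsed.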

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-20 Thread Markus Jelsma
cc mailinglist Hello, I thought that would come to your mind but do not worry, the heap averages at 55 % all day long, there is very little garbage collection going on, and if so, it is the eden space that gets collected. If you really want, i can send such a file when the problem occurs

Re: default values for numRecordsToKeep and maxNumLogsToKeep

2017-07-20 Thread Shawn Heisey
On 7/18/2017 11:53 AM, suresh pendap wrote: > After looking at the source code I see that the default values for > numRecordsToKeep is 100 and maxNumLogsToKeep is 10. > > So it seems by default the replica can only have 1000 document updates lag > before the replica goes for a Full recovery from

Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread prashantas
I am not running solr in cloud mode. On Thu, Jul 20, 2017 at 4:40 PM, Shawn Heisey-2 [via Lucene] < ml+s472066n4346954...@n3.nabble.com> wrote: > On 7/20/2017 2:30 AM, prashantas wrote: > > I am using solr6.4. In my managed-schema, I have defined my field > details. > > None of my fields are

Re: Apache Solr 4.10.x - Collection Reload times out

2017-07-20 Thread alessandro.benedetti
Additional information: trying a single core reload, I identified that an entire shard is not reloading (while the other shard is). Taking a look at the "not reloading" shard (2 replicas), it seems that the core reload gets stuck here: org.apache.solr.core.SolrCores#waitAddPendingCoreOps The problem

Re: Solr Issue While indexing Data

2017-07-20 Thread rajat rastogi
Hi Shawn, I mailed you the info @ apa...@elyograg.org. I can resend the mail. Regards, Rajat On 20-Jul-2017, at 16:40, Shawn Heisey-2 [via Lucene] > wrote: On 7/20/2017 12:29 AM,

Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread Shawn Heisey
On 7/20/2017 2:30 AM, prashantas wrote: > I am using solr6.4. In my managed-schema, I have defined my field details. > None of my fields are multiValued. If I set property multiValued=false , it > works fine in Windows, but in CentOS/RHEL, it does not accept the same and > the field still shows

Re: Solr Issue While indexing Data

2017-07-20 Thread Shawn Heisey
On 7/20/2017 12:29 AM, rajat rastogi wrote: > I shared The code base, config , schema with you . Were they of any help , or > can You point what I am doing wrong in them . I did not see any schema or config. The top output shows that you have three large Java processes, all running as root.

Re: Boost by Integer value on top of query

2017-07-20 Thread Erik Hatcher
If you’re using edismax, adding a boost parameter `boost=num_employees&boost=num_locations` should incorporate those integers into the scores. Just try one at a time at first - you’ll likely want to wrap it into a single function, along the lines of something like
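
The preview above is cut off before the combined function, so the sketch below covers only the first step: passing each integer field as its own edismax `boost` parameter. The query text `acme` and the helper name are assumptions; the field names are the ones mentioned in the message.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlencode


def boosted_params(user_query: str, boost_fields: Iterable[str]) -> List[Tuple[str, str]]:
    """Request parameters for an edismax query with one boost per field."""
    params = [("defType", "edismax"), ("q", user_query)]
    params += [("boost", field) for field in boost_fields]
    return params


query_string = urlencode(boosted_params("acme", ["num_employees", "num_locations"]))
print(query_string)
# defType=edismax&q=acme&boost=num_employees&boost=num_locations
```

Passing a single field in `boost_fields` makes it easy to try one boost at a time and compare scores with `debug=true`.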

Re: Apache Solr 4.10.x - Collection Reload times out

2017-07-20 Thread alessandro.benedetti
Taking a look at the 4.10.2 source, I may see why the async call does not work: log.info("Reloading Collection : " + req.getParamString()); String name = req.getParams().required().get("name"); ZkNodeProps m = new ZkNodeProps(Overseer.QUEUE_OPERATION,

Boost by Integer value on top of query

2017-07-20 Thread marotosg
Hi, I have a use case where I need to boost documents based on two integer values. Basically I need to retrieve companies using specific criteria like company name, nationality etc. On top of that query I need to boost the most important ones, which are supposed to be the ones with higher number of

Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread Amrit Sarkar
By saying: I am just adding multiValued=false in the managed-schema file. Are you modifying it in the local filesystem "conf" or going into the core conf directory and changing it there? If you are using SolrCloud, you should change the same on Zookeeper.

Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread alessandro.benedetti
Assuming the "service solr restart" does its job, I think the only thing I would do is to completely remove the data directory content, instead of just running the delete query. Bear in mind that when you delete a document in Solr, it is marked as deleted, but it takes potentially a while

Re: Create too many zookeeper connections when recreate CloudSolrServer instance

2017-07-20 Thread wg85907
Hi Walter, Shawn, Thanks for your quick reply, the information you provided is really helpful. Now I know how to find the right way to resolve my issue. Regards, Geng, Wei

Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread prashantas
I am just adding multiValued=false in the managed-schema file. Then deleting the complete data by running the command curl http://localhost:8983/solr/Schools/update?commit=true -d '<delete><query>*:*</query></delete>' where 'Schools' is my core name. Then restart Solr by "service solr restart" And then import the csv
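
The delete-everything step described above can also be expressed as a JSON update request. The sketch below only constructs the request object and does not send it; the base URL is taken from the message, while the helper name and the choice of a JSON body are assumptions made for illustration.

```python
import json
from urllib.request import Request


def delete_all_request(base_url: str, core: str) -> Request:
    """Build (but do not send) a POST that deletes every document in `core`."""
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return Request(
        f"{base_url}/{core}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = delete_all_request("http://localhost:8983/solr", "Schools")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it. Note that, as other replies in this thread point out, deleting documents this way does not remove the data directory.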

Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread alessandro.benedetti
I doubt it is an environment problem at all. How are you modifying your schema? How are you reloading your core/collection? Are you restarting your Solr instance? Regards - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io --

Re: Apache Solr 4.10.x - Collection Reload times out

2017-07-20 Thread alessandro.benedetti
Thanks for the prompt response Erick, the reason that I am issuing a Collection reload is that I modify the Solrconfig from time to time, for example with different spellcheck and request parameter default params. So after the upload to Zookeeper I reload the collection to reflect the

multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread prashantas
I am using solr6.4. In my managed-schema, I have defined my field details. None of my fields are multiValued. If I set property multiValued=false , it works fine in Windows, but in CentOS/RHEL, it does not accept the same and the field still shows multiValued true in my solr admin UI. Please help

Need guidance solrcloud shardings with date interval

2017-07-20 Thread rehman kahloon
Hi Sir, Taken your id from your document on SlideShare. Need your guidance on my plan. My target is to create sub-collection/shards within a collection. e.g. Currently I have 10 days data and want to store data against each date in separate partitions, like oracle partition

Re: Solr Issue While indexing Data

2017-07-20 Thread rajat rastogi
Hi Shawn , I shared The code base, config , schema with you . Were they of any help , or can You point what I am doing wrong in them . regards Rajat On 19-Jul-2017, at 21:41, Shawn Heisey-2 [via Lucene] > wrote: