Re: cdcr replicator NPE errors

2019-10-25 Thread Jay Potharaju
Thanks Shawn! Can any of the committers comment about the CDCR error that I posted above? Thanks Jay On Fri, Oct 25, 2019 at 2:56 PM Shawn Heisey wrote: > On 10/25/2019 3:22 PM, Jay Potharaju wrote: > > Is there a solr slack channel? > > People with @apache.org email addresses can readily

Re: Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-25 Thread Shawn Heisey
On 10/25/2019 2:30 PM, rhys J wrote: So I went back to one of the fields that is multi-valued, which I explicitly did not choose when I created the field, and I re-created it. It still made the field multi-valued as true. Why is this? Did you reload the core/collection or restart Solr so the

Re: cdcr replicator NPE errors

2019-10-25 Thread Shawn Heisey
On 10/25/2019 3:22 PM, Jay Potharaju wrote: Is there a solr slack channel? People with @apache.org email addresses can readily join the ASF workspace, I do not know whether it is possible for others. That workspace might be only for ASF members. https://the-asf.slack.com In that

Re: cdcr replicator NPE errors

2019-10-25 Thread Jay Potharaju
Is there a solr slack channel? Thanks Jay Potharaju On Fri, Oct 25, 2019 at 9:00 AM Jay Potharaju wrote: > Hi, > I am frequently seeing cdcr-replicator null pointer exception errors in > the logs. > Any suggestions on how to address this? > *Solr version: 7.7.2* > > ExecutorUtil > Uncaught

Re: Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-25 Thread rhys J
> > > > "dl2":["Great Plains"], > > "do_not_call":false, > > There are no hashes inside the document. If there were, they would be > surrounded by {} characters. The whole document is a hash, which is why > it has {} characters. Referring to the snippet that I included above,

Re: Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-25 Thread Shawn Heisey
On 10/25/2019 1:48 PM, rhys J wrote: Is there some reason that text_general fields are returned as arrays, and other fields are returned as hashes in the json response from a curl query? Here's the response: "dl2":["Great Plains"], "do_not_call":false, There are no
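
The array-versus-scalar distinction in the quoted response can be checked directly on the client side. The sketch below is only an illustration: it wraps the two fields quoted in the thread in braces so they parse as one JSON object, and the helper name `first_value` is hypothetical, not part of any Solr client. In Solr, a field defined as multiValued comes back as a JSON array even when it holds a single value.

```python
import json

# Fragment modeled on the response quoted in the thread; the enclosing
# braces are added here so the fragment parses as a single JSON object.
raw = '{"dl2": ["Great Plains"], "do_not_call": false}'
doc = json.loads(raw)


def first_value(value):
    """Return the first element of a list value, or the value itself.

    Hypothetical helper for clients that want one value per field.
    """
    if isinstance(value, list):
        return value[0] if value else None
    return value


for name, value in doc.items():
    kind = "array" if isinstance(value, list) else "scalar"
    print(f"{name}: {kind} -> {first_value(value)!r}")
```

The whole document parses to a Python dict (the "hash"), while individual fields are either lists or plain values.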

Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-25 Thread rhys J
Is there some reason that text_general fields are returned as arrays, and other fields are returned as hashes in the json response from a curl query? Here's my curl query: curl "http://10.40.10.14:8983/solr/dbtr/select?indent=on&q=debtor_id:393291" Here's the response:

Re: POS Tagger

2019-10-25 Thread Nicolas Paris
Also, the OpenNLP Solr POS tagger [1] uses the typeAsSynonymFilter to store the POS: "Index the POS for each token as a synonym, after prefixing the POS with @". Not sure how to deal with POS after such indexing, but this looks like an interesting approach? [1]

Re: regarding Extracting text from Images

2019-10-25 Thread Eric Pugh
Just to stir the pot on this topic, here is an article about why and how to use Tika inside of Solr: https://opensourceconnections.com/blog/2019/10/24/it-s-okay-to-run-tika-inside-of-solr-if-and-only-if/ > On Oct 23, 2019, at 7:21 PM, Erick Erickson wrote: > > Here’s a blog about why and how

Re: POS Tagger

2019-10-25 Thread Dave
Yeah. My mistake in explanation. But it really does help with better relevance in the returned documents > On Oct 25, 2019, at 12:39 PM, Audrey Lorberfeld - audrey.lorberf...@ibm.com > wrote: > > Oh I see I see > > -- > Audrey Lorberfeld > Data Scientist, w3 Search > IBM >

Re: Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Oh I see I see -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/25/19, 12:21 PM, "David Hastings" wrote: oh i see what you mean, sorry, i explained it incorrectly. those sentences are what would be in the index, and a general search for 'rush

Re: Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
How can a field itself be tagged with a part of speech? -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/25/19, 12:12 PM, "David Hastings" wrote: nope, i boost the fields already tagged at query time against the query On Fri, Oct 25, 2019 at

Re: POS Tagger

2019-10-25 Thread Nicolas Paris
> Do you use the POS tagger at query time, or just at index time? I have the POS tagger pipeline ready but nothing done yet on the solr part. Right now I am wondering how to use it but still looking for relevant implementation. I guess having the POS information ready before indexation gives

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
oh i see what you mean, sorry, i explained it incorrectly. those sentences are what would be in the index, and a general search for 'rush limbaugh' would come back with results where he is an entity higher than if it was two words in a sentence On Fri, Oct 25, 2019 at 12:12 PM David Hastings <

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
nope, i boost the fields already tagged at query time against the query On Fri, Oct 25, 2019 at 12:11 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > So then you do run your POS tagger at query-time, Dave? > > -- > Audrey Lorberfeld > Data Scientist, w3 Search > IBM >

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
So then you do run your POS tagger at query-time, Dave? -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/25/19, 12:06 PM, "David Hastings" wrote: I use them for query boosting, so if someone searches for: i dont want to rush limbaugh out the

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Nicolas, Do you use the POS tagger at query time, or just at index time? We are thinking of using it to filter the tokens we will eventually perform ML on. Basically, we have a bunch of acronyms in our corpus. However, many departments use the same acronyms but expand those acronyms to

Re: POS Tagger

2019-10-25 Thread David Hastings
I use them for query boosting, so if someone searches for: i dont want to rush limbaugh out the door vs i talked to rush limbaugh through the door my documents where 'rush limbaugh' is a known entity (noun) and a person (look at the sentence, it's obviously a person and the nlp finds that) have

cdcr replicator NPE errors

2019-10-25 Thread Jay Potharaju
Hi, I am frequently seeing cdcr-replicator null pointer exception errors in the logs. Any suggestions on how to address this? *Solr version: 7.7.2* ExecutorUtil Uncaught exception java.lang.NullPointerException thrown by thread: cdcr-replicator-773-thread-3 java.lang.Exception: Submitter stack

Re: POS Tagger

2019-10-25 Thread Nicolas Paris
Also, we are using the Stanford POS tagger for French. The processing time is mitigated by the spark-corenlp package, which distributes the processing over multiple nodes. I am also interested in the way you use POS information within Solr queries or Solr fields. Thanks, On Fri, Oct 25, 2019 at

Dynamic facet limits using Solr

2019-10-25 Thread mohamedXYZ
How can I group my Solr query results using a numeric field into x buckets, where the bucket start and end values are determined when the query is run? For example, if I want to count and group documents into 5 buckets by a wordCount field, the results should be: 250-500 words: 3438 results
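
One way to approach this question is to compute the bucket boundaries on the client just before querying and send one `facet.query` per bucket. The sketch below is a guess at a workable approach, not an answer given in the thread: it assumes equal-width buckets, assumes the minimum and maximum of the field are already known (for example from an earlier stats request), and the function names `bucket_edges` and `facet_params` are hypothetical.

```python
from urllib.parse import urlencode


def bucket_edges(lo, hi, n):
    """Split [lo, hi] into n equal-width buckets (assumed strategy)."""
    width = (hi - lo) / n
    edges = [round(lo + i * width) for i in range(n)] + [hi]
    return list(zip(edges[:-1], edges[1:]))


def facet_params(field, lo, hi, n):
    """Build select parameters with one facet.query per bucket."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    buckets = bucket_edges(lo, hi, n)
    for i, (start, end) in enumerate(buckets):
        # Upper bound is exclusive except for the last bucket, so that
        # no document is counted in two buckets.
        close = "]" if i == len(buckets) - 1 else "}"
        params.append(("facet.query", f"{field}:[{start} TO {end}{close}"))
    return params


# Example values only; lo and hi would come from the data at query time.
params = facet_params("wordCount", 0, 1000, 5)
query_string = urlencode(params)
print(query_string)
```

Each `facet.query` count in the response then corresponds to one bucket.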

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
ah, yeah its not the fastest but it proved to be the best for my purposes, I use it to pre-process data before indexing, to apply more metadata to the documents in a separate field(s) On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > No, I meant for

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
No, I meant for part-of-speech tagging. But that's interesting that you use StanfordNLP. I've read that it's very slow, so we are concerned that it might not work for us at query-time. Do you use it at query-time, or just index-time? -- Audrey Lorberfeld Data Scientist, w3 Search IBM

Re: POS Tagger

2019-10-25 Thread David Hastings
https://nlp.stanford.edu/ On Fri, Oct 25, 2019 at 10:29 AM David Hastings < hastings.recurs...@gmail.com> wrote: > Do you mean for entity extraction? > I make a LOT of use from the stanford nlp project, and get out the > entities and use them for different purposes in solr > -Dave > > On Fri,

Re: POS Tagger

2019-10-25 Thread David Hastings
Do you mean for entity extraction? I make a LOT of use from the stanford nlp project, and get out the entities and use them for different purposes in solr -Dave On Fri, Oct 25, 2019 at 10:16 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Hi All, > > Does anyone use a POS tagger with

POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All, Does anyone use a POS tagger with their Solr instance other than OpenNLP’s? We are considering OpenNLP, SpaCy, and Watson. Thanks! -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Erick Erickson
I’m also surprised that you see a slowdown; it’s worth investigating. Let’s take the NRT case with only a leader. I’ve seen the NRT indexing time increase when even a single follower was added (30-40% in this case). We believed that the issue was the time the leader sat waiting around for the

Re: solr-user-subscribe

2019-10-25 Thread Erick Erickson
If you _are_ using SolrCloud, you can use the collections API SPLITSHARD command. > On Oct 25, 2019, at 7:37 AM, Shawn Heisey wrote: > > On 10/24/2019 11:19 PM, Hafiz Muhammad Shafiq wrote: >> HI, >> I am using Solr 6.x version for search purposes. Now data has been >> increased into one
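
For reference, a SPLITSHARD call is an HTTP request to the Collections API. The sketch below only builds the request URL; the host, collection name, and shard name are placeholders, and the helper name `splitshard_url` is hypothetical.

```python
from urllib.parse import urlencode


def splitshard_url(base_url, collection, shard):
    """Build a Collections API SPLITSHARD URL (hypothetical helper)."""
    query = urlencode(
        {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    )
    return f"{base_url}/admin/collections?{query}"


# Placeholder values; substitute the real Solr base URL and names.
url = splitshard_url("http://localhost:8983/solr", "mycollection", "shard1")
print(url)
```

Splitting divides an existing shard in two, so it spreads documents across more shards without a full reindex.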

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Ere Maijala
Shawn Heisey wrote on 25.10.2019 at 14.54: > With newer Solr versions, you can ask SolrCloud to prefer PULL replicas > for querying, so queries will be targeted to those replicas, unless they > all go down, in which case it will go to non-preferred replica types. I > do not know how to do this,
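
The replica-type preference mentioned here is expressed per request through the `shards.preference` query parameter (available since Solr 7.4). The sketch below only builds the query string; it is not taken from the truncated reply, and the helper name `select_query` is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def select_query(q, prefer_type="PULL"):
    """Build a select query string that prefers one replica type."""
    return urlencode(
        {"q": q, "shards.preference": f"replica.type:{prefer_type}"}
    )


qs = select_query("*:*")
print(qs)
# Decode again to show the parameter as Solr would receive it.
print(parse_qs(qs)["shards.preference"])
```

The preference is a hint: if no replica of the preferred type is available, the query is still served by another replica type, as described in the quoted message.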

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Shawn, So, I understand that while a non-leader TLOG replica is copying the index from the leader, the leader stops indexing. One-shot heavy bulk indexing should be impacted much more than continuous light indexing. Regards. Dominique On Fri, Oct 25, 2019 at 13:54, Shawn Heisey wrote: > On

Re: solr configuration issue

2019-10-25 Thread Shawn Heisey
On 10/25/2019 5:44 AM, Danilo Tomasoni wrote: Another question, is softCommit sufficient to ensure visibility or should I call a commit to ensure a new searcher will be opened? softCommit automatically opens a new searcher? There would be little point to doing a soft commit with openSearcher

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Shawn Heisey
On 10/25/2019 1:16 AM, Dominique Bejean wrote: For collection created with all replicas as NRT * Indexing time : 22 minutes For collection created with all replicas as TLOG * Indexing time : 34 minutes NRT indexes simultaneously on all replicas. So when indexing is done on one, it is

Re: solr configuration issue

2019-10-25 Thread Danilo Tomasoni
Thank you all for your suggestions. Now I changed my import strategy to ensure that the same document will be updated eventually by different "batches", in this way I need a single programmatic softcommit at the end of each batch. Configuration-side I enabled autoCommit with

Re: Query on changing FieldType

2019-10-25 Thread Shubham Goswami
Hello Erick/Emir Thanks for your valuable suggestions. I will keep it in mind while doing such operations. Best, Shubham On Wed, Oct 23, 2019 at 5:56 PM Erick Erickson wrote: > Really, just don’t do this. Please. As others have pointed out, it may > look like it works, but it won’t. I’ve

Re: solr-user-subscribe

2019-10-25 Thread Shawn Heisey
On 10/24/2019 11:19 PM, Hafiz Muhammad Shafiq wrote: Hi, I am using Solr 6.x for search purposes. The data has now grown within a single shard. I have to create some additional shards and also balance them based on the number of documents. According to my search, solr does not provide

Re: solr 8.1.1 many time slower returning query results than solr 4.10.4 or solr 6.5.1

2019-10-25 Thread Vincenzo D'Amore
Hi Russell, I've noticed a few differences between the solr8 schema and solr6: a few omitNorms params are missing, and a few solr.FlattenGraphFilterFactory filters are missing too. But perhaps the most important difference between 6 and 8 is the memory configuration. solr 6 has SOLR_HEAP="27158m"

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Hi Jörn, I am using version 8.2. I repeated the test twice for each mode. I restarted the Solr nodes and deleted and re-created the empty collection each time. Regards. Dominique On Fri, Oct 25, 2019 at 09:20, Jörn Franke wrote: > Which Solr version are you using and how often you repeated the test?

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Jörn Franke
Which Solr version are you using, and how often did you repeat the test? > On 25.10.2019 at 09:16, Dominique Bejean wrote: > > Hi, > > I made some benchmarks for bulk indexing in order to compare performance > and resource usage for NRT versus TLOG replicas. > > Environment: > * SolrCloud

NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Hi, I made some benchmarks for bulk indexing in order to compare performance and resource usage for NRT versus TLOG replicas. Environment: * SolrCloud with 4 Solr nodes (8 GB RAM, 4 GB heap) * 1 collection with 2 shards x 2 replicas (all NRT or all TLOG) * 1 core per Solr server Indexing of

Re: solr-user-subscribe

2019-10-25 Thread Hafiz Muhammad Shafiq
Hi, I am using Solr 6.x for search purposes. The data has now grown within a single shard. I have to create some additional shards and also balance them based on the number of documents. According to my search, Solr does not provide a rebalance API. Is that correct? How can I do this? On Fri,