Re: partial update in solr

2018-10-30 Thread Zahra Aminolroaya
Alex I use solr 7.

Re: Solr cloud - poweroff procedure

2018-10-30 Thread Walter Underwood
I agree.
1. Shut down each Solr server process using the “bin/solr” script.
2. Shut down the Zookeeper ensemble.
3. Take backups.
4. Shut down the OS.
Do that in reverse to get going.
wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 30, 2018, at
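The ordering in the steps above can be written down as a small plan builder. This is only a sketch: the host names are hypothetical, the backup step is left as a site-specific placeholder, and the start-up plan assumes ZK_HOST is already configured in solr.in.sh so that `bin/solr start -cloud` finds the external ensemble.

```python
def shutdown_plan(solr_hosts, zk_hosts):
    """Return ordered (host, command) pairs for a full power-off.

    Order follows the thread: Solr first, then ZooKeeper, then backups,
    then the operating system.
    """
    all_hosts = sorted(set(solr_hosts) | set(zk_hosts))
    plan = [(h, "bin/solr stop -all") for h in solr_hosts]
    plan += [(h, "bin/zkServer.sh stop") for h in zk_hosts]
    # Placeholder only: the actual backup command depends on the site.
    plan += [(h, "# take backups (site-specific)") for h in all_hosts]
    plan += [(h, "sudo shutdown -h now") for h in all_hosts]
    return plan


def startup_plan(solr_hosts, zk_hosts):
    """Reverse order for the services: ZooKeeper first, then Solr."""
    plan = [(h, "bin/zkServer.sh start") for h in zk_hosts]
    plan += [(h, "bin/solr start -cloud") for h in solr_hosts]
    return plan


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for host, command in shutdown_plan(["solr1", "solr2", "solr3"], ["zk1"]):
        print(f"{host}: {command}")
```

Keeping the plan as data makes it easy to review the order before anything is run on a real cluster.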

Re: Solr cloud - poweroff procedure

2018-10-30 Thread Erick Erickson
bin/solr stop As long as you don't kill it with extreme prejudice (i.e. kill -9 or pull the plug) it should be fine. Assuming you're running ZooKeeper in an external ensemble, I'd certainly stop those after all the Solr instances were stopped. Powering the nodes up is irrelevant to Solr, the

Re: Integrating word2vec and glove results into Solr

2018-10-30 Thread Benedict Holland
Thanks Doug. It is funny that you should mention that. It is very hard trying to convince people that just because words are somehow related, we really don't know how they are related. This is especially true when they are handed the results of a shallow neural net that took a research team a few

Re: Integrating word2vec and glove results into Solr

2018-10-30 Thread Doug Turnbull
You may already know this, but just be very careful. Embeddings are useful, and people often think of them as detecting synonyms, but they really just encode contexts. For example, antonyms and words with similar functions are often seen as similar. There are also issues with terms that occur in sparsely
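To make the point about contexts concrete, here is a minimal sketch of the cosine similarity usually used to compare embeddings. The three vectors are made-up toy values, not output from word2vec or GloVe; they are chosen only to show how an antonym pair that shares contexts can score higher than an unrelated word.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy, hand-written vectors (an assumption for illustration only).
toy_vectors = {
    "hot": [0.9, 0.8, 0.1],
    "cold": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

if __name__ == "__main__":
    print(cosine_similarity(toy_vectors["hot"], toy_vectors["cold"]))
    print(cosine_similarity(toy_vectors["hot"], toy_vectors["banana"]))
```

A high score here says only that two words appear in similar contexts; it does not say whether they are synonyms, antonyms, or something else.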

Solr cloud - poweroff procedure

2018-10-30 Thread lstusr 5u93n4
Hi All, We have a solr cloud running 3 shards, 3 hosts, 6 total NRT replicas, and the data directory on HDFS. It has 950 million documents in the index, occupying 700GB of disk space. We need to completely power off the system to move it. Are there any actions we should take on shutdown to help

RE: Odd Scoring behavior

2018-10-30 Thread Markus Jelsma
Hello Webster, It smells like KeywordRepeat. In general it is not a problem if all terms are scored twice. But you also have RemoveDuplicates, so in some cases a term is scored twice in one field but only once in the other field, and then you have a problem. Due to lack of

Re: Integrating word2vec and glove results into Solr

2018-10-30 Thread Benedict Holland
Oh very cool. I will have to look into this more. This is something up and coming I take it? Thanks, ~Ben On Tue, Oct 30, 2018 at 4:36 PM Alexandre Rafalovitch wrote: > Simon Hughes presentation on just finished Activate may be relevant: > >

Odd Scoring behavior

2018-10-30 Thread Webster Homer
I noticed that sometimes query matches seem to get counted twice when they are scored. This will happen if the fieldtype is being stemmed, and there is a matching synonym. It seems that the score for the field is 2X higher than it should be. We see this only when there is a matching synonym

RE: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Phil Scadden
I will second the SolrJ method. You don’t want to be doing this on your SOLR instance. One question is whether your PDFs are scanned or are already searchable. I use tesseract offline to convert all scanned PDFs into searchable PDF so I don’t want Tika to be doing that. My code core is:

RE: Merging data from different sources

2018-10-30 Thread Markus Jelsma
Hello Martin, We also use a URP for this in some cases. We index documents to some collection, the URP reads a field from that document which is an ID in another collection. So we fetch that remote Solr document on-the-fly, and use those fields to enrich the incoming document. It is very

RE: Merging data from different sources

2018-10-30 Thread Martin Frank Hansen (MHQ)
Hi Alex, Thanks for your help. I will take a look at the update-request-processor. I wonder if there is a way to link documents together, so that they always show up together should one of the documents match a search query? -Original Message- From: Alexandre Rafalovitch Sent: 30.

Re: Integrating word2vec and glove results into Solr

2018-10-30 Thread Alexandre Rafalovitch
Simon Hughes' presentation at the just-finished Activate conference may be relevant: https://www.slideshare.net/SimonHughes13/vectors-in-search-towards-more-semantic-matching The video will be available in a couple of weeks, I am guessing from LucidWorks channel. Related repos: *)

Integrating word2vec and glove results into Solr

2018-10-30 Thread Benedict Holland
Hello all, We came up with a fascinating question. We actually have for our corpora, word2vec, doc2vec, and GloVe results. Is it possible to use these datasets within the search engine? If so, could you please point me to documentation on how to get Solr to use them? Thank you so much, ~Ben

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Shawn Heisey
On 10/29/2018 7:24 AM, Sofiya Strochyk wrote: Actually the smallest server doesn't look bad in terms of performance, it has been consistently better than the other ones (without replication) which seems a bit strange (it should be about the same or slightly worse, right?). I guess the memory

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread ☼ R Nair
I have done a production implementation of this, running for the last four months without any issue. Just a restart every week of all components. http://blog.cloudera.com/blog/2015/10/how-to-index-scanned-pdfs-at-scale-using-fewer-than-50-lines-of-code/ Best, Ravion On Tue, Oct 30, 2018, 1:00 PM

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Erick Erickson
All of the above work, but for robust production situations you'll want to consider a SolrJ client, see: https://lucidworks.com/2012/02/14/indexing-with-solrj/. That blog combines indexing from a DB and using Tika, but those are independent. Best, Erick On Tue, Oct 30, 2018 at 12:21 AM Kamuela

Re: Sorting of solr.CurrencyFieldType in 7.3.1

2018-10-30 Thread Erick Erickson
Chris: Please follow the instructions here: http://lucene.apache.org/solr/community.html#mailing-lists-irc. You must use the _exact_ same e-mail as you used to subscribe. If the initial try doesn't work and following the suggestions at the "problems" link doesn't work for you, let us know. But

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Sofiya Strochyk
Sure, here is IO for bigger machine: https://upload.cc/i1/2018/10/30/tQovyM.png for smaller machine: https://upload.cc/i1/2018/10/30/cP8DxU.png CPU utilization including iowait: https://upload.cc/i1/2018/10/30/eSs1YT.png iowait only: https://upload.cc/i1/2018/10/30/CHgx41.png On 30.10.18

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Deepak Goel
Please see inline... Deepak "The greatness of a nation can be judged by the way its animals are treated. Please consider stopping the cruelty by becoming a Vegan" +91 73500 12833 deic...@gmail.com Facebook: https://www.facebook.com/deicool LinkedIn: www.linkedin.com/in/deicool "Plant a Tree,

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Shawn Heisey
On 10/29/2018 8:56 PM, Erick Erickson wrote: The interval between when a commit happens and all the autowarm queries are finished is 52 seconds for the filterCache. I haven't seen warming take that long unless something's very unusual. I'd actually be very surprised if you're really only firing 64

Re: Sorting of solr.CurrencyFieldType in 7.3.1

2018-10-30 Thread Chris Gerke
UNSUBSCRIBE On Tue, 30 Oct 2018 at 8:24 pm, Stefan Kuhn wrote: > Hi, > > last week I found an error in the result sorting regarding a field of the > type "solr.CurrencyFieldType" in solr version 7.3.1. > > There are multiple documents which I must sort with this field, but the > order of the

Sorting of solr.CurrencyFieldType in 7.3.1

2018-10-30 Thread Stefan Kuhn
Hi, last week I found an error in the result sorting regarding a field of the type "solr.CurrencyFieldType" in solr version 7.3.1. There are multiple documents which I must sort with this field, but the order of the result is apparently not correctly sorted after the sorting parameters

Re: Merging data from different sources

2018-10-30 Thread Alexandre Rafalovitch
Maybe https://lucene.apache.org/solr/guide/7_5/update-request-processors.html#atomicupdateprocessorfactory Regards, Alex On Tue, Oct 30, 2018, 7:57 AM Martin Frank Hansen (MHQ), wrote: > Hi, > > I am trying to merge files from different sources and with different > content (except for one
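As a rough illustration of the atomic-update idea linked above, the sketch below builds the JSON body for a partial update that sets fields on an existing document identified by its key. The field names and values are hypothetical, and the collection URL in the comment is a placeholder. Atomic updates in Solr also require the other fields of the document to be stored or have docValues, which is assumed here.

```python
import json


def atomic_update(doc_id, fields, id_field="id"):
    """Build one Solr atomic-update document that sets the given fields."""
    doc = {id_field: doc_id}
    for name, value in fields.items():
        # "set" replaces the field value and leaves other fields untouched.
        doc[name] = {"set": value}
    return doc


# Hypothetical example: enrich document "001" with data from a second source.
payload = json.dumps([atomic_update("001", {"title": "test-123"})])

if __name__ == "__main__":
    # The body would be POSTed as application/json to
    # http://localhost:8983/solr/<collection>/update (placeholder URL).
    print(payload)
```

Each source can then send only the fields it owns, keyed by the shared id, instead of reindexing whole documents.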

Merging data from different sources

2018-10-30 Thread Martin Frank Hansen (MHQ)
Hi, I am trying to merge files from different sources and with different content (except for one key field). How can this be done in Solr? An example could be: Document 1 001 Unique id for Document 1 test-123 …

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Sofiya Strochyk
My swappiness is set to 10, swap is almost not used (used space is on the scale of a few MB) and there is no swap IO. There is disk IO like this, though: https://upload.cc/i1/2018/10/30/43lGfj.png https://upload.cc/i1/2018/10/30/T3u9oY.png However CPU iowait is still zero, so not sure if the disk

Re: TLOG replica stucks

2018-10-30 Thread Ere Maijala
Hi, We had the same happen with PULL replicas with Solr 7.5. Solr was showing that they all had the correct index version, but the changes were not showing. Unfortunately the solr.log size was too small to catch any issues, so I've now increased it and am waiting for it to happen again. Regards, Ere

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Deepak Goel
Yes. Swapping from disk to memory & vice versa Deepak

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Kamuela Lau
Hi there, Here are a couple of ways I'm aware of: 1. Extract-handler / post tool You can use the curl command with the extract handler or bin/post to upload a single document. Reference: https://lucene.apache.org/solr/guide/7_5/uploading-data-with-solr-cell-using-apache-tika.html 2.
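For the extract-handler option, a minimal sketch of how the request URL is put together. The base URL, core name, and document id are assumptions; `literal.id` and `commit` are standard parameters of the `/update/extract` handler.

```python
from urllib.parse import urlencode


def extract_url(base_url, collection, doc_id, commit=True):
    """Build the /update/extract URL for uploading one file via Solr Cell."""
    params = {
        "literal.id": doc_id,           # value for the document's unique key field
        "commit": str(commit).lower(),  # Solr expects "true" or "false"
    }
    return f"{base_url}/{collection}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical core name and id; the equivalent upload with curl would be:
    #   curl "<url>" -F "myfile=@example.pdf"
    print(extract_url("http://localhost:8983/solr", "mycore", "doc1"))
```

This is convenient for testing with a single file; the replies in this thread recommend a SolrJ client for production indexing.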

Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread adiyaksa kevin
Hello there, let me introduce myself. My name is Mohammad Kevin Putra (you can call me Kevin), from Indonesia. I am a beginner backend developer. I use Linux Mint, Apache SOLR 7.5.0 and Apache TIKA 1.91.0. I have a small problem with how to index a PDF file via Apache TIKA. I