RE: Why is multiplicative boost preferred over additive?

2016-03-19 Thread jimi.hullegard
On Friday, March 18, 2016 3:53 PM, wun...@wunderwood.org wrote: > > Popularity has a very wide range. Try my example, scale 1 million and 100 > into the same 1.0-0.0 range. Even with log popularity. Well, in our case, we don't really care to differentiate between documents with low popularity.

Re: how to update billions of docs

2016-03-19 Thread Toke Eskildsen
Mohsin Beg Beg wrote: > I have a requirement to replace a value of a field in 100B's of docs > in 100's of cores. The field is multiValued=false docValues=true > type=StrField stored=true indexed=true. If this is just a simple one-time search-replace, then don't update the

Solr:Skip document from indexing when it matches specific value

2016-03-19 Thread solr2020
Hi, How can we skip indexing a document into Solr when a field matches a particular value? E.g. we would like to skip a document when the document's path field matches the value "/content". Are there any OOTB processors to accomplish this in Solr? Thanks.

Re: Explain style json? Without using wt=json...

2016-03-19 Thread Chris Hostetter
: We are using Solrj to query our solr server, and it works great. : However, it uses the binary format wt=javabin, and now when I'm trying : to get better debug output, I notice a problem with this. The thing is, : I want to include the explain data for each search result, by adding :

Re: Solr 4.10 Suggestor

2016-03-19 Thread Erick Erickson
The log files will have messages, but nothing that I know of programmatically. Solr won't accept any requests if it's building on startup until the build is done though. And prior to 5.1 specifying the buildOnStartup=false was ignored. See SOLR-6679. That JIRA just took the suggester out of

RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Kelly, Frank
Just wondering if my observation of SolrCloud behavior after ZooKeeper loses a quorum is normal or to-be-expected Version of Solr: 5.3.1 Version of ZooKeeper: 3.4.7 Using SolrCloud with external ZooKeeper Deployed on AWS Our Solr cluster has 3 nodes Our Zookeeper ensemble consists of three

Re: Stopping Solr JVM on OOM

2016-03-19 Thread xavi jmlucjav
In order to force an OOM do this: - index a sizable amount of docs with normal -Xmx, if you already have 350k docs indexed, that should be enough - now, stop solr and decrease memory, like -Xmx15m, start it, and run a query with a facet on a field with very high cardinality, ask for all facets.

Re: High Cpu sys usage

2016-03-19 Thread Patrick Plaatje
Yeah, I didn’t pay attention to the cached memory at all, my bad! I remember running into a similar situation a couple of years ago, one of the things to investigate our memory profile was to produce a full heap dump and manually analyse that using a tool like MAT. Cheers, -patrick On

Re: Connection refused: no further information

2016-03-19 Thread Shawn Heisey
On 3/16/2016 6:29 AM, manohar wrote: >I am getting error in windows server after starting the zookeeper server > , i entered this command > "solr start -cloud -p 8983 -s C:\solr\server\solr\node1\solr -z > 16.254.6.88:2181" .Then i got this error > > Waiting up to 30 to see Solr running on

Re: Solrj , how to create collection

2016-03-19 Thread Anshum Gupta
Are you running Solr in Cloud (ZooKeeper aware) mode ? If so, manual creation of core is actually not something that is supported. It works, but it's not supported. Assuming you _are_ running in cloud mode, the answer to your question is yes. Provided you upload the configuration to be used by

Solr 5.5 error at startup - ClassNotFoundException: org.simpleframework.xml.core.Persister

2016-03-19 Thread Shamik Bandopadhyay
Hi, I'm getting error while starting up Solr 5.5 . I'm using jdk and CentOS 6.7. I'm setting up Solr for the first time, booting up with 2 shards. I've enabled the following entry in /bin/solr if [ -e "$SOLR_HOME/knowledge/core.properties" ]; then

Re: Document Cache

2016-03-19 Thread Emir Arnautovic
Hi, Your cache will be cleared on soft commits - every two minutes. It seems that it is either configured to be huge or you have big documents and are retrieving all fields or don't have lazy field loading set to true. Can you please share your document cache config and heap settings. Thanks,

RE: Solr 4.10 Suggestor

2016-03-19 Thread Matt Kuiper
Thanks Erick! After I posted, I did wonder if Solr would be available prior to the build completing. Yes, soon looking to move to a different approach (ngrams), even though currently the corpus is small. Matt -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com]

Re: Solrj , how to create collection

2016-03-19 Thread Shawn Heisey
On 3/19/2016 5:44 PM, Iana Bondarska wrote: > Could you please tell me, is it possible to create new collection on solr > server only using solrj,without manual creation of core folder on server. > I'm using solrj v.5.5.0,standalone client. If the server is running in cloud mode (with zookeeper)

Re: Ping handler in SolrCloud mode

2016-03-19 Thread Shawn Heisey
On 3/16/2016 8:14 AM, Tom Evans wrote: > The problem occurs when we attempt to query a node to see if products > or items is active on that node. The balancer (haproxy) requests the > ping handler for the appropriate collection, however all the nodes > return OK for all the collections(!) > > Eg,

Indexing both meta-data and full content of HTML

2016-03-19 Thread Davis, Daniel (NIH/NLM) [C]
I have some XML that includes a stylesheet maintained by another organization that renders to HTML. The HTML is pretty good - it is not "structured" in RDFa or schema.org, but has classes and anchors that can be used to find some key data. So, I can probably get all the meta-data I want

Re: Solr 5.5.0 ClassNotFoundException solr.MockTokenizerFactory after DIH setup

2016-03-19 Thread Erick Erickson
Likely you have some old jars in the classpath somehow. The first parts of the log should show you exactly what jars are loaded. It's tedious to go through since there are a lot of them, but it's something to check. If you have the hardware, try putting it on a machine that's never had Solr on

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-19 Thread Erick Erickson
Yes, there is one and only one tokenizer allowed. Best, Erick On Wed, Mar 16, 2016 at 7:51 PM, Zheng Lin Edwin Yeo wrote: > Thanks Shawn for your reply. > > Yes, I'm looking to see if we can implement a combination of tokenizes and > filters. > > However, I tried before

Re: Document Cache

2016-03-19 Thread Emir Arnautovic
Problem starts with autowarmCount="5000" - that executes 5000 queries when new searcher is created and as queries are executed, document cache is filled. If you have large queryResultWindowSize and queries return big number of documents, that will eat up memory before new search is executed.

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Alessandro Benedetti
Actually, if you are able to collect past (or future) signals like clicks or purchases, I would focus on the features of your products rather than the products themselves. What will happen is that you will be able to rank products better based on how their features should affect

Re: indexing Free-form text description

2016-03-19 Thread Vis Sw
Thanks a lot Erick and Alex... I am going through the documents and blogs... thanks for the pointers. Here is what I tried starting with "text_general"... a) Looks like it breaks on whitespace for e.g. for project_collaborator values as "myproject122_USC Dan Forrester ", "myproject123_USC

Re: Solr:Skip document from indexing when it matches specific value

2016-03-19 Thread Shawn Heisey
On 3/16/2016 5:36 AM, solr2020 wrote: > How we can ignore a document from indexing into solr when a field matches > particular value. > Eg. we would like to ignore a document from indexing when document's field > path matches value "/content". Do we have any OOTB processors to accomplish > this in

Solr 4.10 Suggestor

2016-03-19 Thread Matt Kuiper
All, Using the Suggestor component and running Solr 4.10. I have read that on Solr startup (or commit, depending on config) the building of the Suggestor can be CPU intensive and take some time. Does anyone know how to determine that the Suggestor has completed its build? Something to look

Solrj , how to create collection

2016-03-19 Thread Iana Bondarska
Hi, Could you please tell me, is it possible to create new collection on solr server only using solrj,without manual creation of core folder on server. I'm using solrj v.5.5.0,standalone client. Thanks, Iana

Re: High Cpu sys usage

2016-03-19 Thread Shawn Heisey
On 3/16/2016 8:27 PM, YouPeng Yang wrote: > Hi Shawn >Here is my top screenshot: > >https://www.dropbox.com/s/jaw10mkmipz943y/topscreen.jpg?dl=0 > >It is captured when my system is normal.And I have reduced the memory > size down to 48GB originating from 64GB. It looks like you have

Re: Ping handler in SolrCloud mode

2016-03-19 Thread Shawn Heisey
On 3/16/2016 10:11 AM, Tom Evans wrote: > This worked, I would still be interested in a lighter-weight approach > that doesn't involve joins to see if a given collection has a shard on > this server. I suspect that might require a custom ping handler plugin > however. If you are doing joins, then

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Alessandro Benedetti
In a relevancy problem I would repeat what my colleagues already pointed out : Data is key. We need to understand first of all our data before we can understand what is relevant and what is not. Once we specify a groundfloor which make sense ( and your basic approach + proper schema configuration

RE: Why is multiplicative boost preferred over additive?

2016-03-19 Thread jimi.hullegard
On Friday, March 18, 2016 4:25 PM, wun...@wunderwood.org wrote: > > That works fine if you have a query that matches things with a wide range of > popularities. But that is the easy case. > > What about the query "twilight", which matches all the Twilight movies, all > of which are popular

[ANNOUNCEMENT] Luke 5.5.0 released

2016-03-19 Thread Dmitry Kan
Download the release zip here: https://github.com/DmitryKey/luke/releases/tag/luke-5.5.0 Fixed in this release: #50 (Literally, the upgrade to Lucene 5.5.0) Enjoy! -- Dmitry Kan Luke

Re: publish solr on glassfish server

2016-03-19 Thread Upayavira
This is not recommended. It may work, and if it does, a future update to Solr may stop it working, without warning. Solr is to be considered its own app, to be run using its own embedded servlet container, as this allows the project to manage its own configuration and to test thoroughly that it

RE: Explain score is different from score

2016-03-19 Thread Rick Sullivan
Hi Rajesh, I've been seeing the same problem you have. My debug scores seem to be what I expect, but the actual scores applied by Solr are sometimes divided by an integer. I raised the same question in this email distribution about a week ago, but haven't yet found a solution. There's also a

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Erick Erickson
I think you're mixing up schema and config? The message about not hand-modifying is for schema.xml (well, managed-schema). To lock it down you need to modify solrconfig.xml... There shouldn't need to be any need to unload, just reload? And I just skipped the e-mail so maybe I'm way off base.

Is there any JIRA changed the stored order of multivalued field?

2016-03-19 Thread forest_soup
We have a field named "attachmentnames": We do POST to add data to Solr v4.7 and Solr v5.3.2 respectively. The attachmentnames are in 789, 456, 123 sequence: { "add": { "overwrite": true, "doc": { "id":"1",

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Robert Brown
Thanks Scott and John, As luck would have it I've got a PhD graduate coming for an interview today, who just happened to do her research thesis on information retrieval with quantum theory and machine learning :) John, it sounds like you're describing my system! Shopping products from

Re: Explain score is different from score

2016-03-19 Thread Ahmet Arslan
Hi Rick and Rajesh, I wasn't able to reproduce this with either Lucene or Solr. What version of Solr is this? Are you using a sharded request? @BeforeClass public static void beforeClass() throws Exception { initCore("solrconfig.xml", "schema.xml"); assertU(adoc("id", "1722669", "title", "Lync

Re: Regarding google maps polyline to use IsWithin(POLYGON(())) in solr

2016-03-19 Thread Pradeep Chandra
Hi Sir, I downloaded the file from http://search.maven.org/#artifactdetails%7Ccom.vividsolutions%7Cjts-core%7C1.14.0%7Cjar as you said in my previous post. Then I copied the .jar file into the server/lib directory... That is the only thing I did. The first time, I tried with small polygons.

Re: using solr AnalyticsQuery API vs facet API

2016-03-19 Thread Joel Bernstein
https://issues.apache.org/jira/browse/SOLR-8492 shows an example of the AnalyticsQuery where the merge is being handled by the Streaming API. I actually think this is nicer than using MergeStrategy. The Streaming API gives you full control over the merge from the shards. Joel Bernstein

Solr 5.5.0 ClassNotFoundException solr.MockTokenizerFactory after DIH setup

2016-03-19 Thread Victor D'agostino
Hi guys, I have a java.lang.ClassNotFoundException: solr.MockTokenizerFactory after a fresh 5.5.0 setup with DIH and a collection named "db". The tgz file is from http://apache.crihan.fr/dist/lucene/solr/5.5.0/solr-5.5.0.tgz Any idea why this class is missing at startup? Should I download

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Scott Stults
You're not going to be able to look at field boosts by themselves to judge relevancy because it's very much a data-driven optimization problem. For example, if you only sell iPhone cases but no iPhones, a search for "black iphone" should show a bunch of black iPhone cases at the top of the

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Shawn Heisey
On 3/16/2016 7:51 PM, Jay Potharaju wrote: > Does using schema API mean that no upconfig to zookeeper and no reloading > of all the nodes in my solrcloud? In which scenario should I not use schema > API, if any? The documentation says that a reload occurs automatically after the schema

Re: Shard splitting for immediate performance boost?

2016-03-19 Thread Robert Brown
Thanks Erick, I have another index with the same infrastructure setup, but only 10m docs, and never see these slow-downs, that's why my first instinct was to look at creating more shards. I'll definitely make a point of investigating further tho with all the things you and Shawn mentioned,

Re: Shard splitting for immediate performance boost?

2016-03-19 Thread Erick Erickson
Be _very_ cautious when you're looking at these timings. Random spikes are often due to opening a new searcher (assuming you're indexing as you query) and are eminently tunable by autowarming. Obviously you can't fire the same query again and again, but if you collect a set of "bad" queries and,

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Alexandre Rafalovitch
Daniel, Thank you for the very concrete example. That is helpful. Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 17 March 2016 at 08:17, Davis, Daniel (NIH/NLM) [C] wrote: > Alexandre, > > I just made

Re: Why is multiplicative boost preferred over additive?

2016-03-19 Thread Jan Høydahl
You can also use functions to “compress” the source number, so that the effect of a certain boost becomes bigger or smaller compared to the other boost you have. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 17. mar. 2016 kl. 23.21 skrev Upayavira

RE: Making managed schema unmutable correctly?

2016-03-19 Thread Davis, Daniel (NIH/NLM) [C]
Alexandre, I just made this transition, both to SolrCloud and to managed schema. In QA and Production, you update solrconfig.xml to say the schema is not mutable: true managed-schema My workflow in development is as follows: - Start with gettingstarted configuration and

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
That's what I thought you had meant before, but the Jira ticket indicates that you are looking for some extra level of AND/MUST outside of the OR, which is different from what you just indicated. In the ticket you say: "How can I achieve following? "+((fl:java fl:book))"", which has an extra AND

Re: Regarding google maps polyline to use IsWithin(POLYGON(())) in solr

2016-03-19 Thread David Smiley
JTS doesn't have any vertex limit on the geometries. So I don't know why your query isn't working. On Wed, Mar 16, 2016 at 1:58 AM Pradeep Chandra < pradeepchandra@gmail.com> wrote: > Hi Sir, > > Let me give some clarification on IsWithin(POLYGON(())) query...It is not > giving any result

Re: Query behavior.

2016-03-19 Thread Modassar Ather
What I understand by "+((fl:java fl:book))" is any of the terms should be present in the complete query. Please correct me if I am wrong. What I want to achieve is (A OR B) where any of the term or both of the term will cause a match. Thanks, Modassar On Thu, Mar 17, 2016 at 10:32 AM, Jack

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
I was just wanting to see the Jira clarified (without creating noise on the Jira), but if others feel they understand the relevance of the outer AND/+ to the stated problem, fine. I don't think I have anything else to add to the discussion at this stage. Now we sit and wait for some senior

Re: High Cpu sys usage

2016-03-19 Thread YouPeng Yang
Hi, It happened again, and what is worse, my system crashed. We cannot even connect to it with ssh. I used the sar command to capture statistics about it. Here are my details: [1] cpu (by using sar -u), we had to restart our system, just as the red font LINUX RESTART in the

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Robert Brown
That does sound rather useful! We currently have it set to 0.1 On 03/18/2016 04:13 PM, Nick Vasilyev wrote: Tie does quite a bit, without it only the highest weighted field that has the term will be included in relevance score. Tie lets you include the other fields that match as well. On

Explain style json? Without using wt=json...

2016-03-19 Thread jimi.hullegard
Hi, We are using Solrj to query our solr server, and it works great. However, it uses the binary format wt=javabin, and now when I'm trying to get better debug output, I notice a problem with this. The thing is, I want to include the explain data for each search result, by adding "[explain]"

DIH issue with SolrEntityProcessor 5.4.1

2016-03-19 Thread William Bell
We are running this inside of another entity in DIH. There appears to be an issue. We get 2 calls to the survey core if hits > 0. If hits = 0 we get 1 call. Has anyone else seen this? Shall I fix it? Any ideas where this bug may be? http://localhost:8983/solr/survey; qt="dihsurvey"

Re: indexing pdf files using post tool

2016-03-19 Thread Francisco Andrés Fernández
Vidya, I don't know if I'm understanding it very well but, I think that the best way is to parse your text using a routine outside Solr. You might need to map the different parts of your document using your domain knowledge and use such routine to produce an XML document for example, with

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
Now you've confused me... Did you actually intend that q.op=AND was going to perform some function in a query with only two terms and an OR operator? I mean, why not just drop the q.op=AND? -- Jack Krupansky On Wed, Mar 16, 2016 at 1:31 AM, Modassar Ather wrote: > Jack

Re: Solr Wiki - Request to add to contributors group

2016-03-19 Thread Shawn Heisey
On 3/16/2016 8:57 AM, Alessandro Benedetti wrote: > Shawn, thank you very much ! > So, I didn't have an account in the old wiki, can you add me as contributor > ? > Just created. > I will then proceed adding the classification documentation. > > AlessandroBenedetti The username that I added

Re: Explain score is different from score

2016-03-19 Thread Ahmet Arslan
Hi Rajesh, I suspect it is due to the queryNorm(q). But it is weird that relative order is different in your example. "queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied

Re: Shard splitting for immediate performance boost?

2016-03-19 Thread Shawn Heisey
On 3/19/2016 11:12 AM, Robert Brown wrote: > I have an index of 60m docs split across 2 shards (each with a replica). > > When load testing queries (picking random keywords I know exist), and > randomly requesting facets too, 95% of my responses are under 0.5s. > > However, during some random

Re: Document Cache

2016-03-19 Thread Rallavagu
comments in line... On 3/17/16 2:16 PM, Erick Erickson wrote: First, I want to make sure when you say "TTL", you're talking about documents being evicted from the documentCache and not the "Time To Live" option whereby documents are removed completely from the index. May be TTL was not the

Re: FW: SolrCloud App Unit Testing

2016-03-19 Thread Shawn Heisey
On 3/19/2016 7:11 AM, GW wrote: > I think the easiest way to write apps for Solr is with some kind of > programming language and the REST API. Don't bother with the PHP or Perl > modules. They are deprecated and beyond useless. just use the HTTP call > that you see in Solr Admin. Mind the URL

Shard splitting for immediate performance boost?

2016-03-19 Thread Robert Brown
Hi, I have an index of 60m docs split across 2 shards (each with a replica). When load testing queries (picking random keywords I know exist), and randomly requesting facets too, 95% of my responses are under 0.5s. However, during some random manual tests, sometimes I see searches taking

Re: No live SolrServers available to handle this request

2016-03-19 Thread Shawn Heisey
On 3/18/2016 9:55 PM, Anil wrote: > Thanks for your response. > CDH is a Cloudera (third party) distribution. is there any to get the > notifications copy of it when cluster state changed ? in logs ? > > I can assume that the exception is result of no availability of replicas > only. Agree? Yes,

[nested] how to specify a path for multiple nesting?

2016-03-19 Thread Alisa Z .
Hi all, I have a deeply multi-level data structure (up to 6-7 levels deep) where due to the nature of the data some nested documents can have same type names at various levels. How to form a proper query on a nested field that would contain "a path"  that defines that field? I'll clarify

Re: High Cpu sys usage

2016-03-19 Thread YouPeng Yang
Hi Shawn Here is my top screenshot: https://www.dropbox.com/s/jaw10mkmipz943y/topscreen.jpg?dl=0 It is captured when my system is normal.And I have reduced the memory size down to 48GB originating from 64GB. We have two hardware clusters ,each is comprised of 3 machines,and On one

Error starting solr 5.5 - Cannot open solr.log:No such file or directory

2016-03-19 Thread Shamik Bandopadhyay
Hi, I'm trying to upgrade from Solr 5.0 to 5.5. I'm getting the following error: tail: cannot open `/mnt/ebs2/solrhome/logs/solr.log' for reading: No such file or directory I'm running on CentOS 6.7. The same startup script has been working fine for 5.0 till now. I'm executing as user "solr".

Re: Why is multiplicative boost prefered over additive?

2016-03-19 Thread Walter Underwood
Think about using popularity as a boost. If one movie has a million rentals and one has a hundred rentals, there is no additive formula that balances that with text relevance. Even with log(popularity), it doesn’t work. With multiplicative boost, we only care about the difference between the
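A small worked example may make the point above concrete. This is only a sketch with made-up numbers, not Walter's own formula: it assumes two movies with identical text relevance, popularities of 1,000,000 and 100, and log10(popularity) as the boost. The function names `additive` and `multiplicative` are hypothetical.

```python
import math

def additive(text_score: float, popularity: float) -> float:
    # Assumed additive form: text relevance plus log-scaled popularity.
    return text_score + math.log10(popularity)

def multiplicative(text_score: float, popularity: float) -> float:
    # Assumed multiplicative form: text relevance times log-scaled popularity.
    return text_score * math.log10(popularity)

# Text scores differ a lot from query to query, so try a small and a large one.
for text_score in (0.5, 20.0):
    add_ratio = additive(text_score, 1_000_000) / additive(text_score, 100)
    mul_ratio = multiplicative(text_score, 1_000_000) / multiplicative(text_score, 100)
    print(f"text={text_score}: additive ratio={add_ratio:.2f}, "
          f"multiplicative ratio={mul_ratio:.2f}")
```

With the additive form, the advantage of the popular movie depends on how large the text score happens to be for that query; with the multiplicative form the ratio between the two movies stays the same whatever the text score is.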

Re: indexing Free-form text description

2016-03-19 Thread Alexandre Rafalovitch
Well, Solr ships with nearly 10 examples. So, if you go through them, you will know quite a lot. This article (mine) may help you to navigate them: http://blog.outerthoughts.com/2015/11/oh-solr-home-where-art-thou/ More specifically, as Erick said, your question is too generic. One step forward

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-19 Thread Zheng Lin Edwin Yeo
I found that in WordDelimiterFilterFactory, there is a parameter called splitOnNumerics, which does the same function as what HMMChineseTokenizer did. - *splitOnNumerics="1"* causes alphabet => number transitions to generate a new part [Solr 1.3]: - "j2se" => "j" "2" "se"
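The "j2se" example above can be imitated outside Solr with a few lines of Python. This is a rough sketch of the letter/digit transition rule only, not the WordDelimiterFilter implementation (which also handles case changes, delimiters and other options); `split_on_numerics` is a hypothetical helper name.

```python
import re

def split_on_numerics(token: str) -> list[str]:
    """Split a token wherever it switches between digits and non-digits."""
    # Each match is a maximal run of digits or a maximal run of non-digits.
    return re.findall(r"\d+|\D+", token)

print(split_on_numerics("j2se"))  # ['j', '2', 'se']
```

A token without digits comes back as a single part, which is the behaviour one would expect when no alphabet/number transition exists.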

RE: RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
I am wondering whether this might be the bug of SOLR-8326, which is fixed in Solr 5.4. That's my guess as a user who ran into the bug myself. -Original Message- From: Kelly, Frank [mailto:frank.ke...@here.com] Sent: Wednesday, March 16, 2016 3:09 PM To: solr-user@lucene.apache.org

Re: Explain score is different from score

2016-03-19 Thread Ahmet Arslan
Hi Rick, This could be a bug I think. Do you guys use index time boosts? Ahmet On Friday, March 18, 2016 6:15 PM, Rick Sullivan wrote: Yes it seems to be something similar, but the normalization isn't applied to all retrieved documents, which messes with the document

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
You still haven't explained what exactly you are trying to accomplish with that outer level AND/+/MUST. Please be specific - why you insist on "+((fl:java fl:book))" rather than "fl:java fl:book". -- Jack Krupansky On Fri, Mar 18, 2016 at 12:12 AM, Modassar Ather wrote:

Re: Ping handler in SolrCloud mode

2016-03-19 Thread Tom Evans
On Wed, Mar 16, 2016 at 4:10 PM, Shawn Heisey wrote: > On 3/16/2016 8:14 AM, Tom Evans wrote: >> The problem occurs when we attempt to query a node to see if products >> or items is active on that node. The balancer (haproxy) requests the >> ping handler for the appropriate

RE: Explain score is different from score

2016-03-19 Thread Rick Sullivan
I'm not. I only have query boosts. > Date: Fri, 18 Mar 2016 16:42:36 + > From: iori...@yahoo.com.INVALID > To: solr-user@lucene.apache.org > Subject: Re: Explain score is different from score > > Hi Rick, > > This could be a bug I think. Do you guys

Re: RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Kelly, Frank
Thanks for taking look I’m not sure https://issues.apache.org/jira/browse/SOLR-8326 is a match as we aren’t using PKIAuthPlugin -Frank Frank Kelly Principal Software Engineer Predictive Analytics Team (SCBE/HAC/CDA) HERE 5 Wayside Rd, Burlington, MA 01803, USA 42° 29' 7" N 71° 11' 32” W

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-19 Thread Zheng Lin Edwin Yeo
Thanks Shawn for your reply. Yes, I'm looking to see if we can implement a combination of tokenizers and filters. However, I tried before that we can only implement one tokenizer for each fieldType. So is it true that I can only stick to one tokenizer, and the rest of the implementation have to

Re: No live SolrServers available to handle this request

2016-03-19 Thread Anil
Hi Michael, I could not post the query. I know it's difficult to find the root cause without the query, sorry about that. The query includes expand/collapse, a filter query (fq) and 2 to 3 terms with AND. Please share your thoughts. Thanks. Regards, Anil On 17 March 2016 at 19:46, michael solomon

Re: Query behavior.

2016-03-19 Thread Alessandro Benedetti
I think what he tried to explain was : " Input query : *fl:(java OR book)* Instead of having the query parser parsing : *+((fl:java fl:book)~2) *( which seems what is happening right now) He want the query parser to parse : +((fl:java fl:book)) ( without the mm expressed) More than the outer

Re: using solr AnalyticsQuery API vs facet API

2016-03-19 Thread sudsport s
Thanks Joel for responding. But I am still not sure when to use the Solr analytics API vs the JSON facet API (what is the difference between ValueSource and PostFilter?). I know that ValueSource is useful to implement functions. On Wed, Mar 16, 2016 at 9:49 AM, sudsport s wrote: >

Re: indexing Free-form text description

2016-03-19 Thread Erick Erickson
This question is way too general to answer in any detail, so I'd just start with the text_general fieldType in any of the stock schema.xml files. It would be well for you to get familiar with the admin/analysis page, as you'll have a zillion questions about what each change you make to that

Why is multiplicative boost preferred over additive?

2016-03-19 Thread jimi.hullegard
Hi, After reading a bit on various sites, and especially the blog post "Comparing boost methods in Solr", it seems that the preferred boosting type is the multiplicative one, over the additive one. But I can't really get my head around *why* that is so, since in most boosting problems I can

Re: FW: SolrCloud App Unit Testing

2016-03-19 Thread GW
I think the easiest way to write apps for Solr is with some kind of programming language and the REST API. Don't bother with the PHP or Perl modules. They are deprecated and beyond useless. just use the HTTP call that you see in Solr Admin. Mind the URL encoding when putting together your server

Re: Query behavior.

2016-03-19 Thread Modassar Ather
What I understand by q.op is the default operator. If there is no AND/OR in-between the terms the default will be AND as per my setting of q.op=AND. But what if the query has AND/OR explicitly put in-between the query terms? I just think that if (A OR B) is the query then the result should be

Re: No live SolrServers available to handle this request

2016-03-19 Thread Anil
Thanks Shawn. we are using 4.10.3. I don't see any issues with replicas of all shards at the time of exception. health of all shards is good in CDH. Regards, Anil On 18 March 2016 at 10:52, Shawn Heisey wrote: > On 3/17/2016 4:22 AM, Anil wrote: > > We are using

How is _rest_managed.json used?

2016-03-19 Thread Alexandre Rafalovitch
Hello, What is _rest_managed.json actually for? I can see the mechanics in the Ref Guide and even found where it is managed by source code. But I cannot figure out how it actually fits into a workflow. It seems to be a registry of REST managed components (e.g. synonyms) for when they are NOT

Re: Document Cache

2016-03-19 Thread Emir Arnautovic
Running a single query that returns all docs and all fields will actually load only as many documents as queryResultWindowSize. What you need to do is run multiple queries that will return different documents. In case your id is numeric, you can run something like id:[1 TO 100] and then id:[100 TO

Re: indexing pdf files using post tool

2016-03-19 Thread Binoy Dalal
Take a look at the CloneFieldUpdateProcessorFactory here: http://www.solr-start.com/info/update-request-processors/ On Wed, 16 Mar 2016, 18:25 Binoy Dalal, wrote: > Like Francisco said, use a custom update processor to map the fields the > way you want and add it to your

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Nick Vasilyev
I work with a similar catalog; except our data is especially bad. We've found that several things helped: - Item level grouping (group same item sold by multiple vendors). Rank items with more vendors a bit higher. - Include a boost function for other attributes, such as an original image of the

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Robert Brown
Thanks, would be a great idea but unfortunately we don't have that sort of granularity of features. Can definitely use the category of clicked products though, sounds like a good enough start. On 03/18/2016 04:36 PM, Alessandro Benedetti wrote: Actually if you are able to collect past (

Re: High Cpu sys usage

2016-03-19 Thread Shawn Heisey
On 3/16/2016 8:59 AM, Patrick Plaatje wrote: > From the sar output you supplied, it looks like you might have a memory issue > on your hosts. The memory usage just before your crash seems to be *very* > close to 100%. Even the slightest increase (Solr itself, or possibly by a > system service)

Re: stop words as blacklist

2016-03-19 Thread Binoy Dalal
Like Ahmet says, a custom update request processor is the best way to go, and it's pretty simple too. I have a ready to use example here: https://github.com/lttazz99/SolrPluginsExamples On Fri, Mar 18, 2016 at 9:21 PM Ahmet Arslan wrote: > Hi John, > > Do you want to

Would it be better to make my Schema changes within the renamed "/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf/schema.xml" instead of the way that I am doing it now via curl -X PO

2016-03-19 Thread John Mitchell
I noticed that within "/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf" it has a file called "managed-schema" and within this file it says "This is the Solr schema file. This file should be named "schema.xml" and should be in the conf directory". Currently I have not renamed

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Shawn Heisey
On 3/16/2016 1:14 AM, Alexandre Rafalovitch wrote: > So, I am looking at the Solr 5.5 examples with their all-in by-default > managed schemas. And I am scratching my head on the workflow users are > expected to follow. > > One example is straight from documentation: > "With the above

Solr5 Optimize

2016-03-19 Thread Rallavagu
All, Solr 5.4 with embedded Jetty (4G heap). Trying to understand the behavior of the "optimize" operation if not run explicitly. What is the frequency at which this operation is run, what are the storage requirements, and how do we schedule it? Any comments/pointers would greatly help. Thanks in

Re: Solr5 Optimize

2016-03-19 Thread Rallavagu
Thanks Erick. This helps. On 3/16/16 10:11 AM, Erick Erickson wrote: First of all, "optimize-like" does _not_ happen "every time a commit happens". What _does_ happen is the current state of the index is examined and if certain conditions are met _then_ segment merges happen. Think of these as

RE: publish solr on glassfish server

2016-03-19 Thread Adel Mohamed Khalifa
I built my web page for searching and created a servlet for it, but it is not working. I am using this Ajax call to invoke the servlet: $.ajax({ url: contextPath + '/GetResults', data: { qu: $("#query").val() }, dataType:

publish solr on glassfish server

2016-03-19 Thread Adel Mohamed Khalifa
Hello All, What are the requirements for installing Solr on a GlassFish server, and how can I do it? Regards, Adel Khalifa

Re: Document Cache

2016-03-19 Thread Rallavagu
On 3/18/16 9:27 AM, Emir Arnautovic wrote: Running a single query that returns all docs and all fields will actually load only as many documents as queryResultWindowSize specifies. What you need to do is run multiple queries that return different documents. In case your id is numeric, you can run

Re: how to update billions of docs

2016-03-19 Thread sudsport s
I think there are no in-place updates in Solr, which means an update behaves like an insert plus marking the old version as deleted. So the behavior should be the same as indexing billions of docs. On Wed, Mar 16, 2016 at 3:52 PM, Mohsin Beg Beg wrote: > Hi, > > I have a requirement to
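The "update is an insert plus a delete marker" behavior described in this reply can be pictured with a toy model. This is a conceptual sketch only, not how Solr or Lucene is implemented: the ToyIndex class and its methods are hypothetical names invented for illustration.

```python
class ToyIndex:
    """Toy append-only index: an update never rewrites an entry in place."""

    def __init__(self):
        # Every version ever written stays here until a (not modeled) merge.
        self.entries = []

    def add(self, doc_id, fields):
        # Mark any live version of this id as deleted ...
        for entry in self.entries:
            if entry["id"] == doc_id and not entry["deleted"]:
                entry["deleted"] = True
        # ... then append the new version as a fresh entry.
        self.entries.append({"id": doc_id, "fields": dict(fields), "deleted": False})

    def live_docs(self):
        return [entry for entry in self.entries if not entry["deleted"]]


if __name__ == "__main__":
    index = ToyIndex()
    index.add("1", {"status": "old"})
    index.add("1", {"status": "new"})  # an "update" is just another add
    print(len(index.entries), len(index.live_docs()))
```

In this model, changing one field in every document writes as many new entries as the original indexing run did, which is the point the reply makes about cost.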

Re: Document Cache

2016-03-19 Thread Rallavagu
On 3/18/16 8:56 AM, Emir Arnautovic wrote: The problem starts with autowarmCount="5000" - that executes 5000 queries when a new searcher is created, and as the queries are executed, the document cache is filled. If you have a large queryResultWindowSize and queries return a big number of documents, that will eat
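A rough back-of-the-envelope bound for the effect described here, assuming (as stated elsewhere in this thread) that each warming query loads at most queryResultWindowSize documents. The window size of 200 below is a made-up value for illustration, and the function name is hypothetical; real numbers also depend on how much the warming queries overlap and on the configured document cache size.

```python
def max_docs_loaded_by_autowarm(autowarm_count, query_result_window_size):
    """Upper bound on documents touched while warming a new searcher.

    Assumes every warming query loads a full window of distinct documents.
    """
    return autowarm_count * query_result_window_size


if __name__ == "__main__":
    # autowarmCount=5000 comes from the message; 200 is a hypothetical window size.
    print(max_docs_loaded_by_autowarm(5000, 200))
```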

indexing Free-form text description

2016-03-19 Thread Vis Sw
Hi, I am trying to understand the best way to index and search a free-text field, e.g. notes or a description. Please suggest the best field type, tokenizer, and filters for querying a free-form text description. Any example would be great. Regards
