Re: replicate indexing to second site

2016-02-10 Thread tedsolr
Cross data center replication sounds like a great feature. I read Yonik's post on it. I'll keep my ear to the ground. In the meantime it's good to know there's nothing built in to handle this, so it will involve some design effort. I have my head wrapped around sending index requests in parallel,

Does bf for eDismax use DocValue or FieldCache?

2016-02-10 Thread Andrea Roggerone
Hi, I need to boost documents at runtime according to a set of roles and related ids. For instance I would have the fields: ceo:1234-abcd-5678-poiu tl:-abcd-5678-abc and a set of boosts to apply at runtime, for instance ceo = 10, tl = 5. I don't want to do any complex operation with the weights

Re: replicate indexing to second site

2016-02-10 Thread Shawn Heisey
On 2/10/2016 8:02 AM, tedsolr wrote: > I have my head wrapped around sending index requests in parallel, but in a > later post you mentioned how you separately track the most recent update and > are able to sync from that point if needed. That I don't get. Is it an index > version you are

Solrj-collection creation

2016-02-10 Thread vidya
Hi, I want to connect to a SolrCloud server from a Java program using the zookeeperHost variable. I know that data can be indexed and searched from a collection using a Java program, but can I create a collection initially from a Java program? My problem is that I cannot access the Solr web page, I'm

Re: Need to move on SOlr cloud (help required)

2016-02-10 Thread Jack Krupansky
What exactly is your motivation? I mean, the primary benefit of SolrCloud is better support for sharding, and you have only a single shard. If you have no need for sharding and your master-slave replicated Solr has been working fine, then stick with it. If only one machine is having a load

Re: replicate indexing to second site

2016-02-10 Thread tedsolr
Arcadius, Thanks for sharing your multi data center design. My requirements are different (hot site - warm site) but nevertheless your posts are very interesting. It helps to know that in many cases someone else has already cut their teeth on the problem you're trying to solve. Ted

ExactStatsCache not very exact

2016-02-10 Thread Markus Jelsma
Hi - i've noticed ExactStatsCache is not very exact on consecutive calls, see the following explains for the number one result: 70.76961 = sum of: 70.76961 = max plus 0.65 times others of: 70.76961 = weight(title_nl:contactformulier in 210879) [], result of: 70.76961 =

RE: Solrj-collection creation

2016-02-10 Thread Davis, Daniel (NIH/NLM) [C]
Generally, creating a collection may also include uploading a zookeeper configuration: import org.apache.solr.common.cloud.SolrZkClient; import org.apache.solr.common.cloud.ZkConfigManager; import org.apache.solr.common.cloud.ZkStateReader; /* ... much later ... */ SolrZkClient zkClient =

Re: Solrj-collection creation

2016-02-10 Thread Shawn Heisey
On 2/10/2016 6:55 AM, vidya wrote: > I want to connect to solrCloud server from java program using > zookeeperHost variable. I know that data can be indexed and searched from a > collection using java program. but Can i able to create a collection > initially from java program? Yes. Use the

Re: Solr 4.7 replication not working

2016-02-10 Thread Erick Erickson
Is this some kind of typo in your slave configuration? 'cause it's kinda weird. The error mentioning collection1 indicates I think that the masterUrl is not parseable (and somehow doesn't throw a parsing error on startup) and the old default was "collection1". This URL should point to a single

Re: How is Tika used with Solr

2016-02-10 Thread Erick Erickson
Timothy's points are absolutely spot-on. In production scenarios, if you use the simple "run Tika in a SolrJ program" approach you _must_ abort the program on OOM errors and the like and figure out what's going on with the offending document(s). Or record the name somewhere and skip it next time

Solr 4.7 replication not working

2016-02-10 Thread Richardson, Jacquelyn F.
All, I have solr 4.7 installed in a Windows 7 environment. My solrconfig.xml on the master is: ${master.replication.enabled:true} commit startup optimize optimize commit

Re: Solrj-collection creation

2016-02-10 Thread Erick Erickson
Since you're using SolrJ anyway just use the CollectionAdminRequest. You can see examples of its use in the test cases, take a look at CollectionsAPISolrJTests. Best, Erick On Wed, Feb 10, 2016 at 5:55 AM, vidya wrote: > Hi > > I want to connect to solrCloud server
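
For readers following this thread without SolrJ at hand: collection creation is also exposed over HTTP through the Collections API (the CREATE action). The sketch below, in Python, only builds such a URL and does not send it. The base URL, collection name, config name, and shard and replica counts are placeholder assumptions, not values from this thread, and the named config is assumed to exist in ZooKeeper already.

```python
from urllib.parse import urlencode


def collection_create_url(base_url, name, config_name,
                          num_shards=1, replication_factor=1):
    """Build a Collections API CREATE URL; nothing is sent over the network."""
    params = {
        "action": "CREATE",
        "name": name,                           # hypothetical collection name
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,   # config set assumed to be in ZooKeeper
        "wt": "json",
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Placeholder values for illustration only.
url = collection_create_url("http://localhost:8983/solr",
                            "mycollection", "myconfig",
                            num_shards=2, replication_factor=1)
print(url)
```

The resulting URL can be requested with any HTTP client that can reach a Solr node, which may help when the admin web page itself is not reachable from a browser.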

Logging request times

2016-02-10 Thread McCallick, Paul
We’re trying to fine tune our query and ingestion performance and would like to get more metrics out of SOLR around this. We are capturing the standard logs as well as the jetty request logs. The standard logs get us QTime, which is not a good indication of how long the actual request took to

RE: ExactStatsCache not very exact

2016-02-10 Thread Markus Jelsma
Well, what do we have here. I just saw a different docCount in the same result set for the same field. These two are the explains for the top two documents in the same result set: 1: 70.77082 = sum of: 70.77082 = max plus 0.65 times others of: 70.77082 = weight(title_nl:contactformulier

Re: Solr architecture

2016-02-10 Thread Mark Robinson
Thanks everyone for your suggestions. Based on it I am planning to have a doc per event. On Wed, Feb 10, 2016 at 3:38 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: > Hi Mark, > Appending session actions just to be able to return more than one session > without retrieving large

Re: Tesseract command-line OCR engine has stopped working

2016-02-10 Thread Jan Høydahl
You do not tell us much about how Solr is set up. I found your stackoverflow question too at http://stackoverflow.com/questions/35220443/tesseract-command-line-ocr-engine-has-stopped-working with a screenshot. That suggests that you have set up Tika with OCR for images, and emails with images are

Tune Data Import Handler to retrieve maximum records

2016-02-10 Thread Troy Edwards
Is it possible for the Data Import Handler to bring in the maximum number of records depending on available resources? If so, how should it be configured? Thanks,

Re: Solr architecture

2016-02-10 Thread Mark Robinson
Thanks everyone for your suggestions. Based on it I am planning to have one doc per event with sessionId common. So in this case hopefully indexing each doc as and when it comes would be okay? Or do we still need to batch and index to Solr? Also with 4M sessions a day with about 6000 docs

Re: Need to move on SOlr cloud (help required)

2016-02-10 Thread Midas A
Hi, what if the master node fails? What should our failover strategy be? On Wed, Feb 10, 2016 at 9:12 PM, Jack Krupansky wrote: > What exactly is your motivation? I mean, the primary benefit of SolrCloud > is better support for sharding, and you have only a single shard.

Size of logs are high

2016-02-10 Thread kshitij tyagi
Hi, I have migrated to Solr 5.2 and the log files are large. Can anyone help me control this?

Re: Running Solr on port 80

2016-02-10 Thread Jeyaprakash Singarayar
That's OK if I'm using it locally, but I'm doing it in production, based on the page below: https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production On Thu, Feb 11, 2016 at 12:58 PM, Binoy Dalal wrote: > Why don't you directly run solr from the script

Overseer Queues of zookeeper

2016-02-10 Thread Zap Org
How can I delete the Overseer queues in ZooKeeper?

Re: Need to move on SOlr cloud (help required)

2016-02-10 Thread kshitij tyagi
@Jack Currently we have around 55,00,000 docs. It's not about load on one node; we have load on different nodes at different times, as our traffic is huge, around 60k users at a given point in time. We want the hits on the Solr servers to be distributed, so we are planning to move to SolrCloud as it

Running Solr on port 80

2016-02-10 Thread Jeyaprakash Singarayar
Hi, I'm trying to install Solr 5.4.1 on CentOS. I know that while installing Solr as a service on Linux we can pass -p to have the app listen on that port: ./install_solr_service.sh solr-5.4.1.tgz -p 8984 -f but it still shows as hosted on 8983 and not on 8984. Any idea? Waiting

Re: Running Solr on port 80

2016-02-10 Thread Binoy Dalal
Why don't you directly run Solr from the script provided in {SOLR_DIST}/bin: ./solr start -p 8984 On Thu, 11 Feb 2016, 12:56 Jeyaprakash Singarayar wrote: > Hi, > > I'm trying to install solr 5.4.1 on CentOS. I know that while installing > Solr as a service in the Linux

Re: Knowing which doc failed to get added in solr during bulk addition in Solr 5.2

2016-02-10 Thread Debraj Manna
Thanks, Erick. How do people handle this scenario? Right now the only option I can think of is to replay the entire batch by doing an add for every single doc. Then this will give me an error for all the docs which got added from the batch. On Tue, Feb 9, 2016 at 10:57 PM, Erick Erickson
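
The replay idea described here can be sketched roughly as follows, in Python. `add_batch` is a hypothetical callable standing in for whatever client call sends documents to Solr and raises when the request is rejected; it is not a real SolrJ or Solr API. The sketch also assumes that re-sending a document that was already indexed is harmless, for example because it simply overwrites the earlier copy with the same unique key.

```python
def add_with_fallback(docs, add_batch):
    """Send the whole batch; if that fails, replay one doc at a time.

    add_batch is a hypothetical callable that takes a list of docs and
    raises an exception when the request is rejected.
    Returns a list of (doc, error message) pairs for the docs that failed.
    """
    try:
        add_batch(docs)
        return []                      # whole batch accepted, nothing to replay
    except Exception:
        pass                           # fall through to the per-document replay

    failed = []
    for doc in docs:
        try:
            add_batch([doc])           # single-document request
        except Exception as exc:
            failed.append((doc, str(exc)))
    return failed


# Stand-in indexer for illustration: rejects any document without an "id".
def fake_add(docs):
    for doc in docs:
        if "id" not in doc:
            raise ValueError("missing id")


batch = [{"id": "1"}, {"title": "no id"}, {"id": "3"}]
failures = add_with_fallback(batch, fake_add)
print(failures)
```

The per-document replay costs one request per document, so it is only triggered after the batch request has already failed.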

Json faceting, aggregate numeric field by day?

2016-02-10 Thread Markus Jelsma
Hi - if we assume the following simple documents (a date and a value each): 2015-01-01T00:00:00Z 2; 2015-01-01T00:00:00Z 4; 2015-01-02T00:00:00Z 3; 2015-01-02T00:00:00Z 7. Can I get a daily average for the field 'value'? E.g. 3.0 and 5.0. Reading the documentation, I don't think I can, or I
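
To make the expected result concrete, here is the same aggregation done client-side in Python on the four sample documents. This only illustrates the arithmetic being asked for; it is not a Solr faceting request, and the (date, value) tuple layout is an assumption about how the documents are structured.

```python
from collections import defaultdict

# The four sample documents from the message, as (date, value) pairs.
docs = [
    ("2015-01-01T00:00:00Z", 2),
    ("2015-01-01T00:00:00Z", 4),
    ("2015-01-02T00:00:00Z", 3),
    ("2015-01-02T00:00:00Z", 7),
]


def daily_average(rows):
    """Average the values per calendar day (the first 10 characters of the date)."""
    totals = defaultdict(lambda: [0, 0])   # day -> [sum of values, count]
    for timestamp, value in rows:
        day = timestamp[:10]
        totals[day][0] += value
        totals[day][1] += 1
    return {day: total / count for day, (total, count) in sorted(totals.items())}


averages = daily_average(docs)
print(averages)
```

The two averages are (2 + 4) / 2 = 3.0 for 2015-01-01 and (3 + 7) / 2 = 5.0 for 2015-01-02, matching the values given in the question.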

solr mlt with grouping issue

2016-02-10 Thread sara hajili
Hi all. I have an MLT query and I want to categorise the query results based on a particular field, so I want to use Solr's grouping feature. But Solr MLT does not support grouping. I used group=true, group.field=filed1, group.limit=3 but I didn't get grouped results. ** I tested the Solr group feature with select

Re: Json faceting, aggregate numeric field by day?

2016-02-10 Thread Tom Evans
On Wed, Feb 10, 2016 at 10:21 AM, Markus Jelsma wrote: > Hi - if we assume the following simple documents: > > > 2015-01-01T00:00:00Z > 2 > > > 2015-01-01T00:00:00Z > 4 > > > 2015-01-02T00:00:00Z > 3 > > > 2015-01-02T00:00:00Z > 7 > > > Can i

Re: Solr architecture

2016-02-10 Thread Emir Arnautovic
Hi Mark, Appending session actions just to be able to return more than one session without retrieving a large number of results is not a good tradeoff. Like Upayavira suggested, you should consider storing one action per doc and aggregating at read time, or pushing to Solr once the session ends and

Need to move on SOlr cloud (help required)

2016-02-10 Thread kshitij tyagi
Hi, We are currently using Solr 5.2 and I need to move to a SolrCloud architecture. As of now we are using 5 machines: 1. I am using 1 master where we index our data. 2. I replicate my data to the other machines. One or the other machine keeps showing high load, so I am planning to move on

Re: How is Tika used with Solr

2016-02-10 Thread Charlie Hull
On 09/02/2016 22:49, Alexandre Rafalovitch wrote: Solr uses Tika directly. And not in the most efficient way. It is there mostly for convenience rather than performance. So, for performance, Solr recommendation is also to run Tika separately and only send Solr the processed documents.

Re: Need to move on SOlr cloud (help required)

2016-02-10 Thread Binoy Dalal
What is the size of your index, hardware specs, average query load, rate of Indexing? On Wed, 10 Feb 2016, 14:14 kshitij tyagi wrote: > Hi, > > We are currently using solr 5.2 and I need to move on solr cloud > architecture. > > As of now we are using 5 machines : >

RE: Json faceting, aggregate numeric field by day?

2016-02-10 Thread Markus Jelsma
Hi Tom - thanks. But judging from the article and SOLR-6348 faceting stats over ranges is not yet supported. More specifically, SOLR-6352 is what we would need. [1]: https://issues.apache.org/jira/browse/SOLR-6348 [2]: https://issues.apache.org/jira/browse/SOLR-6352 Thanks anyway, at least we

RE: How is Tika used with Solr

2016-02-10 Thread Allison, Timothy B.
I completely agree on the impulse, and for the vast majority of the time (regular catchable exceptions), that'll work. And, by vast majority, aside from oom on very large files, we aren't seeing these problems any more in our 3 million doc corpus (y, I know, small by today's standards) from

RE: How is Tika used with Solr

2016-02-10 Thread Allison, Timothy B.
Ha. Spoke too soon about this thread not getting swamped. Will add the dropwizard-tika-server to our wiki page. Thank you for the link! As a side note, I'll submit a pull request to update the AbstractTikaResource to avoid a potential NPE if the mime type can't be parsed...we just fixed this