Re: Running Solr on port 80

2016-02-11 Thread Binoy Dalal
The script essentially automates what you would otherwise do manually the first time you start up the system. It is no different from extracting the archive, setting permissions, etc. yourself. So the next time you want to stop/restart Solr, you'll have to do it using the solr script. That being

both way synonyms with ManagedSynonymFilterFactory

2016-02-11 Thread Bjørn Hjelle
Hi, one-way managed synonyms seem to work fine, but I cannot make both-way synonyms work. Steps to reproduce with Solr 5.4.1: 1. create a core: $ bin/solr create_core -c test -d server/solr/configsets/basic_configs 2. edit schema.xml so fieldType text_general looks like this:

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Binoy Dalal
If you're fetching large text fields, consider highlighting on them and just returning the snippets. I faced such a problem some time ago and highlighting sped things up nearly 10x for us. On Thu, 11 Feb 2016, 15:03 Matteo Grolla wrote: > Hi, > I'm trying to
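
A minimal sketch of the approach described here, assuming a large stored text field named body (the field and core names are hypothetical): ask the highlighter for snippets instead of returning the whole field.

  curl 'http://localhost:8983/solr/mycore/select?q=body:(water+filter)&fl=id,score&hl=true&hl.fl=body&hl.snippets=3'

The response then carries short highlighted snippets rather than the full stored field, which is what cuts the transfer and serialization cost.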

optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Hi, I'm trying to optimize a Solr application. The bottleneck is queries that request 1000 rows from Solr. Unfortunately the application can't be modified at the moment; can you suggest what could be done on the Solr side to increase the performance? The bottleneck is just on fetching the

error

2016-02-11 Thread Midas A
We upgraded our Solr version last night and are getting the following error: org.apache.solr.common.SolrException: Bad content Type for search handler :application/octet-stream What should I do to remove this?

Re: error

2016-02-11 Thread Midas A
My log is growing; it is urgent. On Fri, Feb 12, 2016 at 10:43 AM, Midas A wrote: > we have upgraded solr version last night getting following error > > org.apache.solr.common.SolrException: Bad content Type for search handler > :application/octet-stream > > what i

Re: error

2016-02-11 Thread Shawn Heisey
On 2/11/2016 10:13 PM, Midas A wrote: > we have upgraded solr version last night getting following error > > org.apache.solr.common.SolrException: Bad content Type for search handler > :application/octet-stream > > what i should do ? to remove this . What version did you upgrade from and what

Re: error

2016-02-11 Thread Midas A
solr 5.2.1 On Fri, Feb 12, 2016 at 12:59 PM, Shawn Heisey wrote: > On 2/11/2016 10:13 PM, Midas A wrote: > > we have upgraded solr version last night getting following error > > > > org.apache.solr.common.SolrException: Bad content Type for search handler > >

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Hi Upayavira, I'm working with Solr 4.0, sorting on score (default). I tried setting the document cache size to 2048, so all docs of a single request fit (two requests fit, actually). If I execute a query the first time, it takes 24s. I re-execute it, with all docs in the documentCache, and it
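
For reference, the documentCache is configured in solrconfig.xml; a sketch of the 2048-entry setting described here:

  <documentCache class="solr.LRUCache" size="2048" initialSize="2048" autowarmCount="0"/>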

RE: Json faceting, aggregate numeric field by day?

2016-02-11 Thread Markus Jelsma
Thanks. But this yields an error in FacetModule: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:100) at

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
I see a lot of time spent in splitOnTokens, which is called by (last part of stack trace): BinaryResponseWriter$Resolver.writeResultsBody() ... solr.search.ReturnFields.wantsField() commons.io.FilenameUtils.wildcardMatch() commons.io.FilenameUtils.splitOnTokens() 2016-02-11 15:42 GMT+01:00

Re: multiple but identical suggestions in autocomplete

2016-02-11 Thread Alessandro Benedetti
Related to this, I just created https://issues.apache.org/jira/browse/SOLR-8672 To be fair, I see no utility in returning duplicate suggestions (if they have no different payload, they are indistinguishable from a human perspective, hence the duplication is useless). I would like to hear

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Hi Yonik, after the first query I find 1000 docs in the document cache. I'm using curl to send the request and requesting javabin format to mimic the application. GC activity is low. I managed to load the entire 50GB index into the filesystem cache; after that, queries don't cause disk activity

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Yonik Seeley
On Thu, Feb 11, 2016 at 9:42 AM, Matteo Grolla wrote: > Hi Yonic, > after the first query I find 1000 docs in the document cache. > I'm using curl to send the request and requesting javabin format to mimic > the application. > gc activity is low > I managed to load

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Is this a scenario that was working fine and suddenly deteriorated, or has it always been slow? -- Jack Krupansky On Thu, Feb 11, 2016 at 4:33 AM, Matteo Grolla wrote: > Hi, > I'm trying to optimize a solr application. > The bottleneck are queries that request

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Responses have always been slow, but previously the time was dominated by faceting. After a few optimizations this is my bottleneck. My suggestion was to properly implement paging and reduce rows; unfortunately this is not possible, at least not soon. 2016-02-11 16:18 GMT+01:00 Jack Krupansky

Re: Solr architecture

2016-02-11 Thread Upayavira
Your biggest issue here is likely to be HTTP connections. Making an HTTP connection to Solr is way more expensive than the task of adding a single document to the index. If you are expecting to add 24 billion docs per day, I'd suggest that somehow merging those documents into batches before sending
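
A sketch of that batching in SolrJ (zkHost string and collection name are placeholders; the constructor shown is the 5.x-era one):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  // Send documents in batches of 1000: one HTTP request per batch instead of per document
  void indexInBatches(Iterable<SolrInputDocument> docs) throws Exception {
      SolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr");
      List<SolrInputDocument> batch = new ArrayList<>();
      for (SolrInputDocument doc : docs) {
          batch.add(doc);
          if (batch.size() == 1000) {
              client.add("events", batch);   // "events" collection name is hypothetical
              batch.clear();
          }
      }
      if (!batch.isEmpty()) client.add("events", batch);
      client.close();
  }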

Re: Json faceting, aggregate numeric field by day?

2016-02-11 Thread Yonik Seeley
On Thu, Feb 11, 2016 at 10:04 AM, Markus Jelsma wrote: > Thanks. But this yields an error in FacetModule: > > java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map > at >

Select distinct records

2016-02-11 Thread Brian Narsi
I am trying to select distinct records from a collection. (I need distinct name and corresponding id.) I have tried using grouping with group.format=simple, but that takes a long time to execute and sometimes runs into an out-of-memory exception. Another limitation seems to be that the total number of

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
[inline image omitted] 2016-02-11 16:05 GMT+01:00 Matteo Grolla : > I see a lot of time spent in splitOnTokens > > which is called by (last part of stack trace) > > BinaryResponseWriter$Resolver.writeResultsBody() > ... > solr.search.ReturnFields.wantsField() >

RE: Running Solr on port 80

2016-02-11 Thread Davis, Daniel (NIH/NLM) [C]
You should edit the files installed by install_solr_service.sh - change the init.d script to pass the -p argument to ${SOLRINSTALLDIR}/bin/solr. By the way, my initscript is modified (a) to follow the usual /etc/sysconfig/ convention, and (b) to run solr as a different user than the
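
With the 5.x service installer there is usually no need to patch bin/solr itself; the port can be set in the include file the init script sources (path per the stock installer; note that binding port 80 additionally requires root or CAP_NET_BIND_SERVICE):

  # /etc/default/solr.in.sh
  SOLR_PORT=80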

Re: Select distinct records

2016-02-11 Thread Binoy Dalal
What version of Solr are you using? Have you taken a look at the Collapsing Query Parser. It basically performs the same functions as grouping but is much more efficient at doing it. Take a look here: https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results On Thu, Feb 11,
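
For reference, collapsing is applied as a filter query; a minimal sketch assuming the field to deduplicate on is named name:

  q=*:*&fq={!collapse field=name}&expand=true

(&expand=true optionally returns the members of each collapsed group in an "expanded" section of the response.)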

Re: Need to move on SOlr cloud (help required)

2016-02-11 Thread Midas A
Erick, bq: We want the hits on solr servers to be distributed True, this happens automatically in SolrCloud, but a simple load balancer in front of master/slave does the same thing. Midas: in the case of a SolrCloud architecture, do we not need a load balancer? On Thu, Feb 11, 2016 at 11:42

Re: Select distinct records

2016-02-11 Thread Joel Bernstein
Yeah that would be the reason. If you want distributed unique capabilities, then you might want to start testing out 6.0. Aside from SELECT DISTINCT queries, you also have a much more mature Streaming Expression library which supports the unique operation. Joel Bernstein

Re: How is Tika used with Solr

2016-02-11 Thread xavi jmlucjav
I have found that when you deal with large amounts of all sorts of files, in the end you find stuff (PDFs are typically nasty) that will hang Tika. That is even worse than a crash or OOM. We used Aperture instead of Tika because at the time it provided a watchdog feature to kill what seemed like a

RE: outlook email file pst extraction problem

2016-02-11 Thread Allison, Timothy B.
Should have looked at how we handle PSTs before my earlier response...sorry. What you're seeing is Tika's default treatment of embedded documents: it concatenates them all into one string. It'll do the same thing for zip files and other container files. The default Tika format is xhtml, and we

RE: outlook email file pst extraction problem

2016-02-11 Thread Allison, Timothy B.
Yes, this looks like a Tika feature. If you run the tika-app.jar [1] on your file and you get the same output, then that's Tika's doing. Drop a note on the u...@tika.apache.org list if Tika isn't meeting your needs. -Original Message- From: Sreenivasa Kallu

RE: How is Tika used with Solr

2016-02-11 Thread Allison, Timothy B.
Cross-post to Tika users. Yes and no. If you run the tika app with input and output directories, it runs tika-batch under the hood (TIKA-1330 as part of TIKA-1302). This creates a parent and a child process; if the child process notices a hung thread, it dies, and the parent restarts it. Or if your OS gets
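
A sketch of that batch invocation (the directory arguments were stripped by the archive; the flags shown are Tika 1.x batch-mode flags, so treat them as an assumption):

  java -jar tika-app.jar -i /path/to/input_dir -o /path/to/output_dir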

Re: Custom auth plugin not loaded in SolrCloud

2016-02-11 Thread Noble Paul
Yes, runtime lib cannot be used for loading container-level plugins yet. Eventually it must be possible. You can open a ticket. On Mon, Jan 4, 2016 at 1:07 AM, tine-2 wrote: > Hi, > > are there any news on this? Was anyone able to get it to work? > > Cheers, > > tine > > > >

Are fieldCache and/or DocValues used by Function Queries

2016-02-11 Thread Andrea Roggerone
Hi, I need to evaluate the performance of different boost solutions and I can't find any relevant documentation about it. Are fieldCache and/or DocValues used by Function Queries?

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Thanks Toke, yes, they are long times, and solr qtime (to execute the query) is a fraction of a second. The response in javabin format is around 300k. Currently I can't limit the rows requested or the fields requested, those are fixed for me. 2016-02-11 13:14 GMT+01:00 Toke Eskildsen

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Upayavira
On Thu, Feb 11, 2016, at 09:33 AM, Matteo Grolla wrote: > Hi, > I'm trying to optimize a solr application. > The bottleneck are queries that request 1000 rows to solr. > Unfortunately the application can't be modified at the moment, can you > suggest me what could be done on the solr side

Re: Size of logs are high

2016-02-11 Thread Aditya Sundaram
Can you check your log level? Probably a log level of ERROR would suffice for your purpose, and it would most certainly reduce your log size(s). On Thu, Feb 11, 2016 at 12:53 PM, kshitij tyagi wrote: > Hi, > I have migrated to solr 5.2 and the size of logs are high. >

Re: Json faceting, aggregate numeric field by day?

2016-02-11 Thread Yonik Seeley
On Wed, Feb 10, 2016 at 5:21 AM, Markus Jelsma wrote: > Hi - if we assume the following simple documents (XML tags stripped by the archive; each doc carries a date and a numeric value): > > > 2015-01-01T00:00:00Z > 2 > > > 2015-01-01T00:00:00Z > 4 > > > 2015-01-02T00:00:00Z > 3 > > > 2015-01-02T00:00:00Z > 7 > > > Can I get

Re: Solr architecture

2016-02-11 Thread Emir Arnautovic
Hi Mark, Nothing comes for free :) With one doc per action, you will have to handle a large number of docs. There is a hard limit on the number of docs per shard (roughly 2 billion, the maximum value of a Java int), so sharding is mandatory. It is most likely that you will have to have more than one collection. Depending on

Re: [More Like This] Query building

2016-02-11 Thread Alessandro Benedetti
Hi guys, is it possible to get any feedback? Is there any process to speed up bug resolution/discussion? I just want to understand whether the patch is not good enough, whether I need to improve it, or whether simply no one has taken a look... https://issues.apache.org/jira/browse/LUCENE-6954 Cheers On 11 January

Re: Json faceting, aggregate numeric field by day?

2016-02-11 Thread Tom Evans
On Wed, Feb 10, 2016 at 12:13 PM, Markus Jelsma wrote: > Hi Tom - thanks. But judging from the article and SOLR-6348 faceting stats > over ranges is not yet supported. More specifically, SOLR-6352 is what we > would need. > > [1]:

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Alessandro Benedetti
Hi Matteo, as an addition to Upayavira's observation, how is the memory assigned for that Solr instance? How much memory is assigned to Solr and how much is left for the OS? Is this a VM on top of a physical machine? Is the real physical memory being used, or could swapping happen frequently? Is

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Yonik Seeley
On Thu, Feb 11, 2016 at 7:45 AM, Matteo Grolla wrote: > Thanks Toke, yes, they are long times, and solr qtime (to execute the > query) is a fraction of a second. > The response in javabin format is around 300k. OK, That tells us a lot. And if you actually tested so that

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Are queries scaling linearly - does a query for 100 rows take 1/10th the time (1 sec vs. 10 sec or 3 sec vs. 30 sec)? Does the app need/expect exactly 1,000 documents for the query or is that just what this particular query happened to return? What does the query look like? Is it complex or does it use

Re: Select distinct records

2016-02-11 Thread Brian Narsi
I am using Solr 5.1.0 On Thu, Feb 11, 2016 at 9:19 AM, Binoy Dalal wrote: > What version of Solr are you using? > Have you taken a look at the Collapsing Query Parser. It basically performs > the same functions as grouping but is much more efficient at doing it. > Take

Re: Json faceting, aggregate numeric field by day?

2016-02-11 Thread Yonik Seeley
On Thu, Feb 11, 2016 at 11:07 AM, Markus Jelsma wrote: > Hi - i was sending the following value for json.facet: > json.facet=by_day:{type : range, start : NOW-30DAY/DAY, end : NOW/DAY, gap : > "+1DAY", facet:{x : "avg(rank)"}} > > I now also notice i didn't include

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Good to know. Hmmm... 200ms for 10 rows is not outrageously bad, but still relatively bad. Even 50ms for 10 rows would be considered barely okay. But... again it depends on query complexity - simple queries should be well under 50 ms for decent modern hardware. -- Jack Krupansky On Thu, Feb 11,

Re: Logging request times

2016-02-11 Thread Shawn Heisey
On 2/10/2016 10:33 AM, McCallick, Paul wrote: > We’re trying to fine tune our query and ingestion performance and would like > to get more metrics out of SOLR around this. We are capturing the standard > logs as well as the jetty request logs. The standard logs get us QTime, > which is not a

Re: Knowing which doc failed to get added in solr during bulk addition in Solr 5.2

2016-02-11 Thread Steven White
For my application, the solution I implemented is to log the chunk that failed to a file. This file is then post-processed one record at a time. The ones that fail are reported to the admin and never looked at again until the admin takes action. This is not the most efficient solution right

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Hi Jack, response times scale with rows. The relationship doesn't seem linear, but below 400 rows times are much faster. I view query times from the Solr logs and they are fast: the same query with rows=1000 takes 8s; with rows=10 it takes 0.2s. 2016-02-11 16:22 GMT+01:00 Jack Krupansky

RE: Json faceting, aggregate numeric field by day?

2016-02-11 Thread Markus Jelsma
Hi - I was sending the following value for json.facet: json.facet=by_day:{type : range, start : NOW-30DAY/DAY, end : NOW/DAY, gap : "+1DAY", facet:{x : "avg(rank)"}} I now also notice I didn't include the time field. But adding it gives the same error: json.facet=by_day:{type : range, field :

Re: Select distinct records

2016-02-11 Thread Joel Bernstein
Solr 6.0 supports SELECT DISTINCT (SQL) queries. You can even choose between a MapReduce implementation and a Json Facet implementation. The MapReduce Implementation supports extremely high cardinality for the distinct fields. Json Facet implementation supports lower cardinality but high QPS.
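
A sketch of such a query against the Solr 6 /sql handler (collection name hypothetical); the aggregationMode parameter chooses between the two implementations Joel mentions:

  curl --data-urlencode 'stmt=SELECT DISTINCT name, id FROM products' \
       'http://localhost:8983/solr/products/sql?aggregationMode=map_reduce'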

Re: How is Tika used with Solr

2016-02-11 Thread Erick Erickson
Well, I'd imagine you could spawn threads and monitor/kill them as necessary, although that doesn't deal with OOM errors. FWIW, Erick On Thu, Feb 11, 2016 at 3:08 PM, xavi jmlucjav wrote: > For sure, if I need heavy duty text extraction again, Tika would be the > obvious

Re: SolrCloud shard marked as down and "reloading" collection doesnt restore it

2016-02-11 Thread KNitin
After more debugging, I figured out that it is related to this: https://issues.apache.org/jira/browse/SOLR-3274 Is there a recommended fix (apart from running a zk ensemble?) On Thu, Feb 11, 2016 at 10:29 AM, KNitin wrote: > Hi, > > I noticed while running an indexing

Re: edismax query parser - pf field question

2016-02-11 Thread Erick Erickson
Try comma instead of space delimiting? On Thu, Feb 11, 2016 at 2:33 PM, Senthil wrote: > Clarification needed on edismax query parser "pf" field. > > *SOLR Query:* > /query?q=refrigerator water filter=P_NAME^1.5 > CategoryName=xml=on=P_NAME > CategoryName=2=CategoryName P_NAME

RE: How is Tika used with Solr

2016-02-11 Thread Allison, Timothy B.
Y, and you can't actually kill a thread. You can ask nicely via Thread.interrupt(), but some of our dependencies don't bother to listen for that. So, you're pretty much left with a separate process as the only robust solution. So, we did the parent-child process thing for directory->

Re: How is Tika used with Solr

2016-02-11 Thread xavi jmlucjav
For sure, if I need heavy duty text extraction again, Tika would be the obvious choice if it covers dealing with hangs. I never used tika-server myself (not sure if it existed at the time) just used tika from my own jvm. On Thu, Feb 11, 2016 at 8:45 PM, Allison, Timothy B.

Re: slave is getting full synced every polling

2016-02-11 Thread Novin Novin
Hi Erick, below is the master/slave config (the XML tags were stripped by the mail archive): Master: commit optimize 2 Slave: http://master:8983/solr/big_core/replication 00:00:60 username password Do you mean the Solr is restarting every minute or the polling
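
A hedged reconstruction of the stripped config above, using the standard ReplicationHandler element names (mapping the bare "2" to numberToKeep is an assumption):

  <!-- master solrconfig.xml -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">optimize</str>
      <str name="numberToKeep">2</str>
    </lst>
  </requestHandler>

  <!-- slave solrconfig.xml -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master:8983/solr/big_core/replication</str>
      <str name="pollInterval">00:00:60</str>
      <str name="httpBasicAuthUser">username</str>
      <str name="httpBasicAuthPassword">password</str>
    </lst>
  </requestHandler>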

Re: How is Tika used with Solr

2016-02-11 Thread Steven White
Tim, in my case I have to use Tika as follows: java -jar tika-app.jar -t I will be invoking the above command from my Java app using Runtime.getRuntime().exec(). I will capture stdout and stderr to get back the raw text I need. My app use case will not allow me to use a , it is out of
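
A sketch of that invocation with a watchdog, since (per the rest of this thread) hung extractions are the main risk; the path and timeout are placeholders:

  import java.util.concurrent.TimeUnit;

  // Launch tika-app in text mode and kill it if extraction hangs
  ProcessBuilder pb = new ProcessBuilder("java", "-jar", "tika-app.jar", "-t", "/path/to/file.pdf");
  pb.redirectErrorStream(true);              // merge stderr into stdout
  Process p = pb.start();
  // consume p.getInputStream() on another thread so the pipe buffer cannot fill and block
  if (!p.waitFor(60, TimeUnit.SECONDS)) {    // Java 8 timed wait as the watchdog
      p.destroyForcibly();                   // hung extraction: kill the child process
  }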

Re: Select distinct records

2016-02-11 Thread Brian Narsi
In order to use the Collapsing feature I will need to use Document Routing to co-locate related documents in the same shard in SolrCloud. What are the advantages and disadvantages of Document Routing? Thanks, On Thu, Feb 11, 2016 at 12:54 PM, Joel Bernstein wrote: > Yeah
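
For reference, with SolrCloud's default compositeId router, co-location is achieved by prefixing document ids with a shared route key; a sketch (ids hypothetical):

  id = "acme!123"   // the hash of the "acme" prefix picks the shard
  id = "acme!456"   // same prefix, so same shard as above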

Re: slave is getting full synced every polling

2016-02-11 Thread Erick Erickson
Typo? That's 60 seconds, but that's not especially interesting either way. Do the actual segments look identical after the polling? On Thu, Feb 11, 2016 at 1:16 PM, Novin Novin wrote: > Hi Erick, > > Below is master slave config: > > Master: > > > commit >

edismax query parser - pf field question

2016-02-11 Thread Senthil
Clarification needed on the edismax query parser "pf" field. *SOLR Query* (parameter separators and names were partially stripped by the mail archive): /query?q=refrigerator water filter=P_NAME^1.5 CategoryName=xml=on=P_NAME CategoryName=2=CategoryName P_NAME score=edismax *Parsed Query from DebugQuery results:* (+((DisjunctionMaxQuery((P_NAME:refriger^1.5 |

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Again, first things first... debugQuery=true and see which Solr search components are consuming the bulk of qtime. -- Jack Krupansky On Thu, Feb 11, 2016 at 11:33 AM, Matteo Grolla wrote: > virtual hardware, 200ms is taken on the client until response is written to >
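
For reference, a sketch of such a request; the per-component times appear under the debug section of the response:

  curl 'http://localhost:8983/solr/mycore/select?q=*:*&rows=1000&debugQuery=true&wt=json'
  # inspect debug.timing: prepare/process time per component (query, facet, highlight, debug)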

slave is getting full synced every polling

2016-02-11 Thread Novin Novin
Hi guys, I'm having a problem with master/slave syncing. I have two cores: a small core (keeping frequently used data for fast results) and a big core (for rare queries and for searching everything). Both cores have the same solrconfig file, but the small core's replication is fine, while the other

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Alessandro Benedetti
Out of curiosity, have you tried to debug that Solr version to see which text arrives at the splitOnTokens method? In the latest Solr that part has changed completely. I would be curious to understand what it tries to tokenise by '?' and '*'! Cheers On 11 February 2016 at 16:33, Matteo Grolla

Re: Select distinct records

2016-02-11 Thread Brian Narsi
I have tried to use the Collapsing feature but it appears that it leaves duplicated records in the result set. Is that expected? Or any suggestions on working around it? Thanks On Thu, Feb 11, 2016 at 9:30 AM, Brian Narsi wrote: > I am using > > Solr 5.1.0 > > On Thu, Feb

Re: Select distinct records

2016-02-11 Thread Joel Bernstein
The CollapsingQParserPlugin shouldn't have duplicates in the result set. Can you provide the details? Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Feb 11, 2016 at 12:02 PM, Brian Narsi wrote: > I have tried to use the Collapsing feature but it appears that it leaves

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Virtual hardware; 200ms is taken on the client until the response is written to disk. QTime on Solr is ~90ms: not great, but acceptable. Is it possible that the method FilenameUtils.splitOnTokens is really so heavy when requesting a lot of rows on slow hardware? 2016-02-11 17:17 GMT+01:00 Jack Krupansky

RE: Json faceting, aggregate numeric field by day?

2016-02-11 Thread Markus Jelsma
Awesome! The surrounding braces did the trick; I had fixed the quotes just before. Many thanks!! The remaining issue is that some source files in the o.a.s.search.facet package are package-protected or private, so I can't implement a custom Agg using FacetContext and such. Created issue:
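
For reference, the working parameter as reconstructed from this thread (the name of the date field is an assumption; Markus's message omitted it):

  json.facet={by_day : {type : range, field : date, start : "NOW-30DAY/DAY", end : "NOW/DAY", gap : "+1DAY", facet : {x : "avg(rank)"}}}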

Custom plugin to handle proprietary binary input stream

2016-02-11 Thread michael dürr
I'm looking for an option to write a Solr plugin which can deal with a custom binary input stream. Unfortunately Solr's javabin as a protocol is not an option for us. I already had a look at some possibilities like writing a custom request handler, but it seems like the classes/interfaces one

Enforce client auth in Solr

2016-02-11 Thread GAUTHAM S
Hello, I am trying to implement a Solr cluster with mutual authentication using client and server SSL certificates. I have both client and server certificates signed by a CA. The setup is working well; however, any client cert that chains up to the issuer CA is able to access the Solr cluster without

dismax for bigrams and phrases

2016-02-11 Thread Le Zhao
Hey Solr folks, the current dismax parser behavior is different for unigrams versus bigrams. For unigrams, scores are MAX-ed across fields (hence "dismax"), but for bigrams they are SUM-ed as of Solr 4.10 (according to https://issues.apache.org/jira/browse/SOLR-6062). Given this inconsistency, the

Re: slave is getting full synced every polling

2016-02-11 Thread Erick Erickson
What is your replication configuration in solrconfig.xml on both master and slave? bq: big core is doing full sync every time wherever it start (every minute). Do you mean the Solr is restarting every minute or the polling interval is 60 seconds? The Solr logs should tell you something about

Re: Knowing which doc failed to get added in solr during bulk addition in Solr 5.2

2016-02-11 Thread Erick Erickson
Steven's solution is a very common one, complete to the notion of re-chunking. Depending on the throughput requirements, simply re-sending the documents of the offending packet one at a time is often sufficient (but not _efficient_). I can imagine fallback scenarios like "try chunking 100 at a time, for those chunks

Re: Size of logs are high

2016-02-11 Thread Erick Erickson
You can also look at your log4j properties file and manipulate the max log size, how many old versions are retained, etc. If you're talking about the console log, people often just disable console logging (again in the logging properties file). Best, Erick On Thu, Feb 11, 2016 at 6:11 AM, Aditya
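
A sketch of those knobs in Solr 5.x's server/resources/log4j.properties (appender name per the stock file; treat it as an assumption for other versions):

  # dropping CONSOLE from the root logger disables console logging
  log4j.rootLogger=WARN, file
  # cap the size and count of rotated log files
  log4j.appender.file.MaxFileSize=4MB
  log4j.appender.file.MaxBackupIndex=9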

Re: Need to move on SOlr cloud (help required)

2016-02-11 Thread Erick Erickson
bq: We want the hits on solr servers to be distributed True, this happens automatically in SolrCloud, but a simple load balancer in front of master/slave does the same thing. bq: what if master node fail what should be our fail over strategy ? This is, indeed, one of the advantages of

Re: Tune Data Import Handler to retrieve maximum records

2016-02-11 Thread Erick Erickson
It's possible with JDBC settings (see the specific ones for your driver), but dangerous. What if the number of rows is 1B or something? You'll blow Solr's memory out of the water. Best, Erick On Wed, Feb 10, 2016 at 12:45 PM, Troy Edwards wrote: > Is it possible for
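
For MySQL, the relevant DIH knob looks like the sketch below (the URL and credentials are placeholders); batchSize="-1" makes DIH pass a streaming fetch size to the driver so rows are not all buffered in memory:

  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://db-host/db" user="solr" password="..." batchSize="-1"/>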

Re: Knowing which doc failed to get added in solr during bulk addition in Solr 5.2

2016-02-11 Thread Walter Underwood
I first wrote the “fall back to one at a time” code for Solr 1.3. It is pretty easy if you plan for it. Make the batch size variable. When a batch fails, retry with a batch size of 1 for that particular batch. Then keep going or fail; either way, you have good logging on which one failed.
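
A sketch of Walter's pattern in SolrJ (client handling, logging, and the id field name are placeholders):

  import java.util.List;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.common.SolrInputDocument;

  void addWithFallback(SolrClient client, List<SolrInputDocument> batch) throws Exception {
      try {
          client.add(batch);                       // normal path: whole batch in one request
      } catch (Exception batchFailure) {
          for (SolrInputDocument doc : batch) {    // fallback: batch size 1 isolates the bad doc
              try {
                  client.add(doc);
              } catch (Exception perDoc) {
                  System.err.println("Failed doc id=" + doc.getFieldValue("id") + ": " + perDoc);
              }
          }
      }
  }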

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Toke Eskildsen
On Thu, 2016-02-11 at 11:53 +0100, Matteo Grolla wrote: > I'm working with solr 4.0, sorting on score (default). > I tried setting the document cache size to 2048, so all docs of a single > request fit (2 requests fit actually) > If I execute a query the first time it takes 24s > I

Re: Select distinct records

2016-02-11 Thread Brian Narsi
OK, I see that the Collapsing feature requires documents to be co-located in the same shard in SolrCloud. Could that be the reason for the duplication? On Thu, Feb 11, 2016 at 11:09 AM, Joel Bernstein wrote: > The CollapsingQParserPlugin shouldn't have duplicates in the result set. >

ApacheCon NA 2016 - Important Dates!!!

2016-02-11 Thread Melissa Warnkin
Hello everyone! I hope this email finds you well.  I hope everyone is as excited about ApacheCon as I am! I'd like to remind you all of a couple of important dates, as well as ask for your assistance in spreading the word! Please use your social media platform(s) to get the word out! The more

SolrCloud shard marked as down and "reloading" collection doesnt restore it

2016-02-11 Thread KNitin
Hi, I noticed while running an indexing job (2M docs but per doc size could be 2-3 MB) that one of the shards goes down just after the commit. (Not related to OOM or high cpu/load). This marks the shard as "down" in zk and even a reload of the collection does not recover the state. There are

outlook email file pst extraction problem

2016-02-11 Thread Sreenivasa Kallu
Hi, I am currently indexing individual Outlook messages and searching is working fine. I created the Solr core using the following command: ./solr create -c sreenimsg1 -d data_driven_schema_configs I am using the following command to index individual messages: curl "