Re: Solr And query

2014-10-31 Thread vsriram30
Yes Erick, correctly pointed out. Thanks, Sriram -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-And-query-tp4166685p4166789.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Design optimal Solr Schema

2014-10-31 Thread tomas.kalas
Thanks for your help. OK, I will try to explain it once more; sorry for my English. I need some functions in my searching. 1.) I will naturally have a lot of documents, and I want to find out if a phrase occurs, for example, up to 5 words apart. I used w:"Good morning"~5. (In the example Solr it works, but I

Re: Design optimal Solr Schema

2014-10-31 Thread tomas.kalas
Oh yes, I want to display the stored data in an HTML file. I have 2 pages: on one page there is a form where I show the results. Each result there is a link (by ID) to a file on the second page containing the whole conversation. And what did you mean by separating each conversation interaction? Thanks.

Re: issue related to blank value in datefield

2014-10-31 Thread Aman Tandon
Thanks Chris. With Regards, Aman Tandon On Fri, Oct 31, 2014 at 5:45 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I was just trying to index the fields returned by my MySQL and I found this If you are importing dates from MySQL where you have 0000-00-00T00:00:00Z as the default
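A minimal sketch of the usual fix for the zero-date problem discussed above: map MySQL's "zero" datetimes to a missing value before they reach Solr, and reformat real values into Solr's ISO-8601 shape. (Python is used here only for illustration; the function name is made up, and in a DataImportHandler setup the equivalent logic would live in a transformer.)

```python
def mysql_to_solr_date(value):
    """Convert a MySQL datetime string to a Solr date, or None to skip it."""
    # MySQL's zero date means "no value"; drop the field instead of
    # indexing an unparseable date.
    if value is None or value.startswith("0000-00-00"):
        return None
    if value.endswith("Z"):  # already ISO-8601 / UTC
        return value
    # 'YYYY-MM-DD HH:MM:SS' -> 'YYYY-MM-DDTHH:MM:SSZ'
    return value.replace(" ", "T", 1) + "Z"
```

Documents whose cleaned value is None would simply omit the date field, which Solr accepts, whereas an empty string in a date field does not parse.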

Solr index corrupt question

2014-10-31 Thread ku3ia
Hi folks! I'm interested in whether a delete operation can corrupt a Solr index if the optimize command is never performed.

The exact same query gets executed n times for the nth row when retrieving body (plaintext) from BLOB column with Tika Entity Processor

2014-10-31 Thread 5ton3
Hi! Not sure if this is a problem or if I just don't understand the debug response, but it seems somewhat odd to me. The main entity can have multiple BLOB documents. I'm using Tika Entity Processor to retrieve the body (plaintext) from these documents and put the result in a multivalued field,

Re: Ideas for debugging poor SolrCloud scalability

2014-10-31 Thread Ian Rose
Hi Erick - Thanks for the detailed response and apologies for my confusing terminology. I should have said WPS (writes per second) instead of QPS but I didn't want to introduce a weird new acronym since QPS is well known. Clearly a bad decision on my part. To clarify: I am doing *only* writes

RE: Missing Records

2014-10-31 Thread AJ Lemke
I started this collection using this command: http://localhost:8983/solr/admin/collections?action=CREATE&name=inventory&numShards=1&replicationFactor=2&maxShardsPerNode=4 So 1 shard and replicationFactor of 2 AJ -Original Message- From: S.L [mailto:simpleliving...@gmail.com] Sent:

Re: Solr index corrupt question

2014-10-31 Thread Erick Erickson
Not quite sure what you mean by destroy. I can use a delete-by-query with *:* and mark all docs in my index deleted. Search results will return nothing but it's still a valid index, it just consists of all deleted docs. All the segments may be removed even in the absence of an optimize due to

Re: The exact same query gets executed n times for the nth row when retrieving body (plaintext) from BLOB column with Tika Entity Processor

2014-10-31 Thread Erick Erickson
Your message looks like it's missing stuff (snapshots?), the e-mail for this list generally strips attachments, so you'll have to put them somewhere else and link to them if you want us to see them. Best, Erick On Fri, Oct 31, 2014 at 5:11 AM, 5ton3 oysha...@gmail.com wrote: Hi! Not sure if

RE: Missing Records

2014-10-31 Thread AJ Lemke
Hi Erick: All of the records are coming out of an auto-numbered field so the IDs will all be unique. Here is the test I ran this morning: Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. (Duration: 28m) Requests: 1 (0/s), Fetched: 903,993 (538/s), Skipped: 0,

Re: Ideas for debugging poor SolrCloud scalability

2014-10-31 Thread Erick Erickson
NP, just making sure. I suspect you'll get lots more bang for the buck, and results much more closely matching your expectations, if 1) you batch up a bunch of docs at once rather than sending them one at a time. That's probably the easiest thing to try. Sending docs one at a time is something of
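Erick's first suggestion, batching documents instead of sending them one at a time, can be sketched like this (Python for illustration; the batch size of 500 is an arbitrary assumption, not a recommendation from the thread):

```python
def batches(docs, size=500):
    """Group an iterable of docs into lists of at most `size`,
    so each list can be sent to Solr as a single update request."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # trailing partial batch
        yield batch
```

Each yielded list would then go to Solr in one call (e.g. SolrJ's add taking a collection, or one JSON array posted to /update), which cuts per-request overhead dramatically compared to one doc per request.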

exporting to CSV with solrj

2014-10-31 Thread tedsolr
I am trying to invoke the CSVResponseWriter to create a CSV file of all stored fields. There are millions of documents so I need to write to the file iteratively. I saw a snippet of code online that claimed it could effectively remove the SolrDocumentList wrapper and allow the docs to be retrieved

Re: Missing Records

2014-10-31 Thread Erick Erickson
OK, that is puzzling. bq: If there were duplicates only one of the duplicates should be removed and I still should be able to search for the ID and find one correct? Correct. Your bad request error is puzzling, you may be on to something there. What it looks like is that somehow some of the

Re: Solr index corrupt question

2014-10-31 Thread ku3ia
Hi, Erick. Thanks for your response. I checked my index via the CheckIndex utility, and here is what I got: 3 of 41: name=_1ouwn docCount=518333 codec=Lucene46 compound=false numFiles=11 size (MB)=431.564 diagnostics = {timestamp=1412166850391, os=Linux, os.version=3.2.0-68-generic,

Re: exporting to CSV with solrj

2014-10-31 Thread Jorge Luis Betancourt Gonzalez
When you fire a query against Solr with wt=csv, the response coming from Solr is *already* in CSV; the CSVResponseWriter is responsible for translating SolrDocument instances into CSV on the server side, so I don't see any reason to use it yourself. Solr already does the heavy lifting

Only copy string up to certain character symbol?

2014-10-31 Thread hschillig
So I have a title field that commonly looks like this: Personal legal forms simplified : the ultimate guide to personal legal forms / Daniel Sitarz. I made a copyField that is of type title_only. I want to ONLY copy the text Personal legal forms simplified : the ultimate guide to personal

Re: Only copy string up to certain character symbol?

2014-10-31 Thread Alexandre Rafalovitch
copyField can copy only part of the string, but it is defined by character count. If you want to use regular expressions, you may be better off doing the copy in the UpdateRequestProcessor chain instead: http://www.solr-start.com/info/update-request-processors/#RegexReplaceProcessorFactory What
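To make the regex approach concrete: a sketch of a pattern that keeps only the part of the title before the final " / " (Python here purely for illustration; in Solr the same pattern and replacement would go into the processor's configuration attributes, and the exact pattern is an assumption about the data):

```python
import re

def title_only(title):
    # Drop everything from the last "/" (the statement of
    # responsibility, e.g. "/ Daniel Sitarz.") to the end.
    return re.sub(r"\s*/[^/]*$", "", title).rstrip()
```

This assumes the author segment never itself contains a slash; titles with multiple slashes would need a stricter pattern.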

Re: exporting to CSV with solrj

2014-10-31 Thread tedsolr
Sure thing, but how do I get the results output in CSV format? response.getResults() is a list of SolrDocuments.

RE: Missing Records

2014-10-31 Thread AJ Lemke
I have run some more tests so the numbers have changed a bit. Index Results done on Node 1: Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. (Duration: 31m 47s) Requests: 1 (0/s), Fetched: 903,993 (474/s), Skipped: 0, Processed: 903,993 Node 1: Last Modified: 44

Re: exporting to CSV with solrj

2014-10-31 Thread Alexandre Rafalovitch
Why do you want to use CSV in SolrJ? You would just have to parse it again. You could just trigger that as a URL call from outside with cURL, or as a plain HTTP (not SolrJ) call from a Java client. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and
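A sketch of the plain-URL approach Alexandre describes: ask Solr itself to render CSV by adding wt=csv to a /select request. (Python for illustration; the host, collection name, and field list are made-up examples.)

```python
from urllib.parse import urlencode

def csv_export_url(base, collection, query="*:*", fields=None, rows=1000):
    """Build a /select URL that asks Solr to return CSV directly (wt=csv)."""
    params = {"q": query, "wt": "csv", "rows": rows}
    if fields:
        # fl limits the export to the listed stored fields
        params["fl"] = ",".join(fields)
    return "%s/%s/select?%s" % (base, collection, urlencode(params))
```

The response body of such a request is already CSV and can be streamed straight to a file. For millions of documents, one would page through the result set (with start/rows, or cursorMark on Solr 4.7+) rather than fetch everything in a single request.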

Re: exporting to CSV with solrj

2014-10-31 Thread Chris Hostetter
: Sure thing, but how do I get the results output in CSV format? : response.getResults() is a list of SolrDocuments. Either use something like the NoOpResponseParser, which will give you the entire response back as a single string, or implement your own ResponseParser along the lines of...

Re: Solr index corrupt question

2014-10-31 Thread Erick Erickson
What version of Solr/Lucene? There have been some instances of index corruption, see the lucene/CHANGES.txt file that might account for it. This is something of a stab in the dark though. Because this is troubling... Best, Erick On Fri, Oct 31, 2014 at 7:57 AM, ku3ia dem...@gmail.com wrote:

Re: Only copy string up to certain character symbol?

2014-10-31 Thread Erick Erickson
In addition to Alexandre's comment, your index chain looks suspect: <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/> <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\/.+?$)" replacement=""/> So the pattern replace stuff happens on the grams,

Re: Missing Records

2014-10-31 Thread Erick Erickson
Sorry to say this, but I don't think the numDocs/maxDoc numbers are telling you anything, because it looks like you've optimized, which purges any data associated with deleted docs, including the internal IDs behind the numDocs/maxDocs figures. So if there were deletions, we can't see any

Re: exporting to CSV with solrj

2014-10-31 Thread tedsolr
I think I'm getting the idea now. You either use the response writer via an HTTP call, or you write your own exporter. Thanks to everyone for their input.

Re: exporting to CSV with solrj

2014-10-31 Thread will martin
Why do you want to use CSV in SolrJ? Alexandre, are you looking for a design gig? This kind of question really begs nothing but disdain. Commodity search exists, no matter what Paul Nelson writes, and part of that problem is due to advanced users always rewriting the reqs and specs of less

Re: Ideas for debugging poor SolrCloud scalability

2014-10-31 Thread Peter Keegan
Regarding batch indexing: When I send batches of 1000 docs to a standalone Solr server, the log file reports (1000 adds) in LogUpdateProcessor. But when I send them to the leader of a replicated index, the leader log file reports much smaller numbers, usually (12 adds). Why do the batches appear

Re: Ideas for debugging poor SolrCloud scalability

2014-10-31 Thread Erick Erickson
Internally, the docs are batched up into smaller buckets (10 as I remember) and forwarded to the correct shard leader. I suspect that's what you're seeing. Erick On Fri, Oct 31, 2014 at 12:20 PM, Peter Keegan peterlkee...@gmail.com wrote: Regarding batch indexing: When I send batches of 1000

Re: Ideas for debugging poor SolrCloud scalability

2014-10-31 Thread Peter Keegan
Yes, I was inadvertently sending them to a replica. When I sent them to the leader, the leader reported (1000 adds) and the replica reported only 1 add per document. So, it looks like the leader forwards the batched jobs individually to the replicas. On Fri, Oct 31, 2014 at 3:26 PM, Erick

Re: Solr index corrupt question

2014-10-31 Thread ku3ia
Erick Erickson wrote: What version of Solr/Lucene? At first it was Lucene/Solr 4.6, but later it was upgraded to Lucene/Solr 4.8. Later still, the _root_ field and child-document support were added to the schema. A full data re-index was not done on each change. But not so long ago I ran an optimize to

Re: exporting to CSV with solrj

2014-10-31 Thread Alexandre Rafalovitch
On 31 October 2014 14:58, will martin wmartin...@gmail.com wrote: Why do you want to use CSV in SolrJ? Alexandre are you looking for a design gig. This kind of question really begs nothing but disdain. Nope. Not looking for a design gig. I give that advice away for free:

Re: exporting to CSV with solrj

2014-10-31 Thread Chris Hostetter
: Why do you want to use CSV in SolrJ? Alexandre are you looking for a It's a legitimate question - part of providing good community support is making sure we understand *why* users are asking how to do something, so we can give good advice on other solutions people might not even have

[ANNOUNCE] Apache Solr 4.10.2 released

2014-10-31 Thread Michael McCandless
October 2014, Apache Solr™ 4.10.2 available The Lucene PMC is pleased to announce the release of Apache Solr 4.10.2 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: prefix length in fuzzy search solr 4.10.1

2014-10-31 Thread Jack Krupansky
No, but it is a reasonable request, as a global default, a collection-specific default, a request-specific default, and on an individual fuzzy term. -- Jack Krupansky -Original Message- From: elisabeth benoit Sent: Thursday, October 30, 2014 6:07 AM To: solr-user@lucene.apache.org

Re: exporting to CSV with solrj

2014-10-31 Thread Erick Erickson
@Will: I can't tell you how many times questions like Why do you want to use CSV in SolrJ? have led to solutions different from what the original question might imply. It's a question I frequently ask in almost the exact same way; it's a perfectly legitimate question IMO. Best, Erick On Fri,

Consul instead of ZooKeeper anyone?

2014-10-31 Thread Greg Solovyev
I am investigating a project to make SolrCloud run on Consul instead of ZooKeeper. So far, my research has revealed no such efforts, but I wanted to check with this list to make sure I am not going to be reinventing the wheel. Has anyone attempted using Consul instead of ZK to coordinate SolrCloud

Re: Consul instead of ZooKeeper anyone?

2014-10-31 Thread Walter Underwood
It looks like Consul solves a different problem than Zookeeper. Consul manages what servers are up and starts new ones as needed. Zookeeper doesn’t start servers, but does leader election when they fail. I don’t see any way that Consul could replace Zookeeper, but it could solve another part

How to update SOLR schema from continuous integration environment

2014-10-31 Thread Faisal Mansoor
Hi, How do people usually update Solr configuration files from continuous integration environment like TeamCity or Jenkins. We have multiple development and testing environments and use WebDeploy and AwsDeploy type of tools to remotely deploy code multiple times a day, to update solr I wrote a