Re: How to handle multiple sub second updates to same SOLR Document

2014-01-28 Thread Bram Van Dam
On 01/25/2014 07:21 PM, christopher palm wrote: The problem I am trying to solve is that the order of these updates isn’t guaranteed once the multi-threaded SolrJ client starts sending them to Solr, and older updates are overwriting the newer updates on the same document. Don't do that. There
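
The reply above is cut off before it names a fix. One common way to keep updates to the same document in order, offered here as an assumption and not as the approach recommended in the thread, is to route every update for a given document id to the same single-threaded worker. The sketch below only shows the routing step; the names worker_for and partition_updates and the sample data are hypothetical.

```python
import zlib
from collections import defaultdict


def worker_for(doc_id: str, n_workers: int) -> int:
    """Map a document id to a worker index using a stable hash.

    zlib.crc32 is used instead of hash() because hash() of a str
    changes between Python processes.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % n_workers


def partition_updates(updates, n_workers):
    """Group (doc_id, payload) updates by worker, preserving arrival order."""
    queues = defaultdict(list)
    for doc_id, payload in updates:
        queues[worker_for(doc_id, n_workers)].append((doc_id, payload))
    return dict(queues)


# Hypothetical updates: three versions of doc1 interleaved with doc2.
updates = [("doc1", "v1"), ("doc2", "v1"), ("doc1", "v2"), ("doc1", "v3")]
queues = partition_updates(updates, n_workers=4)
```

If each queue is then drained by exactly one thread, all updates for doc1 are sent in the order they arrived, while different documents can still be sent in parallel.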

Re: Fuzzy 2 search results wrong

2014-01-28 Thread lboutros
You have to create your own parser that extends the current query parser. You have to override the protected newFuzzyQuery method to call the FuzzyQuery constructor with a configured maximum expansion value, or something like that. Ludovic. - Jouve France.

RE: Solr Related Search Suggestions

2014-01-28 Thread Markus Jelsma
Query Recommendations using Query Logs in Search Engines http://personales.dcc.uchile.cl/~churtado/clustwebLNCS.pdf Very interesting paper and section 2.1 covers related work plus references. In our first attempt we did it even simpler, by finding for each query other top queries by inspecting
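
The snippet is cut off before it says what was inspected to find related queries. As a rough illustration only, and assuming (this is not stated in the message) that queries issued in the same user session count as related, a co-occurrence count over a query log could look like this. The function name, the session format and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def related_queries(sessions, top_n=3):
    """For each query, rank other queries by how often they share a session."""
    co_counts = defaultdict(Counter)
    for session in sessions:
        # Count each pair once per session, in both directions.
        for a, b in permutations(set(session), 2):
            co_counts[a][b] += 1
    return {
        query: [other for other, _ in counts.most_common(top_n)]
        for query, counts in co_counts.items()
    }


# Hypothetical query log, one list of queries per session.
sessions = [
    ["solr", "lucene", "solrj"],
    ["solr", "lucene"],
    ["solr", "nutch"],
]
suggestions = related_queries(sessions)
```

Here "lucene" ranks first for "solr" because the two share two sessions; the order of the remaining, equally frequent suggestions is not defined.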

Re: Complication - can block joins help?

2014-01-28 Thread Mikhail Khludnev
On Mon, Jan 27, 2014 at 9:30 AM, William Bell billnb...@gmail.com wrote: OK, In order to do boosting, we often will create a dynamic field in SOLR. For example: A Professional hire out for work, I want to boost those who do woodworking. George Smith - builds chairs, and builds desks. He

Where is a canonical SolrJ example(s)?

2014-01-28 Thread Alexandre Rafalovitch
Hello, I am writing a mini-client app and realized that I can't find a canonical example of how to write a SolrJ-based client. Given that it is the recommended API, I thought we had one in the distribution. But I can't seem to find it. I also checked -

Re: Where is a canonical SolrJ example(s)?

2014-01-28 Thread Shawn Heisey
On 1/28/2014 8:01 AM, Alexandre Rafalovitch wrote: What am I missing? Should I be looking at test classes: https://github.com/apache/lucene-solr/tree/trunk/solr/solrj/src/test/org/apache/solr/client/solrj ? They seem to be a little too specialized to be reverse engineering basic examples from.

Re: Where is a canonical SolrJ example(s)?

2014-01-28 Thread Alexandre Rafalovitch
On Tue, Jan 28, 2014 at 10:27 PM, Shawn Heisey s...@elyograg.org wrote: What would you like to see in an example? I know a fair amount about SolrJ, and we have access to lots of smart people here, so how about we discuss what you think is missing? We can come up with something that will

implement relevency

2014-01-28 Thread rashmi maheshwari
Hi, How do I get the most relevant items on top of search results using Solr search? -- Rashmi Be the change that you want to see in this world!

Re: implement relevency

2014-01-28 Thread Alexandre Rafalovitch
That's a catch-22 question and the most difficult one. How do you know they are relevant? Do you have a current base for comparison? Usually, you do the most basic thing. Then, you watch for what users are searching for and what they get. You figure out why something is not right, look at your

Solr Nutch

2014-01-28 Thread rashmi maheshwari
Hi, Question 1: Since Solr can parse HTML and documents such as DOC, Excel, PDF, etc., why do we need Nutch to parse HTML files? What is different? Question 2: When do we use multiple cores in Solr? Is there a practical business case where we need multiple cores? Question 3: When do we go for cloud? What is

Synonyms and spellings

2014-01-28 Thread rashmi maheshwari
Hi, Question 1) Why do we use the spellings file under the Solr core conf folder? What spellings do we enter in it? Question 2) Implementing all synonyms is a tough thing. Where could I get a list of as many synonyms as we see in Google search? -- Rashmi Be the change that you want

Re: Where is a canonical SolrJ example(s)?

2014-01-28 Thread Erick Erickson
There are always current examples in the JUnit tests. Admittedly, they aren't nice, stand-alone tests, but they do the things that you are referencing. And they're always up to date since the tests are always running. Look particularly at SolrExampleTests.java and SolrQueryTest.java.

Re: Fwd: Search Engine Framework decision

2014-01-28 Thread rashmi maheshwari
Thanks saurish. My office *intranet* is a SharePoint website. When I crawl it using Nutch, I get an Unauthorized access (404) error. The website uses an NTLM realm. I saw on one Nutch JIRA link that SharePoint can be accessed using Nutch. Nutch has the below properties in

Re: Where is a canonical SolrJ example(s)?

2014-01-28 Thread Alexandre Rafalovitch
Erick, I think we are looking at this from different points of view. You seem to be saying that the tests work therefore somebody could reverse engineer the correct usage from it. I am saying that - for a beginner - it would be useful to have a minimum working example that shows a critical path

Re: Solr Nutch

2014-01-28 Thread Jack Krupansky
1. Nutch follows the links within HTML web pages to crawl the full graph of a web of pages. 2. Think of a core as an SQL table - each table/core has a different type of data. 3. SolrCloud is all about scaling and availability - multiple shards for larger collections and multiple replicas

Re: Solr Nutch

2014-01-28 Thread Jorge Luis Betancourt Gonzalez
Q1: Nutch doesn’t only handle the parsing of HTML files; it also uses Hadoop to achieve large-scale crawling using multiple nodes. It fetches the content of the HTML file, and yes, it also parses its content. Q2: In our case we use Solr to crawl some website, store the content in one “main” solr

Re: Solr Nutch

2014-01-28 Thread Alexei Martchenko
1) Plus, those files are binaries, sometimes with metadata, and specific crawlers need to understand them. HTML is plain text. 2) Yes, different data schemes. Sometimes I replicate the same core and run some A/B tests with different weights, filters, etc., and some people like to create CoreA and

Re: implement relevency

2014-01-28 Thread tamanjit.bin...@yahoo.co.in
You may also want to look here http://wiki.apache.org/solr/SolrRelevancyFAQ

Re: Synonyms and spellings

2014-01-28 Thread Alexei Martchenko
2) There are some synonym lists on the web, they aren't always complete but I keep analyzing fields and tokens in order to polish my synonyms. And I like to use tools like http://www.visualthesaurus.com/ to aid me. Hope this helps :-) alexei martchenko

Dead node, but clusterstate.json says active, won't sync on restart

2014-01-28 Thread Greg Preston
** Using solrcloud 4.4.0 ** I had to kill a running solrcloud node. There is still a replica for that shard, so everything is functional. We've done some indexing while the node was killed. I'd like to bring back up the downed node and have it resync from the other replica. But when I restart

Re: Solr Nutch

2014-01-28 Thread rashmi maheshwari
Thanks all for the quick response. Today I crawled a webpage using Nutch. This page has many links, but all anchor tags have href=# and JavaScript on the onClick event of each anchor tag opens a new page. So the crawler didn't crawl any of those links which were opened using the onClick event and

Re: Synonyms and spellings

2014-01-28 Thread rashmi maheshwari
Thanks for quick response Alexei. I will check this link to prepare synonym list. On Tue, Jan 28, 2014 at 11:00 PM, Alexei Martchenko ale...@martchenko.com.br wrote: 2) There are some synonym lists on the web, they aren't always complete but I keep analyzing fields and tokens in order to

Re: Dead node, but clusterstate.json says active, won't sync on restart

2014-01-28 Thread Shawn Heisey
On 1/28/2014 10:31 AM, Greg Preston wrote: ** Using solrcloud 4.4.0 ** I had to kill a running solrcloud node. There is still a replica for that shard, so everything is functional. We've done some indexing while the node was killed. I'd like to bring back up the downed node and have it

Re: Dead node, but clusterstate.json says active, won't sync on restart

2014-01-28 Thread Greg Preston
Thanks for the idea. I tried it, and the state for the bad node, even after an orderly shutdown, is still active in clusterstate.json. I see this in the logs on restart: [28 Jan 2014 18:25:29] [RecoveryThread] ERROR (org.apache.solr.common.SolrException) - Error while trying to recover.

Re: Solr Nutch

2014-01-28 Thread Markus Jelsma
Short answer, you can't. rashmi maheshwari maheshwari.ras...@gmail.com wrote: Thanks All for quick response. Today I crawled a webpage using nutch. This page have many links. But all anchor tags have href=# and javascript is written on onClick event of each anchor tag to open a new page. So

Re: Solr Nutch

2014-01-28 Thread Alexei Martchenko
Well, not even Google parses those. I'm not sure about Nutch, but in some crawlers (jSoup, I believe) there's an option to try to get full URLs from plain text, so you can capture some URLs in the form of someClickFunction('http://www.someurl.com/whatever') or even if they are in the middle of some
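
As a minimal sketch of the "get full URLs from plain text" idea, a regular expression can pull absolute URLs out of onClick handlers. This is a generic illustration, not the crawler option the message refers to; the pattern, the function name and the sample markup are assumptions.

```python
import re

# Match http/https URLs up to the first whitespace, quote, bracket or ")".
URL_PATTERN = re.compile(r"""https?://[^\s'"<>)]+""")


def extract_urls(text):
    """Return every absolute http(s) URL found anywhere in the text."""
    return URL_PATTERN.findall(text)


# Hypothetical anchor tag of the kind described in the thread.
html = (
    '<a href="#" '
    "onclick=\"someClickFunction('http://www.someurl.com/whatever')\">Open</a>"
)
found = extract_urls(html)
```

This only helps when the handler contains the full URL as a literal string; links that JavaScript assembles at run time stay invisible to this kind of text scan.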

PHP + Solr

2014-01-28 Thread Felipe Dantas de Souza Paiva
Hi Folks, I would like to know what is the best way to integrate PHP and Apache Solr. Until now I've found two options: 1) http://www.php.net/manual/en/intro.solr.php 2) http://www.solarium-project.org/ What do you guys say? Cheers, Felipe

Re: Solr Nutch

2014-01-28 Thread rashmi maheshwari
Thanks Markus and Alexei. On Wed, Jan 29, 2014 at 12:08 AM, Alexei Martchenko ale...@martchenko.com.br wrote: Well, not even Google parse those. I'm not sure about Nutch but in some crawlers (jSoup i believe) there's an option to try to get full URLs from plain text, so you can capture some

Re: Dead node, but clusterstate.json says active, won't sync on restart

2014-01-28 Thread Mark Miller
Sounds like a bug. 4.6.1 is out any minute - you might try that. There was a replication bug that may be involved. If you remove the tlog and index and restart, it should resync, or something is really crazy. The clusterstate.json is a red herring. You have to merge the live nodes info with
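
The last sentence is truncated, but assuming it means that the state recorded in clusterstate.json has to be combined with the list of live nodes, the check can be sketched as follows. The dictionary layout (state and node_name per replica) follows the Solr 4.x clusterstate.json format as far as I recall it; the host names are hypothetical.

```python
def effective_state(replica, live_nodes):
    """Return the recorded state, or "down" if the replica's node is not live."""
    if replica["node_name"] not in live_nodes:
        return "down"
    return replica["state"]


# Hypothetical excerpt of one shard's replicas from clusterstate.json.
replicas = {
    "core_node1": {"state": "active", "node_name": "host1:8983_solr"},
    "core_node2": {"state": "active", "node_name": "host2:8983_solr"},
}
# Hypothetical children of the /live_nodes znode; host2 was killed.
live_nodes = {"host1:8983_solr"}

states = {name: effective_state(r, live_nodes) for name, r in replicas.items()}
```

Under that reading, a killed node can still be listed as active in clusterstate.json and be treated as down, because it is missing from the live nodes.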

RE: SOLR USING 100% percent CPU and not responding after a while

2014-01-28 Thread heaven
I have the same problem, please look at the image: http://lucene.472066.n3.nabble.com/file/n4114026/Screenshot_733.png And this is on idle. Index size is about 90Gb. Solr 4.4.0. Memory is not an issue, there's a lot. RAID 10 (15000RPM rapid hdd).

[ANNOUNCE] Apache Solr 4.6.1 released.

2014-01-28 Thread Robert Muir
January 2014, Apache Solr™ 4.6.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.6.1. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: PHP + Solr

2014-01-28 Thread Jorge Luis Betancourt Gonzalez
I’ve some experience using Solarium and it has been great so far. In particular we use the NelmioSolariumBundle to integrate with Symfony2. Greetings! On Jan 28, 2014, at 1:54 PM, Felipe Dantas de Souza Paiva cad_fpa...@uolinc.com wrote: Hi Folks, I would like to know what is the best way

Re: Boosting documents by categorical preferences

2014-01-28 Thread Chris Hostetter
: The initial results seem to be kinda promising... of course there are many : more optimizations I could do like decay user ratings over time to indicate : that preferences decay over time so a 5 rating a year ago doesn't count as : much as a 5 rating today. : : Hope this helps others. I'll
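
The quoted idea of decaying user ratings over time can be sketched with an exponential half-life. The formula and the 180-day half-life are assumptions chosen for illustration, not something taken from the thread:

    weight(age) = 0.5 ^ (age_days / half_life_days)

```python
def decayed_rating(rating, age_days, half_life_days=180.0):
    """Scale a rating so that it loses half its weight every half_life_days."""
    return rating * 0.5 ** (age_days / half_life_days)


# A 5 rating given today keeps its full weight.
today = decayed_rating(5, age_days=0)
# A 5 rating that is two half-lives (360 days) old counts a quarter as much.
last_year = decayed_rating(5, age_days=360)
```

With these numbers a 5 rating from roughly a year ago contributes 1.25, while a 5 rating from today contributes 5; the half-life is the knob to tune.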

Re: PHP + Solr

2014-01-28 Thread André Widhani
I have no experience with either as we have our own PHP layer to interface with Solr. If I started again from scratch today I would surely consider Solarium. The PHP extension seems outdated and no longer maintained. It says it works with Solr 3.1 and last update is from 2011. André 2014-01-28

Re: SOLR USING 100% percent CPU and not responding after a while

2014-01-28 Thread Otis Gospodnetic
Hi, Show us more graphs. Is the GC working hard? Any of the JVM mem pools at or near 100%? SPM for Solr is your friend for long term monitoring/alerting/trends, jconsole and visualvm for a quick look. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch

Re: implement relevency

2014-01-28 Thread Utkarsh Sengar
Hi Rashmi, Relevancy needs some kind of training data, which can lead to a chicken-and-egg problem. If you don't have that training set, then you need to come up with it or train manually (provide some seed). Our existing search had 2 years' worth of clickstream data, i.e. we know if someone searches

AW: PHP + Solr

2014-01-28 Thread Alexander Golubowitsch
Hi Felipe, I would advise to go with Solarium here. The PHP PECL integration is outdated to a degree that renders it almost useless by now. It has a couple of bugs when fetching large result sets. Good luck, Alex -Original Message- From: Felipe Dantas de Souza Paiva

Required local configuration with ZK solr.xml?

2014-01-28 Thread Jeff Wartes
It was my hope that storing solr.xml would mean I could spin up a Solr node pointing it to a properly configured zookeeper ensemble, and that no further local configuration or knowledge would be necessary. However, I’m beginning to wonder if that’s sufficient. It’s looking like I may also

Re: Solr 4.4.0 on hadoop 2.2.0

2014-01-28 Thread kirankh7
Hi, I just configured HDFS + Solr. When I start Solr, I get an error message like this: ERROR org.apache.solr.core.CoreContainer – Unable to create core: collection1 org.apache.solr.common.SolrException: Problem creating directory:

Re: save Solr 4.4.0 data on hadoop 2.2.0

2014-01-28 Thread kirankh7
Hi, I just configured HDFS + Solr. When I start Solr, I get an error message like this: ERROR org.apache.solr.core.CoreContainer – Unable to create core: collection1 org.apache.solr.common.SolrException: Problem creating directory:

Use a field without predefining it it the schema

2014-01-28 Thread Hakim Benoudjit
Hi guys, With the new version of Solr (4.6), can I add a field to the index, knowing that this field doesn't appear (isn't predefined) in the schema? I ask this question because I've seen an issue (on JIRA) related to this. Thanks!

solr scoring

2014-01-28 Thread Summer Shire
Hi All, I have a scenario where for example my document titles are 1. battle of the zords 2. joes battle 3. who was in that crazy battle. and if the search term is [battle] I want to penalize the second and third document score because battle was matched further down than in the first

Re: Where is a canonical SolrJ example(s)?

2014-01-28 Thread Michael Sokolov
On 01/28/2014 11:55 AM, Alexandre Rafalovitch wrote: As to ESS, like I mentioned, the classpath issue seems to be quite a challenge. Again, perhaps not something that shows up during the testing because the directory layout during testing is rather different from the end-user's layout. I'm not

Re: Use a field without predefining it it the schema

2014-01-28 Thread Steve Rowe
Hi Hakim, Check out the section of the Solr Reference Guide on modifying the schema via REST API: https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-Modifytheschema Steve On Jan 28, 2014, at 5:00 PM, Hakim Benoudjit h.benoud...@gmail.com wrote: Hi guys With the new
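
As a hedged sketch of what a Schema API call for adding a field might look like, the snippet below only builds the HTTP request and does not send it. The endpoint shape (POST of a JSON list to .../schema/fields) reflects the Solr 4.x Schema API as I understand it; the base URL, the collection name and the field definition are hypothetical, and the linked Reference Guide section remains the authority.

```python
import json
from urllib.request import Request


def build_add_field_request(base_url, collection, field):
    """Build (but do not send) a POST request that adds one field definition."""
    url = f"{base_url}/{collection}/schema/fields"
    body = json.dumps([field]).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical server, collection and field.
req = build_add_field_request(
    "http://localhost:8983/solr",
    "collection1",
    {"name": "new_field", "type": "string", "stored": True},
)
```

Sending it with urllib.request.urlopen(req) needs a running Solr whose schema is managed and mutable; with a classic, hand-edited schema.xml the server is expected to reject the change.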

Re: Where is a canonical SolrJ example(s)?

2014-01-28 Thread Alexandre Rafalovitch
Thanks Mike, Sounds like Maven approach worked, I haven't tried that. But without Maven, it is much harder. Like I said (in the second email of the thread) the relevant jars are all over the distribution including some inside the .war file. And the only way to figure it out is to run the client

Re: Use a field without predefining it it the schema

2014-01-28 Thread Alexandre Rafalovitch
There is an example in the distribution that shows how new fields are auto-defined. I think it is example-schemaless. The secret is in the UpdateRequestProcessor chain that does cleanup and auto-mapping. Plus - I guess - automatically generated schema. Just remember that once the field is added

Re: Solr Nutch

2014-01-28 Thread Koji Sekiguchi
1. Nutch follows the links within HTML web pages to crawl the full graph of a web of pages. In addition, I think Nutch has a PageRank-like scoring function, as opposed to Lucene/Solr, whose scoring is based on the vector space model. koji

Re: Solr server requirements for 100+ million documents

2014-01-28 Thread Jack Krupansky
Lucene and Solr work best if the full index can be cached in OS memory. Sure, Lucene/Solr does work properly once the index no longer fits, but performance will drop off. I would say that you could fit 100 million moderate-size documents on a single Solr server - provided that you give the OS
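
The rule of thumb above can be turned into back-of-the-envelope arithmetic: estimate the index size and compare it with the RAM left for the OS page cache after the JVM heap is taken out. All numbers below (500 bytes of index per document, 64 GiB of RAM, 8 GiB of heap) are hypothetical placeholders, not figures from the thread.

```python
GIB = 1024 ** 3


def index_fits_in_os_cache(num_docs, avg_bytes_per_doc, ram_bytes, jvm_heap_bytes):
    """Compare a rough index size with the RAM left over for the OS cache."""
    index_bytes = num_docs * avg_bytes_per_doc
    cache_bytes = ram_bytes - jvm_heap_bytes
    return index_bytes, cache_bytes, index_bytes <= cache_bytes


# Hypothetical sizing: 100 million documents at ~500 bytes of index each.
index_bytes, cache_bytes, fits = index_fits_in_os_cache(
    num_docs=100_000_000,
    avg_bytes_per_doc=500,
    ram_bytes=64 * GIB,
    jvm_heap_bytes=8 * GIB,
)
```

In this example the estimated index (50,000,000,000 bytes) fits into the 56 GiB left for the OS; the real per-document size depends heavily on the schema and is best measured on a sample index.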

Re: Required local configuration with ZK solr.xml?

2014-01-28 Thread Chris Hostetter
Maybe I'm missing something, but everything you are describing sounds correct and working properly -- the disconnect between what I think is supposed to happen and what you seem to be expecting seems to be right around here : essentially used empty directories for the solr home. Although

RE: Solr server requirements for 100+ million documents

2014-01-28 Thread Susheel Kumar
Thanks, Jack. That helps. -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Tuesday, January 28, 2014 8:01 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Lucene and Solr work best if the full index can be

Re: Solr server requirements for 100+ million documents

2014-01-28 Thread Jorge Luis Betancourt Gonzalez
Previously on the list a spreadsheet was mentioned. Taking into account that you already have documents in an index, you could extract the needed information from your index and feed it into the spreadsheet, and it will probably give you a rough approximation of the hardware you’ll be