SolrCloud never fully recovers after slow disks

2013-11-05 Thread Henrik Ossipoff Hansen
I previously made a post on this, but have since narrowed down the issue and am now giving this another try, with another spin to it. We are running a 4-node setup (over Tomcat7) with a 3-node external ZooKeeper ensemble. This is running on a total of 7 (4+3) different VMs, and each VM is using our

Re: The first search is slow

2013-11-05 Thread michael.boom
The first time you run a query it's always slower, because it reads data from disk. After the first query, caches are built and stored in RAM, so the second run of that query will hit the caches and be noticeably faster. To change how slow the first query is, play around with your firstSearcher and

Using all SolrCloud servers in round-robin setup

2013-11-05 Thread Eric Bus
Hi, I'm currently using a SolrCloud setup with 3 nodes. The setup hosts about 50 (small) collections of a few thousand documents each. In the past, I've used collections with replicationFactor = 3. So each node has a replica of all the collections. But now I want to add an extra node. Now,

Re: Using all SolrCloud servers in round-robin setup

2013-11-05 Thread Anshum Gupta
Hi Eric, You can use the CloudSolrServer which is zk aware and does a reasonable amount of intelligent stuff for you. http://lucene.apache.org/solr/4_5_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html All it takes is the zk host address so you would not have to worry about

Re: 2 replicas with different num of documents

2013-11-05 Thread Yago Riveiro
All documents have been indexed with Solr 4.5.1. Indeed, in the process some replicas died (replication errors resulting from Java heap memory errors), the recovery process never ended, and I needed to restart the node that stores the replica in recovery mode. The leader of the shard is the replica with

Re: Core admin: create new core

2013-11-05 Thread Bram Van Dam
On 11/04/2013 04:06 PM, Bill Bell wrote: You could pre-create a bunch of directories and base configs. Create as needed. Then use the schemaless API to set it up ... Or make changes in a script and reload the core. I ended up creating a little API that takes schema/config as input, creates

Re: Performance of rows and start parameters

2013-11-05 Thread michael.boom
Thank you! I suspect that maybe my box was too small. I'm upgrading my machines to more CPU and RAM, so let's see how it goes from there. Would limiting the number of returned fields to a smaller value make any improvement? The behaviour I noticed was that at start=0, rows=10, avg qtime after

Please add me to WIKI contributors group

2013-11-05 Thread Jayson Minard
Hi, so that I can edit the Solr WIKI (I previously could years ago, but not now) ... please add me to the contributors group. Thanks! username: JaysonMinard -- jayson

ANNOUNCE: Stump The Chump @ Lucene Revolution EU - Tomorrow

2013-11-05 Thread Chris Hostetter
(Note: cross-posted announcement, please confine any replies to solr-user) Hey folks, On Wednesday, I'll be doing a Stump The Chump session at Lucene Revolution EU in Dublin, Ireland. http://lucenerevolution.org/stump-the-chump If you aren't familiar with Stump The Chump, it is a Q&A style

SolrCloud statistics

2013-11-05 Thread michael.boom
Solr's query handler statistics are pretty neat: avg time per request, avg requests in the last 5/15 min and so on. But when using SolrCloud's distributed search, each core gets multiple requests, making it hard to check which is the actual query time (the time from when a leader gets the query request

Re: Performance of rows and start parameters

2013-11-05 Thread Raymond Wiker
Are you restricting the set of fields that you return from the queries? If not, it could be that you are returning fields that are potentially very large, and may affect query performance that way. On Tue, Nov 5, 2013 at 11:38 AM, michael.boom my_sky...@yahoo.com wrote: Thank you! I suspect

Re: Please add me to WIKI contributors group

2013-11-05 Thread Erick Erickson
Done. We had some problems with bots creating bogus pages, so unfortunately we had to lock the Wiki down. But the bar to editing it is low; just ask so we're sure it's a real person :). Thanks for contributing! Erick On Tue, Nov 5, 2013 at 5:53 AM, Jayson Minard jayson.min...@gmail.com wrote:

Re: Performance of rows and start parameters

2013-11-05 Thread Erick Erickson
As long as start=0, this is _not_ the deep paging problem. Raymond's comments are well taken. Try restricting the returned fields to only id. If you have large fields, Solr 4.1+ automatically compresses the data so you might be seeing lots of time spent in decompression, that'd be my first guess.
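A minimal sketch of Erick's suggestion, assuming requests go over HTTP to the standard `/select` handler and that the unique key field is named `id`: the `fl` parameter limits the stored fields returned for each hit. The helper name `build_select_params` is hypothetical.

```python
from urllib.parse import urlencode


def build_select_params(q, start=0, rows=10, fields=("id",)):
    """Return a URL-encoded query string for Solr's /select handler.

    `fl` restricts the stored fields returned for each hit; it defaults
    here to just the unique key field, assumed to be named "id".
    """
    return urlencode({
        "q": q,
        "start": start,  # start=0, so this is not the deep paging case
        "rows": rows,
        "fl": ",".join(fields),
        "wt": "json",
    })


# Hypothetical usage: append to e.g. http://localhost:8983/solr/collection1/select?
query_string = build_select_params("*:*")
```

Comparing QTime with `fl=id` against the default field list is a quick way to see whether large stored fields are what makes the queries slow.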

Re: Facet question: Getting only the matched value from multivalued field

2013-11-05 Thread Erick Erickson
There's no good way to do that that I know of. The problem is that faceting occurs at an individual token level. If you use an un-analyzed field, then you get facets that are the entire value, but then you don't match non-exact values, i.e. "Ronald Wagner" would not match "Wagner, Ronald S MD". I don't

Re: Problem of facet on 170M documents

2013-11-05 Thread Erick Erickson
You're just going to have to accept it being slow. Think of it this way: you have 4M (say) buckets that have to be counted into. Then the top 500 have to be collected to return. That's just going to take some time unless you have very beefy machines. I'd _really_ back up and consider whether this

Re: ANNOUNCE: Stump The Chump @ Lucene Revolution EU - Tomorrow

2013-11-05 Thread Erick Erickson
It's great fun to watch Chris squirm in front of hundreds of people; I highly recommend it! Unfortunately, it's really hard to stump him entirely! Erick On Tue, Nov 5, 2013 at 6:02 AM, Chris Hostetter hossman_luc...@fucit.org wrote: (Note: cross posted announcement, please confine any

Re: Core admin: create new core

2013-11-05 Thread Erick Erickson
If you're really bored, anything you'd like to do for SOLR-4779 would be great! Best, Erick On Tue, Nov 5, 2013 at 5:15 AM, Bram Van Dam bram.van...@intix.eu wrote: On 11/04/2013 04:06 PM, Bill Bell wrote: You could pre create a bunch of directories and base configs. Create as needed. Then

Re: Performance of rows and start parameters

2013-11-05 Thread Michael Della Bitta
Whoops, looks like I misdiagnosed this one. Just to add: you might want to make sure lazy field loading is enabled, too. On Nov 5, 2013 7:21 AM, Erick Erickson erickerick...@gmail.com wrote: As long as start=0, this is _not_ the deep paging problem. Raymond's comments are well taken. Try

Re: Solr 1.4 - Performance Issues

2013-11-05 Thread Erick Erickson
1.4 is ancient, but you know that already :) Anyway, what are your autocommit settings? That vintage of Solr blocks indexing when committing which may include rewriting the entire index. So part of your regular slowdown is likely segment merging happening with the commit. The 14 hour cycle is

Re: Facet question: Getting only the matched value from multivalued field

2013-11-05 Thread Raymond Wiker
We have a somewhat similar case; what we will do is to have one analysed field in conjunction with a string field (possibly with case folding). That way, we can use the original field values for displaying as facets, but also allow searches for parts of the facet values. On Tue, Nov 5, 2013 at

[ANN] Lux release 0.11.2

2013-11-05 Thread Michael Sokolov
I'm pleased to announce the release of Lux, version 0.11.2, the Dublin edition. There have been the usual round of bug fixes and enhancements, but the main news with this release is the inclusion of support for SolrCloud. You can now store and search XML documents in a distributed index

Nested documents/Block Join query

2013-11-05 Thread danosth
Hi I've been trying to play around with block join queries in the Solr 4.5 release and I was wondering if anyone else has any experience doing this? Basically I'm trying to create a parent-child-grandchild structure and then query to retrieve the parent documents. I can kinda get it to work on

Replication: slow first query after replication.

2013-11-05 Thread Luis Cappa Banda
Hi guys! I have a master-slave replication (Solr 4.1) with a 30-second polling interval, and new documents are indexed continuously, so every 30 seconds there is always new data to replicate. My test index is not huge: just 5M documents. I have observed that a simple q=*:* query appears

Re: Replication: slow first query after replication.

2013-11-05 Thread Luis Cappa Banda
Against -- again, :-) 2013/11/5 Luis Cappa Banda luisca...@gmail.com Hi guys! I have a master-slave replication (Solr 4.1 version) with a 30 seconds polling interval and continuously new documents are indexed, so after 30 seconds always new data must be replicated. My test index is not

Example of join using Solr/Lucene

2013-11-05 Thread Tech Id
Hi, I have been searching for an example of joins using solr/lucene. But I have not found anything either on the net or in the src/examples. Can someone please point me to the same? Ideally, I need a join working with Solrj APIs (Please let me know if this group is Lucene-specific). Best

Re: Example of join using Solr/Lucene

2013-11-05 Thread Tech Id
I think Solr has the ability to do joins in the latest version as verified on this issue: https://issues.apache.org/jira/browse/SOLR-3076 And some online resources point to this example: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html However, I am not sure if the

Re: Problem of facet on 170M documents

2013-11-05 Thread Fudong-gmail
One way to solve the issue may be to create another field to group the value in a range, so you have fewer facet values to query. On Nov 5, 2013, at 4:31 AM, Erick Erickson erickerick...@gmail.com wrote: You're just going to have to accept it being slow. Think of it this
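A sketch of Fudong's suggestion, assuming non-negative integer values and a fixed bucket width: compute a coarse range label at index time, store it in an extra field, and facet on that field instead of the raw one. The field names `value` and `value_range_s` and both helper names are hypothetical.

```python
def bucket_label(value, width):
    """Map a non-negative integer to a range label such as '100-199'."""
    if value < 0 or width <= 0:
        raise ValueError("expects value >= 0 and width > 0")
    low = (value // width) * width
    return "%d-%d" % (low, low + width - 1)


def with_range_field(doc, source_field, target_field, width):
    """Return a copy of `doc` with an extra coarse-grained facet field."""
    enriched = dict(doc)
    enriched[target_field] = bucket_label(doc[source_field], width)
    return enriched


doc = with_range_field({"id": "1", "value": 4217}, "value", "value_range_s", 1000)
# doc["value_range_s"] == "4000-4999"
```

Faceting on `value_range_s` yields at most one bucket per range, so the number of distinct facet values Solr has to count shrinks roughly by the bucket width.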

Re: Can't find some fields in solr result

2013-11-05 Thread gohome190
That was it, I had to restart Solr for the schema changes to take effect -- View this message in context: http://lucene.472066.n3.nabble.com/Can-t-find-some-fields-in-solr-result-tp4099245p4099446.html Sent from the Solr - User mailing list archive at Nabble.com.

geo/spatial search performance comparison using different methods

2013-11-05 Thread T. Kuro Kurosaka
Are there any performance comparison results available comparing various methods to sort results by distance (not just filtering) on Solr 3 and 4? We are using Solr 3.5 with the SOLR-2155 patch. I am particularly interested in learning the performance difference among Solr 3 LatLonType, SOLR-2155

Re: Example of join using Solr/Lucene

2013-11-05 Thread Alvaro Cabrerizo
In my case, every time I've used joins, the FROM field was a multivalued string and the TO field was a single-valued string. Regards. On 05/11/2013 18:37, Tech Id tech.login@gmail.com wrote: I think Solr has the ability to do joins in the latest version as verified on this issue:

Transformer also affects the column value when used in a subentity query?

2013-11-05 Thread Lixo Aqui
When you use a Transformer on an entity query, does it also transform the value if it's needed in a subentity query? Considering the following example, is the value ${outer.id} that is used in the inner entity affected by the ClobTransformer? (ignore the fact that it doesn't make sense to use a

Oracle 'raw' column data type is supported?

2013-11-05 Thread Lixo Aqui
I'm using Solr 1.4.1 and I have a table where the join column is of Oracle RAW type. Consider the following example, where id is of Oracle RAW type. Does Solr support this? dataConfig dataSource type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/dbname

DateDiff

2013-11-05 Thread Adam Harris
Hey All, using Solr, I want to get the difference between two dates. Is this possible? Something similar to SELECT DateDiff(d, GetDate(), date_Field) as Diff FROM MyTable

Re: character encoding issue...

2013-11-05 Thread T. Kuro Kurosaka
It sounds like the characters were mishandled at index build time. I would use Luke to see if a character that appears correctly when you change the output to Shift JIS is actually stored as one Unicode character. I bet it's stored as two characters, each having the character of the value that happened

facet.missing performance

2013-11-05 Thread andres
Hi All, Does setting 'facet.missing' to 'true' have any performance impact? Andres

Re: Replication: slow first query after replication.

2013-11-05 Thread Shawn Heisey
On 11/5/2013 10:16 AM, Luis Cappa Banda wrote: I have a master-slave replication (Solr 4.1 version) with a 30 seconds polling interval and continuously new documents are indexed, so after 30 seconds always new data must be replicated. My test index is not huge: just 5M documents. I have

fq efficiency

2013-11-05 Thread Scott Schneider
Hi all, I'm wondering if filter queries are efficient enough for my use cases. I have lots and lots of users in a big, multi-tenant, sharded index. To run a search, I can use an fq on the user id and pass in the search terms. Does this scale well with the number of users? I suppose that, since

Re: Example of join using Solr/Lucene

2013-11-05 Thread Tech Id
Hi Alvaro, Could you please point me to some link from where I can see how to index two documents separately (joined by foreign keys). Or if you can oblige by putting down some details here itself. *For example*, say if user has entities like : car {id:5, color:red, year:2004, companyId:23,
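For reference, a sketch of a query using Solr's `{!join}` query parser (Solr 4.x), assuming the car and company documents from the example above are indexed as separate documents in the same core and that company documents carry a hypothetical `name` field. The helper name `join_query` is also hypothetical.

```python
from urllib.parse import urlencode


def join_query(from_field, to_field, inner_query):
    """Wrap `inner_query` in Solr's join query parser syntax.

    Solr runs `inner_query`, collects the values of `from_field` in the
    matching documents, and returns documents whose `to_field` contains
    one of those values.
    """
    return "{!join from=%s to=%s}%s" % (from_field, to_field, inner_query)


params = urlencode({
    # Companies matching name:acme (hypothetical) -> cars whose companyId
    # equals one of those companies' id values.
    "q": join_query("id", "companyId", "name:acme"),
    "fl": "id,color,year,companyId",
})
```

Only the documents on the `to` side are returned; the company fields themselves are not merged into the car results.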

solr sort facets by name

2013-11-05 Thread PeterKerk
By default Solr sorts facets by the number of hits for each value. However, I want to sort by facet names alphabetically. Earlier I sorted the facets on the client or in my .NET code; however, this time I need Solr to return the results with alphabetically sorted facets directly. How?

get min-max prices as facets

2013-11-05 Thread PeterKerk
I want to provide my visitors with a price range slider with which they can easily filter on a min and max value. For this I need to know the lowest and highest price of all the products found by my Solr query. Since this is dynamic, based on the query by the user, I cannot simply get the min and max

Configuring number of shards

2013-11-05 Thread Mark
Can you configure the number of shards per collection or is this a system wide setting affecting all collections/indexes? Thanks

Re: fq efficiency

2013-11-05 Thread Shawn Heisey
On 11/5/2013 3:36 PM, Scott Schneider wrote: I'm wondering if filter queries are efficient enough for my use cases. I have lots and lots of users in a big, multi-tenant, sharded index. To run a search, I can use an fq on the user id and pass in the search terms. Does this scale well with
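A minimal sketch of the per-tenant pattern Scott describes, assuming a hypothetical single-valued string field `user_id`: the search terms go in `q` and the tenant restriction in `fq`. The `{!term}` parser takes the id as a literal term, so it needs no query-syntax escaping. Each distinct `fq` string becomes its own filterCache entry, so with very many users the cache only helps when the same user searches repeatedly.

```python
def tenant_search_params(user_id, terms, rows=10):
    """Build /select parameters that restrict a search to one tenant.

    `user_id` is a hypothetical field name; adjust it to your schema.
    Pass the returned dict to urllib.parse.urlencode to build the URL.
    """
    return {
        "q": terms,
        # Term query on the tenant field; cached in filterCache separately
        # from q, keyed by this exact filter.
        "fq": "{!term f=user_id}%s" % user_id,
        "rows": rows,
    }


params = tenant_search_params("u42", "solr cloud")
```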

Re: Configuring number of shards

2013-11-05 Thread Shawn Heisey
On 11/5/2013 5:14 PM, Mark wrote: Can you configure the number of shards per collection or is this a system wide setting affecting all collections/indexes? The collections API has a CREATE action. You can specify numShards as a parameter. This is better than defining numShards as a java
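A sketch of such a CREATE call, assuming a node reachable at `http://localhost:8983/solr` and a hypothetical collection name `products`. `numShards`, `replicationFactor` and `collection.configName` are Collections API parameters; the config set is assumed to be uploaded to ZooKeeper already.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor=1,
                          config_name=None):
    """Build a Collections API CREATE URL with a per-collection shard count."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,  # chosen per collection, not system wide
        "replicationFactor": replication_factor,
    }
    if config_name is not None:
        # Name of a config set already stored in ZooKeeper.
        params["collection.configName"] = config_name
    return "%s/admin/collections?%s" % (base_url.rstrip("/"), urlencode(params))


url = create_collection_url("http://localhost:8983/solr", "products", 4)
# -> http://localhost:8983/solr/admin/collections?action=CREATE&name=products&numShards=4&replicationFactor=1
```

Issuing a GET against that URL on any node creates the collection; a second collection can use a different `numShards`.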

Re: Lots of tlog files remained, why?

2013-11-05 Thread Floyd Wu
Hi Eric, sorry for the late reply. The tlog files have stayed there for one week and don't seem to decrease. Most of them are 3~5 MB, 40 MB in total. I've read the article you pointed to many times, but it's not working. Every time I reindex files, Solr generates many tlogs, and no matter how many hard commits I

Re: solr sort facets by name

2013-11-05 Thread Koji Sekiguchi
(13/11/06 9:00), PeterKerk wrote: By default solr sorts facets by the amount of hits for each result. However, I want to sort by facetnames alphabetically. Earlier I sorted the facets on the client or via my .NET code, however, this time I need solr to return the results with alphabetically

Re: solr sort facets by name

2013-11-05 Thread manju16832003
Yes, facet.sort=index returns the facet result set in alphabetical order.
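A minimal sketch of the request parameters, assuming a hypothetical facet field named `category`. Note that "index" order is the order of the indexed terms, so on a plain string field uppercase values sort before lowercase ones.

```python
def alphabetical_facet_params(q, field, limit=-1):
    """Build /select parameters whose facet values come back in index order."""
    return {
        "q": q,
        "rows": 0,  # only the facet counts are needed
        "facet": "true",
        "facet.field": field,
        # "index" orders values by indexed term; "count" (the default when
        # facet.limit > 0) orders them by number of hits.
        "facet.sort": "index",
        "facet.limit": limit,  # -1 means return all values
    }


params = alphabetical_facet_params("*:*", "category")
```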

Re: get min-max prices as facets

2013-11-05 Thread manju16832003
I'm not sure if my answer will help you :-). Usually we do not need to know the min and max prices that our current database or Solr is holding for all the products. Even if you were to do that, it would be complex and just make your business logic bigger and more tedious. Instead, we would know what is

Re: solr 4.3 solrj generating search terms that return no results

2013-11-05 Thread dboychuck
I'm having the same issue with SolrJ 4.5.1. If I use the escapeQueryChars() function on a string like a b c it is escaping it to a\+b\+c, which returns 0 results using the edismax query parser. However, a b c returns results.

Re: solr 4.3 solrj generating search terms that return no results

2013-11-05 Thread Shawn Heisey
On 11/5/2013 9:41 PM, dboychuck wrote: I'm having the same issue with solrJ 4.5.1 If I use the escapeQueryChars() function on a string like a b c it is escaping it to a\+b\+c which returns 0 results using edismax query parser. However a b c returns results. A space is a special character to

Re: solr 4.3 solrj generating search terms that return no results

2013-11-05 Thread Shawn Heisey
On 11/5/2013 10:22 PM, Shawn Heisey wrote: If you do not want the *entire* string treated as a single term for the query parser, then you cannot use escapeQueryChars. You'll need to write your own code that is aware of the specific special characters that you want to escape. If your query is
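A sketch of the kind of helper Shawn describes, written in Python for illustration. The character set is an assumption, roughly modeled on what SolrJ's `escapeQueryChars` escapes but with whitespace left out; adjust it for the query parser and SolrJ version in use. The function name is hypothetical.

```python
# Assumed set of query-syntax characters to backslash-escape. Whitespace is
# deliberately excluded, so "a b c" is still parsed as three terms.
SPECIAL_CHARS = frozenset('\\+-!():^[]"{}~*?|&;/')


def escape_except_whitespace(text):
    """Backslash-escape query syntax characters but leave spaces alone."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


escaped = escape_except_whitespace("c++ (beta)")
# escaped == r"c\+\+ \(beta\)"
```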

Re: Replication: slow first query after replication.

2013-11-05 Thread Luis Cappa
Hello, Shawn! I have seen that when replication is disabled, query response times are good. Interesting... I can't see the solution, then, because low replication times are needed to almost always get 'fresh' documents in the slaves to search, but this apparently slows down