Re: A feature idea for discussion -- fields that can only be explicitly retrieved

2017-01-13 Thread Erick Erickson
bq: Is my understanding about stored fields correct, that even if excluded from fl, the data on the disk for a given field would still be read as part of decompression? Assuming any stored field (NOT docvalues) was read, then this is, indeed, correct. To be pedantic about it, enough 16K blocks

Re: A feature idea for discussion -- fields that can only be explicitly retrieved

2017-01-13 Thread Alexandre Rafalovitch
On 13 January 2017 at 14:40, Shawn Heisey wrote: > What if there were a schema option that would skip docValue retrieval > for a field unless the fl parameter were to *explicitly* ask for that > field? With a typical wildcard value in fl, fields with this option > enabled

Re: A feature idea for discussion -- fields that can only be explicitly retrieved

2017-01-13 Thread Shawn Heisey
On 1/13/2017 1:02 PM, Erick Erickson wrote: > What about using the defaults in requestHandlers along with SOLR-3191 > to accomplish this? Let's say that there was an fl-exclusion > parameter. Now you'd be able to define an exclusion default that would > exclude your field(s) unless overridden in

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-13 Thread Shawn Heisey
On 1/13/2017 5:46 PM, Chetas Joshi wrote: > One of the things I have observed is: if I use the collection API to > create a replica for that shard, it does not complain about the config > which has been set to ReplicationFactor=1. If replication factor was > the issue as suggested by Shawn,

Re: Deleting a shard in solr 4.10.4

2017-01-13 Thread Rachid Bouacheria
Thank you so much Erick! On Fri, Jan 13, 2017 at 4:40 PM, Erick Erickson wrote: > Here's what I'd do > 1> create a new collection with a single shard > 2> use the MERGEINDEXES core admin API command to merge the indexes > from the old 2-shard collection > > That way you

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-13 Thread Chetas Joshi
Erick, I have not changed any config. I have autoaddReplica = true for individual collection config as well as the overall cluster config. Still, it does not add a replica when I decommission a node. Adding a replica is overseer's job. I looked at the logs of the overseer of the solrCloud but

Re: Deleting a shard in solr 4.10.4

2017-01-13 Thread Erick Erickson
Here's what I'd do 1> create a new collection with a single shard 2> use the MERGEINDEXES core admin API command to merge the indexes from the old 2-shard collection That way you have a chance to verify that the merged collection is OK before deleting the old 2-shard collection. On Fri, Jan 13,
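
The two steps above can be sketched as a pair of HTTP requests. This is only an illustration, not a tested procedure for 4.10.4: the base URL, the collection name, the config name, and every core name are hypothetical, and `srcCore` is assumed to be usable here because the source cores live in the same Solr instance as the target core (`indexDir` is the alternative when they do not).

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL


def create_collection_url(name, config_name):
    """Step 1: Collections API call that creates a single-shard collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "replicationFactor": 1,
        "collection.configName": config_name,  # hypothetical config set name
    }
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def merge_indexes_url(target_core, source_cores):
    """Step 2: CoreAdmin MERGEINDEXES call, one srcCore parameter per old shard core."""
    params = [("action", "MERGEINDEXES"), ("core", target_core)]
    params += [("srcCore", core) for core in source_cores]
    return f"{SOLR}/admin/cores?{urlencode(params)}"


# Hypothetical core names; check the real ones in the admin UI before use.
print(create_collection_url("merged", "myconf"))
print(merge_indexes_url("merged_shard1_replica1",
                        ["old_shard1_replica1", "old_shard2_replica1"]))
```

The merged index normally becomes visible only after a commit on the target core, and the old collection stays untouched, so it can be compared against the new one before it is deleted.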

Deleting a shard in solr 4.10.4

2017-01-13 Thread Rachid Bouacheria
Hi All, I have a collection that has 2 shards, and I am finding that the 2 shards are unnecessary. So I would like to delete one of the shards without losing its data. Illustration: Before: Collection has Shard 1 and Shard 2. After: Collection has no shards, but its data contains that of Shard 1 and Shard 2

Re: Large index recommendation

2017-01-13 Thread Toke Eskildsen
Joe Obernberger wrote: [3 billion docs / 16TB / 27 shards on HDFS times 3 for replication] > Each shard is then hosting about 610GBytes of index. The HDFS cache > size is very low at about 8GBytes. Suffice it to say, performance isn't > very good, but again, this
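
The figures quoted in this thread can be checked with a little arithmetic. The sketch below assumes 1 TByte = 1024 GBytes and that the roughly 8 GByte HDFS cache is per host (one shard per host); both are assumptions, not statements from the thread.

```python
def shard_summary(index_tb=16.1, shards=27, docs=3_000_000_000,
                  cache_gb=8, hdfs_replication=3):
    """Per-shard figures derived from the cluster totals quoted in the thread."""
    per_shard_gb = index_tb * 1024 / shards  # assumes 1 TB = 1024 GB
    return {
        "per_shard_gb": per_shard_gb,
        "docs_per_shard": docs / shards,
        # Share of a shard's index that fits in the cache (assumed per host).
        "cache_fraction": cache_gb / per_shard_gb,
        "raw_hdfs_tb": index_tb * hdfs_replication,
    }


summary = shard_summary()
for key, value in summary.items():
    print(f"{key}: {value:,.3f}")
```

Under these assumptions each shard holds a little over 610 GBytes and about 111 million documents, and the cache covers roughly 1.3% of a shard's index. The 3x HDFS figure comes out at 48.3 TBytes, which agrees with the reported 48.2 TBytes once rounding of the 16.1 TByte total is taken into account.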

Re: A feature idea for discussion -- fields that can only be explicitly retrieved

2017-01-13 Thread Erick Erickson
What about using the defaults in requestHandlers along with SOLR-3191 to accomplish this? Let's say that there was an fl-exclusion parameter. Now you'd be able to define an exclusion default that would exclude your field(s) unless overridden in your request handler. This could be either a default
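
To make the proposal concrete, here is a toy model of the behaviour under discussion: a field marked "explicit only" is skipped when it would match only through a wildcard in fl, and returned when fl names it. This is not Solr code and no such option exists in the snippets above; the function name, the `explicit_only` set, and the field names are all hypothetical.

```python
import re
from fnmatch import fnmatchcase


def resolve_fl(schema_fields, fl, explicit_only):
    """Return the fields a request would retrieve under the proposed rule."""
    requested = [p for p in re.split(r"[,\s]+", fl.strip()) if p]
    selected = []
    for field in schema_fields:
        named = field in requested  # asked for by exact name
        via_glob = any(fnmatchcase(field, p) for p in requested
                       if "*" in p or "?" in p)  # matched only by a wildcard
        if named or (via_glob and field not in explicit_only):
            selected.append(field)
    return selected


fields = ["id", "title", "body_dv"]      # hypothetical schema
print(resolve_fl(fields, "*", {"body_dv"}))          # wildcard skips body_dv
print(resolve_fl(fields, "*,body_dv", {"body_dv"}))  # explicit name brings it back
```

The same function also models the request-handler variant: an exclusion default would supply `explicit_only`, and a request could override it by naming the field.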

A feature idea for discussion -- fields that can only be explicitly retrieved

2017-01-13 Thread Shawn Heisey
I've got an idea for a feature that I think could be very useful. I'd like to get some community feedback about it, see whether it's worth opening an issue for discussion. First, some background info: As I understand it, the fact that stored fields are compressed means that even if a particular

Re: Large index recommendation

2017-01-13 Thread Erick Erickson
In any case, this is really "the sizing question" and generic answers are not reliable. Here's a long blog about why, but the net-net is "prototype and measure". Fortunately you can prototype with just a few nodes (I usually want at least 2 shards) and extrapolate reasonably well.

Re: Large index recommendation

2017-01-13 Thread Susheel Kumar
As per Scott@FullStory, you should see benefits with many smaller shards rather than a few bigger ones. Also, upgrading to Solr 6.2 would be better, as it includes many improvements in handling multiple shards. See the presentation below

Large index recommendation

2017-01-13 Thread Joe Obernberger
Hi All - we've been experimenting with Solr Cloud 5.5.0 with a 27 shard (no replication - each shard runs on a physical host) cluster on top of HDFS. It currently just crossed 3 billion documents indexed with an index size of 16.1TBytes. In HDFS with 3x replication this takes up 48.2TBytes.

Re: equivalent of json.facet's "gap" keyword in /sql

2017-01-13 Thread Joel Bernstein
The time functions aren't supported in the SQL interface currently. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Jan 13, 2017 at 10:44 AM, radha krishnan wrote: > Hi, > > can we write an SQL statement and use the /sql handler to get the > json.facet's "gap"

Re: Trouble boosting a field

2017-01-13 Thread Tom Chiverton
Well, I've tried much larger values than 8, and it still doesn't seem to do the job. For now, assume my users are searching for exact substrings of a real title. Tom On 13/01/17 16:22, Walter Underwood wrote: I use a boost of 8 for title with no boost on the content. Both Infoseek and

Re: Trouble boosting a field

2017-01-13 Thread Walter Underwood
I use a boost of 8 for title with no boost on the content. Both Infoseek and Inktomi settled on the 8X boost, getting there with completely different methodologies. You might not want the title to completely trump the content. That causes some odd anomalies. If someone searches for “ice age
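
One way to express "boost of 8 for title, no boost on content" is the `qf` parameter of the edismax query parser. The sketch below only builds the request URL; the base URL and collection name are hypothetical, the field names `title` and `content` are taken from the original question in this thread, and the use of edismax is an assumption since the snippet does not say which parser is in use.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def boosted_title_query(user_query,
                        base="http://localhost:8983/solr/docs/select"):
    """Build a select URL that weights title matches 8x over content matches."""
    params = {
        "defType": "edismax",       # assumed query parser
        "q": user_query,
        "qf": "title^8 content",    # title boosted by 8, content unboosted
    }
    return f"{base}?{urlencode(params)}"


url = boosted_title_query("connected vehicle")
print(url)
print(parse_qs(urlsplit(url).query)["qf"])
```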

Re: Trouble boosting a field

2017-01-13 Thread Erick Erickson
Tom: The output is numbing, but add =true to your query and you'll see exactly what contributed to the score and why. Otherwise you're flying blind. Obviously something's trumping your boosting, but you can't pin down what without the numbers. You can get an overall sense of what's happening if
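
The archive appears to have stripped the parameter name in "add =true". Assuming it was Solr's `debug` parameter (`debugQuery=true` is another existing spelling), a small helper can append it to any query URL; the example URL is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_debug(url):
    """Return the same query URL with debug=true set exactly once."""
    parts = urlsplit(url)
    params = [(k, v)
              for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "debug"]          # drop any earlier debug setting
    params.append(("debug", "true"))    # assumed parameter name, see above
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_debug("http://localhost:8983/solr/docs/select?q=ice+age"))
```

The response then carries an explain section per document, which is where the per-field score contributions can be compared.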

equivalent of json.facet's "gap" keyword in /sql

2017-01-13 Thread radha krishnan
Hi, can we write an SQL statement and use the /sql handler to get the json.facet's "gap" functionality? Ex facet query: json.facet: { my_histogram: { type: range, field: i_timestamp, start: "2016-10-21T01:00:00Z", end: "2016-10-21T02:00:00Z", gap: "+1MINUTE", mincount: 0 } } Thanks,
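
For readers unfamiliar with range facets, the sketch below rebuilds the facet from the message as a Python dict and lists the bucket start times that the one-minute gap implies. The mapping of "+1MINUTE" to a one-minute `timedelta` is hard-coded as an assumption; this is not a parser for Solr date math, and it does not contact Solr.

```python
import json
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%SZ"

# The facet definition from the message, as it would be sent in json.facet.
facet = {
    "my_histogram": {
        "type": "range",
        "field": "i_timestamp",
        "start": "2016-10-21T01:00:00Z",
        "end": "2016-10-21T02:00:00Z",
        "gap": "+1MINUTE",
        "mincount": 0,
    }
}


def bucket_starts(start, end, gap=timedelta(minutes=1)):
    """List the start of every bucket in [start, end), one per gap."""
    current = datetime.strptime(start, FMT)
    stop = datetime.strptime(end, FMT)
    starts = []
    while current < stop:
        starts.append(current.strftime(FMT))
        current += gap
    return starts


spec = facet["my_histogram"]
buckets = bucket_starts(spec["start"], spec["end"])
print(json.dumps(facet))
print(len(buckets), buckets[0], buckets[-1])
```

With `mincount: 0`, every one of these buckets is expected in the response, including empty ones, which is what makes the result usable as a histogram.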

Re: regarding extending classes in org.apache.solr.client.solrj.io.stream.metrics package

2017-01-13 Thread radha krishnan
Hi Scott, I have created a JIRA ticket (https://issues.apache.org/jira/browse/SOLR-9962). I will figure out the patch process. Thanks, Radhakrishnan D On Thu, Jan 12, 2017 at 8:57 AM, Scott Stults <sstu...@opensourceconnections.com> wrote: > Radhakrishnan, > > That would be an appropriate

Trouble boosting a field

2017-01-13 Thread Tom Chiverton
I have a few hundred documents with title and content fields. I want a match in title to trump matches in content. If I search for "connected vehicle" then a news article that has that in the content shouldn't be ranked higher than the page with that in the title is essentially what I want.

AW: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer
Thanks @Toke, for pointing out these options. I'll have a read about expungeDeletes. Sounds even more so, that having solr filter out 0-counts is a good idea and I should handle my use-case outside of solr. Thanks again, Sebastian On Fri, 2017-01-13 at 14:19 +, Sebastian Riemer wrote: >

AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer
Nice, thank you very much for your explanation! >> Solr returns all fields as facet result where there was some value at some time as long as the documents are somewhere in the index, even when they're marked as indexed. So there must have been a document with m_mediaType_s=1. Even if

Re: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Toke Eskildsen
On Fri, 2017-01-13 at 14:19 +, Sebastian Riemer wrote: > the second search should have been this: http://localhost:8983/solr/w > emi/select?fq=m_mediaType_s:%221%22=on=*:*=0=0 > =json  > (or in other words, give me all documents having value "1" for field > "m_mediaType_s") > > Since this

Re: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Michael Kuhlmann
Then I don't understand your problem. Solr already does exactly what you want. Maybe the problem is different: I assume that there never was a value of "1" in the index, leading to your confusion. Solr returns all fields as facet result where there was some value at some time as long as the

AW: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer
Hi Bill, Thanks, that's actually where I come from. But I don't want to exclude values leading to a count of zero. Background to this: A user searched for mediaType "book" which gave him 10 results. Now some other task/routine whatever changes all those 10 books to be say 10 ebooks, because

Re: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread billnbell
Set mincount to 1 Bill Bell Sent from mobile > On Jan 13, 2017, at 7:19 AM, Sebastian Riemer wrote: > > Pardon me, > the second search should have been this: > http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22=on=*:*=0=0=json > > (or in other words, give
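
On the server side this suggestion corresponds to the request parameter `facet.mincount=1`. The sketch below shows the equivalent filtering on the client, using the flat value/count list format from the first post in this thread. The first four value/count pairs are copied from that post; the trailing `"1", 0` entry is a hypothetical stand-in for the zero-count value being discussed.

```python
def facet_pairs(flat):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def apply_mincount(flat, mincount=1):
    """Keep only the facet values whose count reaches mincount."""
    return [(value, count) for value, count in facet_pairs(flat)
            if count >= mincount]


# Counts from the original post, plus a hypothetical zero-count entry.
m_media_type = ["2", 25561, "3", 19027, "10", 1966, "11", 1705, "1", 0]

print(apply_mincount(m_media_type))      # zero-count value dropped
print(apply_mincount(m_media_type, 0))   # everything kept
```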

AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer
Pardon me, the second search should have been this: http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22=on=*:*=0=0=json (or in other words, give me all documents having value "1" for field "m_mediaType_s") Since this search gives zero results, why is it included in the

FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer
Hi, Please help me understand: http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s=on=on=*:*=json returns: "facet_counts":{ "facet_queries":{}, "facet_fields":{ "m_mediaType_s":[ "2",25561, "3",19027, "10",1966, "11",1705,

RE: Can't get spelling suggestions to work properly

2017-01-13 Thread jimi.hullegard
I just noticed why setting maxResultsForSuggest to a high value was not a good thing: now it shows spelling suggestions even on correctly spelled words. I think what I would need is the logic of SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, but with a configurable limit instead of it being

RE: Can't get spelling suggestions to work properly

2017-01-13 Thread jimi.hullegard
Hi Alessandro, Thanks for your explanation. It helped a lot. Although setting "spellcheck.maxResultsForSuggest" to a value higher than zero was not enough. I also had to set "spellcheck.alternativeTermCount". With that done, I now get suggestions when searching for 'mycet' (a misspelling of
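
A request that sets both parameters named above might be built as follows. The two parameter names come from the message; the base URL, the handler path, and the value 5 are hypothetical, and the handler is assumed to have a spellcheck component configured.

```python
from urllib.parse import urlencode


def spellcheck_url(term, max_results_for_suggest=5, alternative_term_count=5,
                   base="http://localhost:8983/solr/docs/select"):
    """Build a query URL with both spellcheck limits set (illustrative values)."""
    params = {
        "q": term,
        "spellcheck": "true",
        "spellcheck.maxResultsForSuggest": max_results_for_suggest,
        "spellcheck.alternativeTermCount": alternative_term_count,
    }
    return f"{base}?{urlencode(params)}"


print(spellcheck_url("mycet"))
```

As the follow-up message in this thread notes, a high `maxResultsForSuggest` can also produce suggestions for correctly spelled words, so the value is worth tuning against real queries.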

Re: SolrCloud different score for same document on different replicas.

2017-01-13 Thread Morten Bøgeskov
On Thu, 5 Jan 2017 16:31:35 + Charlie Hull wrote: > On 05/01/2017 13:30, Morten Bøgeskov wrote: > > > > > > Hi. > > > > We've got a SolrCloud which is sharded and has a replication factor of > > 2. > > > > The 2 replicas of a shard may look like this: > > > > Num Docs: