Re: Confusing DocValues documentation

2017-12-21 Thread Erick Erickson
OK, last bit of the tutorial. bq: But that does not seem to be helping with sorting or faceting of any kind. This seems to be like a good way to speed up a stored field's retrieval. These are the same thing. I have two docs. I have to know how they sort. Therefore I need the value in the sort

Re: Confusing DocValues documentation

2017-12-21 Thread S G
Thank you Erick. I guess the biggest piece I was missing was the sort on a field other than the search field. Once you have filtered a list of documents and then you want to sort, the inverted index cannot be used for lookup. You just have doc-IDs, which are values in the inverted index, not the keys.
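
The point above (doc IDs are the values of the inverted index, not its keys) can be illustrated with a toy model. This is only a sketch under stated assumptions: the documents, the field names `title` and `price`, and the helper functions are hypothetical, and plain Python dictionaries stand in for Lucene's on-disk structures.

```python
from collections import defaultdict

# Hypothetical toy documents: doc ID -> {field: value}
docs = {
    1: {"title": "solr basics", "price": 30},
    2: {"title": "lucene basics", "price": 10},
    3: {"title": "solr internals", "price": 20},
}

# Inverted index for "title": term -> set of doc IDs (keyed by term, good for search).
inverted_title = defaultdict(set)
for doc_id, fields in docs.items():
    for term in fields["title"].split():
        inverted_title[term].add(doc_id)

# Inverted index for "price": value -> set of doc IDs (doc IDs are values, not keys).
inverted_price = defaultdict(set)
for doc_id, fields in docs.items():
    inverted_price[fields["price"]].add(doc_id)

# Doc-values-style column for "price": doc ID -> value (keyed by doc, good for sorting).
price_column = {doc_id: fields["price"] for doc_id, fields in docs.items()}


def search_and_sort(term):
    """Find matching docs via the inverted index, then sort them via the column."""
    hits = inverted_title.get(term, set())
    return sorted(hits, key=lambda doc_id: price_column[doc_id])


def price_via_inverted_index(doc_id):
    """Without the column: scan posting lists until the doc ID turns up."""
    for value, doc_ids in inverted_price.items():
        if doc_id in doc_ids:
            return value
    return None


print(search_and_sort("solr"))      # [3, 1]
print(price_via_inverted_index(3))  # 20
```

Both lookups return the same information, but the column answers "what is this document's value?" in one step, while the inverted structure has to be scanned for every document being sorted.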

Re: Confusing DocValues documentation

2017-12-21 Thread Erick Erickson
Here's where you're going off the rails: "I can just look at the map-for-field-A" As I said before, you're totally right, all the information you need is there. But you're thinking of this as though speed weren't at a premium when you say "I can just look". Consider that there are single replicas

Re: Issue with Solr Cell mixing metadata and content together

2017-12-21 Thread Phillip Rhodes
Fair enough. I'm actually using ManifoldCF to manage the indexing, and I see that they have a Tika Content Extraction transformer available, so I'll look into wiring that into my pipeline and see if that gets me the results I'm looking for. Thanks, Phil This message optimized for indexing by

Re: Issue with Solr Cell mixing metadata and content together

2017-12-21 Thread Erick Erickson
bq: Is there any way to get reasonable behavior using the ExtractingRequestHandler, or should I just dump that approach and plan to run Tika outside of Solr, and then send Solr the exact content I want? Actually, this is recommended for a bunch of reasons, so I'd just go there straightaway. Tika

Issue with Solr Cell mixing metadata and content together

2017-12-21 Thread Phillip Rhodes
Hi all, I have been having an issue with Solr, using the ExtractingRequestHandler. Basically, when indexing a PDF (for example) I get all the metadata mixed into the "content" field along with the content. See:

Re: Confusing DocValues documentation

2017-12-21 Thread S G
Thanks a lot Erick and Emir. I am still a bit confused and an example will help me a lot. Here is a slightly modified version of the same example to illustrate my point more clearly. Let us consider 3 documents - doc1, doc2 and doc3. Each contains up to 3 fields - A, B and C. And the values for these

Create collection error in 5.5.4

2017-12-21 Thread tedsolr
I upgraded Solr from 5.2.1 to 5.5.4 recently. Occasionally when creating a new collection via the Collections API I get an error: Could not fully create collection: . This has never happened previous to this upgrade. It's happened twice in my development environment and once in my user test

Re: Filtering Solr pivot facet values

2017-12-21 Thread Emir Arnautović
It seems that there is something in latest Solr version that you might be able to use. From release notes: “The new facet.matches parameter returns facet buckets only for terms that match a regular expression.” HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr &

Re: Solr: edismax boost not working

2017-12-21 Thread Emir Arnautović
Hi Ruby, Just add the fields you want to search to qf and set a boost for each. And yes - it can be defined in search handler params - you can set defaults or even use invariants to ignore values passed in the URL. Edismax has quite a few parameters and you should check the docs to see how to use them in your

Re: Confusing DocValues documentation

2017-12-21 Thread Erick Erickson
bq: I do not see why sorting or faceting on any field A, B or C would be a problem. All the values for a field are there in one data-structure and it should be easy to sort or group-by on that. This is totally true, just totally incomplete ;) For a given field: Inverted structure (leaving out

Re: Confusing DocValues documentation

2017-12-21 Thread Emir Arnautović
Hi SG, It is all ok - it’s just that notation is different. Please see inline comments. Regards, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 21 Dec 2017, at 18:56, S G

SOLR 6.5.1: timeAllowed parameter with grouping

2017-12-21 Thread SOLR4189
A month ago we upgraded our SOLR from 4.10.1 to 6.5.1. Now we want to use the timeAllowed parameter that was fixed in Solr 5. We tested this parameter on our test servers and cannot tell whether it works with group=true or not. * If we set group=false and timeAllowed=1 and query with too many

Re: Solr: edismax boost not working

2017-12-21 Thread ruby
Thanks, that worked. But now if I want to search against 3 fields, can that be defined in solrconfig.xml?

Confusing DocValues documentation

2017-12-21 Thread S G
Hi, It seems that docValues are not really explained well anywhere. Here are 2 links that try to explain it: 1) https://lucidworks.com/2013/04/02/fun-with-docvalues-in-solr-4-2/ 2) https://www.elastic.co/guide/en/elasticsearch/guide/current/docvalues.html And official Solr documentation that

Re: Solr: edismax boost not working

2017-12-21 Thread Erick Erickson
bq: I want to boost to show the document having id=2. but my query is no working. Exactly _how_ is it not working? Boosting just influences position, it doesn't impose an absolute order. It's quite possible that you're successfully changing the score, just not enough to show. Use the debug=all,

Re: Filtering Solr pivot facet values

2017-12-21 Thread Erick Erickson
You might be able to do something interesting with the JSON faceting approach, but I confess I don't know for sure. Best, Erick On Thu, Dec 21, 2017 at 8:17 AM, Shawn Heisey wrote: > On 12/20/2017 2:40 PM, Arun Rangarajan wrote: >> >> I think multi-select faceting does the

[ANNOUNCE] Apache Solr 7.2.0 released

2017-12-21 Thread Adrien Grand
21 December 2017, Apache Solr™ 7.2.0 available The Lucene PMC is pleased to announce the release of Apache Solr 7.2.0 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting,

Re: Filtering Solr pivot facet values

2017-12-21 Thread Shawn Heisey
On 12/20/2017 2:40 PM, Arun Rangarajan wrote: I think multi-select faceting does the opposite of what I want. I want the facet to include the filters. You don't have any filters to include or exclude. You would need fq parameters to use multi-select faceting. But as you say, it doesn't do

Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter

2017-12-21 Thread Walter Underwood
You can find all the inflected forms that are in your index. Search for the root form, use highlighting to pull out matches, and collect them. It is a bother, but not that hard for a program to do. In the synonym file, you don’t need to list an inflected form of the synonym, because it will be

RE: Trouble with mm and SynonymQuery and KeywordRepeatFilter

2017-12-21 Thread Markus Jelsma
Hello Steve, This is an example of a query-time analyzer that has the problem: Synonym file contains stemmed terms: traject,verbind A search for plural term 'trajecten' becomes +DisjunctionMaxQuery(((title_nl:trajecten

Re: DocValues for multivalued strings and boolean fields

2017-12-21 Thread Shawn Heisey
On 12/20/2017 6:09 PM, S G wrote: One of our Solr users is trying to set docValues="true" for multivalued string fields and boolean-type fields. I am not sure what the performance impact of that would be. Can docValues negatively affect performance in any way? Adding to what Emir said: The

Re: Solr: edismax boost not working

2017-12-21 Thread Emir Arnautović
Hi, Can you try: =json&=edismax=on =video =object_desc_NGRAM^10 object_name^2 It is edismax that expands query to search in fields listed in qf. HTH, Emir
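
The request in the message above seems to have lost its parameter names in the archive. The following is a minimal sketch of how such a request could be built, assuming the standard Solr parameters `q`, `defType`, `qf` and `wt` are the ones intended. The host, port and collection name are hypothetical placeholders; the field names and boosts come from the message itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the real host and collection.
base_url = "http://localhost:8983/solr/mycollection/select"

params = {
    "q": "video",                                # plain user query, no field prefixes
    "defType": "edismax",                        # select the Extended DisMax parser
    "qf": "object_desc_NGRAM^10 object_name^2",  # fields to search, with boosts
    "wt": "json",                                # response format
}

# urlencode escapes the space and the ^ characters inside qf.
url = base_url + "?" + urlencode(params)
print(url)
```

With this shape the query string stays a bare term and edismax expands it across the fields listed in `qf`, so the boosts apply per field instead of being written into the query.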

Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter

2017-12-21 Thread Steve Rowe
Markus, I’m confused about exactly what operations you’re performing - could you provide your field type? In particular, I don’t understand why you can’t just rewrite the synonyms file entry word1 => word2 to: word1 => word1, word2 (Clearly I’m missing something about how stemming is
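
The difference between the two synonym entries mentioned above can be sketched with a toy expansion function. This is an illustration only, assuming explicit `=>` mappings replace the left-hand term with the right-hand terms; `apply_synonyms` and the two mapping dictionaries are hypothetical, and a real synonym filter emits the alternatives at the same token position instead of flattening them into a list.

```python
def apply_synonyms(tokens, mapping):
    """Replace each token with its mapped terms; unmapped tokens pass through."""
    out = []
    for token in tokens:
        out.extend(mapping.get(token, [token]))
    return out


# word1 => word2 : the original term is dropped
replace_only = {"word1": ["word2"]}

# word1 => word1, word2 : the original term is kept alongside the synonym
keep_original = {"word1": ["word1", "word2"]}

print(apply_synonyms(["word1", "other"], replace_only))   # ['word2', 'other']
print(apply_synonyms(["word1", "other"], keep_original))  # ['word1', 'word2', 'other']
```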

Solr: edismax boost not working

2017-12-21 Thread ruby
I'm playing with the following query but can't get the *object_desc* field boosted correctly. Maybe someone can tell me what I'm missing here: =json&=edismax=on =object_name_NGRAM:video OR object_desc_NGRAM:video =object_desc_NGRAM^10 object_name^2 field definition:

RE: Trouble with mm and SynonymQuery and KeywordRepeatFilter

2017-12-21 Thread Markus Jelsma
Hello Steve, Well, that is an interesting approach to the topic indeed. But I do not think it is possible to obtain a list of all inflected forms for all words that also have roots in some synonym file; the stemmers are not reversible. Any other ideas? Thanks, Markus

Re: howto sum of terms of specific field in index

2017-12-21 Thread Emir Arnautović
Hi Bernd, > Shouldn't it be: > freq(doc2, fieldX:A) = 4 (A appears 4 times in doc 2) Yes - that’s how it should be. Regards, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 21 Dec 2017, at 10:35,

Re: howto sum of terms of specific field in index

2017-12-21 Thread Bernd Fehling
Hi Emir, thank you, that's it. But a question while reading the docs about sumTotalTermFreq from your link. Example in the docs: If doc1:(fieldX:A B C) and doc2:(fieldX:A A A A): ... freq(doc1, fieldX:A) = 4 (A appears 4 times in doc 2) Shouldn't it be: freq(doc2, fieldX:A)
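
The term statistics in the documentation example quoted above can be checked by hand with a few lines of Python. This is a sketch, not Solr's implementation: the helper names are hypothetical stand-ins for the corresponding function queries, and each document is modelled as a simple list of terms in `fieldX`.

```python
from collections import Counter

# The two documents from the documentation example: terms of fieldX per doc.
docs = {
    "doc1": ["A", "B", "C"],
    "doc2": ["A", "A", "A", "A"],
}


def freq(doc_id, term):
    """How many times the term appears in one document."""
    return Counter(docs[doc_id])[term]


def doc_freq(term):
    """How many documents contain the term at least once."""
    return sum(1 for terms in docs.values() if term in terms)


def total_term_freq(term):
    """How many times the term appears across all documents."""
    return sum(freq(doc_id, term) for doc_id in docs)


def sum_total_term_freq():
    """Total number of term occurrences in the field across all documents."""
    return sum(len(terms) for terms in docs.values())


print(freq("doc1", "A"))      # 1
print(freq("doc2", "A"))      # 4
print(doc_freq("A"))          # 2
print(total_term_freq("A"))   # 5
print(sum_total_term_freq())  # 7
```

The count of 4 belongs to doc2, which matches the correction proposed in the message.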

Re: DocValues for multivalued strings and boolean fields

2017-12-21 Thread Emir Arnautović
Hi SG, Doc values are another file to write, so indexing performance will suffer. In theory, query performance will suffer too, because the alternative is an in-memory structure (fieldCache and fieldValueCache). In practice, it will not, because the in-memory structure requires a larger heap, requires

Re: howto sum of terms of specific field in index

2017-12-21 Thread Emir Arnautović
Hi Bernd, It seems to me that you are looking for the sumTotalTermFreq function. https://lucene.apache.org/solr/guide/6_6/function-queries.html HTH, Emir

howto sum of terms of specific field in index

2017-12-21 Thread Bernd Fehling
Hi list, actually a simple question, but somehow I can't figure out how to get the total number of terms in a field in the index, example: record_1: fruit: apple, banana, cherry record_2: fruit: apple, pineapple, cherry record_3: fruit: kiwi, pineapple record_4: fruit: - a search for fruit:*

Re: App Studio

2017-12-21 Thread Naga
Please send it to me. Thanks, Naga > On Nov 1, 2017, at 4:46 PM, Kojo wrote: > > I would like to try that! > > > On Nov 1, 2017 18:04, "Will Hayes" wrote: > > There is a community edition of App Studio for Solr and Elasticsearch being > released by