Solr 6.4.0 and deprecated SynonymFilterFactory

2017-02-02 Thread Ere Maijala

Hi,

on startup Solr 6.4.0 logs the following warning:

o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis 
class [solr.SynonymFilterFactory]. Please consult documentation how to 
replace it accordingly.


What documentation? As far as I can see, there's nothing at 
 
or
 
nor did a quick Google search come up with anything definitive.


Am I looking in the wrong places or does the mentioned documentation 
exist at all?


--Ere


Re: Arabic words search in solr

2017-02-02 Thread Steve Rowe
Hi Mohan,

I ran your Case #1 through Solr 4.9.0’s Admin UI Analysis pane and I can see 
the analyzer for the field type “text_ar" analyzer does not remove all 
diacritics:

Indexed original: المؤسسة التجارية العمانية
Indexed analyzed: مؤسس تجار عمان

Query original: الموسسة التجارية
Query analyzed: موسس تجار

The analyzed query terms are the same as the first two analyzed indexed terms, 
with one exception: the hamza on the waw in the analyzed indexed term “مؤسس” 
was not stripped off by the analyzer, and so won’t match the analyzed query 
term “موسس”, which was entered by the user without the hamza.

Adding ICUFoldingFilterFactory to the “text_ar” field type fixed case #1 for me 
by stripping the hamza from the waw.  You can read more about this filter in 
the Solr Reference Guide (yes, this is basically for Solr 6.4, but I don’t 
think this functionality has changed between 4.9 and 6.4): 
.
  If you do this, you can remove the LowerCaseFilterFactory since 
ICUFoldingFilterFactory performs lowercasing as part of its work.

Note that to use ICUFoldingFilterFactory you must add three jars to the lib/ 
directory in your solr home dir.  Here’s how I did it:

$ mkdir example/solr/lib
$ cp dist/solr-analysis-extras-4.9.0.jar example/solr/lib/
$ cp contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.9.0.jar 
example/solr/lib/
$ cp contrib/analysis-extras/lib/icu4j-53.1.jar example/solr/lib/

--
Steve
www.lucidworks.com 

> On Feb 1, 2017, at 6:50 AM, mohanmca01  wrote:
> 
> Dear Steve,Thanks for investigating our problem. Our project is basically
> business directory search platform, and we have more than 100+ K business
> details information. I’m providing you some examples of Arabic words to
> reproduce the problem. please find attached word file where i explained
> everything along with screenshots. arabicSearch.docx
>  
> regarding upgrading to the latest version, our project is running on Java
> 1.7V, and if i need to upgrade then we have to upgrade Java, Application
> Server JBoos, and etc. which is not that right time to do this activity at
> all..!!
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4318227.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

2017-02-02 Thread David Kramer
Yes, think of the starving orphan records…

Ours is an eCommerce system, selling mostly shoes.  We have three levels of 
nested objects representing what we sell:
- Product: Mostly title and description
- Item: A specific color and some other attributes, including price. Products 
have 1 or more Items, Items belong to one product.
- SKU: A specific size and SKU ID. Items have 1 or more SKUs, SKUs belong to 
one Item.
[PRODUCT  [ITEM  [SKU] [SKU] [SKU]] [ITEM [SKU]] ]

Products, items, and SKUs all have ID numbers. One product will never have the 
same ID as another product, but it’s possible for a product to have the same ID 
as an Item or a SKU. And that is the problem.  So the program that creates the 
import file adds a new field called uuid, that is a P, I, or S (for Product, 
Item, or SKU) followed by the ID.  We did it this way because my understanding 
is Solr can’t implement a compound unique key.  The uuid is unique across all 
documents, not just all documents of the same docType.

So in the case of my unique test to see if it would complain if the UUID of a 
document I was inserting was not unique, I grabbed the first few products from 
the full import file, and changed the IDs so they are not duplicates of the 
real data, but left the UUIDs alone, so they are duplicates of the real data, 
which was already loaded.  

My expectation was that when I loaded the data I would get some  error saying 
that UUID was already used.  YOUR expectation is that the record would be 
overwritten.  What actually happened is that the new documents got added with 
their duplicate UUIDs, which is the worst possible case.  This is why I think 
it’s not respecting my uniqueKey setting in schema.xml.

Does that make more sense?  I hope you can help me understand this discrepancy. 
Thanks for your efforts so far.

On 2/2/17, 3:13 PM, "Mikhail Khludnev"  wrote:

David,
I hardly get the way which IDs are assigned, but beware that repeating
uniqueKey
value causes deleting former occurrence. In case of block join index it
corrupts block structure: parent can't be deleted and left children orphans
(.. so touching, I'm sorry). Just make sure that number of deleted docs is
0 at first.

On Thu, Feb 2, 2017 at 6:20 PM, David Kramer 
wrote:

> Thanks, for responding. Mikhail.  There are no deleted documents.  Since
> I’m fairly new to Solr, one of the things I’ve been paranoid about is I
> have no way of validating my schema.xml, or know whether Solr is even 
using
> it (I have evidence it’s not, more below). So for each test, I’ve wiped 
out
> the index, recreated, and reimported.
>
> Back to whether my schema.xml is being used, I mentioned that I had to
> come up with a compound UUID field of the first character of the docType
> plus the ID, and we put “uuid” (was id) in our
> schema.xml.  Then I deleted and recreated the index and restarted Solr.  
In
> order to verify it was working, I created an import file that had unique
> IDs but UUIDs which were duplicates of existing records, and it imported
> the new records even though the UUIDs existed in the database already.  
I’m
> not sure if Solr should have produced an error or not. I’ll research that,
> but I mention that here in case it’s relevant.
>
> Thanks.
>
> On 2/2/17, 6:10 AM, "Mikhail Khludnev"  wrote:
>
> David,
>
> Can you make sure your index doesn't have deleted docs? This  can be
> seen
> in SolrAdmiun.
> And can you merge index to avoid having them in the index?
>
> On Thu, Feb 2, 2017 at 12:29 AM, David Kramer <
> david.kra...@shoebuy.com>
> wrote:
>
> >
> >
> > Some background:
> > · The data involved is catalog data, with three nested
> objects:
> > Products, Items, and Skus, in that order. We have a docType field on
> each
> > record as a differentiator.
> > · The "id" field in our data is unique within datatype, but
> not
> > across datatypes. We added a "uuid" field in our program that
> generates the
> > Solr import file that is the id prefixed by the first letter of the
> > docType, like P12345. That makes the uuid field unique, and we have
> that as
> > the uniqueKey in our schema.xml.
> > · We are trying to retrieve the parent Product, and all
> children
> > documents. As such, we are using the ChildDocTransformerFactory
> > ([child...]) to retrieve the children along with the parent. We have
> not
> > yet solved the problem of getting items within SKUs as nested
> documents in
> > the results, and we will have to figure that out at some point, but
> for now
> > we get them flattened
> > · We 

Re: Solr 6 Facet range query over streaming API

2017-02-02 Thread Joel Bernstein
Currently you would have to supply some custom Streaming Expressions to do
the kind of range faceting you are doing above.

There are two aggregation Streaming Expressions currently:

rollup: Does a MapReduce rollup over buckets. This means all tuples
matching a query are sorted and partitioned across worker nodes and then
rolled up. This is good for exact facet counts on very high cardinality
fields. But it is much slower then traditional faceting on low and moderate
cardinality fields.

facet: Pushes the aggregations down into Solr using the JSON facet API.

Both of these currently support simple aggregations over buckets, but not
the kind of automatic date range faceting you're currently using.

Aggregations are going to be getting more attention in Streaming
Expressions soon, to support additional functionality Parallel SQL.





Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Feb 2, 2017 at 9:51 AM, yauza  wrote:

> While migrating from Solr legacy faceting to Solr streaming API, how to
> implement the below parameters:
>
>  = true
>  = {!key=age_ranges}age
>  = 0
>  = 100
>  = 10
>  = {!key=price_ranges}price
>  = 0
>  = 1000
>  = 50
>  = a_field
>  = 1
>
>
> How can I implement these parameters using the new facet streaming with
> parallel functions. I understand rollup() functionality has to be used.But
> the documentation is limited and I am not able to understand the underlying
> functionality. Can someone help me with this?
>
> Ref Sample Query being currently used:
>
> q=(score:[*+TO+*])+AND+(field:378/content)+AND+(crawl_date:[
> 2015-01-01T18:30:00.000Z+TO+2017-02-01T18:30:00.000Z])
> =score=project_score
> range=publish_date=0
> =1000=10&
> facet=true_score.facet.range.start=0.0
> _score.facet.range.end=1.0_score.facet.
> range.gap=0.01_date.facet.range.start=1970-01-01T00:00:00Z
> _date.facet.range.end=2017-02-01T14:09:26.763Z&
> f.publish_date.facet.range.gap=%2B1MONTH=0=project_score
> ={!key%3D378+facet.prefix%3D378/content}
> field=Tags=patent_publication
> =score=1000=1
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Solr-6-Facet-range-query-over-streaming-API-tp4318477.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SQL-like queries (with percent character) - matching an exact substring, with parts of words

2017-02-02 Thread Mikhail Khludnev
Have anybody tried to tweak AnalysingSuggester with ngram token filter to
expand such infix queries?

On Thu, Feb 2, 2017 at 6:55 PM, Shawn Heisey  wrote:

> On 2/2/2017 8:15 AM, Maciej Ł. PCSS wrote:
> > regardless of the value of such a use-case, there is another thing
> > that stays unknown for me.
> >
> > Does SOLR support a simple and silly 'exact substring match'? I mean,
> > is it possible to search for (actually filter by) a raw substring
> > without tokenization and without any kind of processing/simplifying
> > the searched information? By a 'raw substring' I mean a character
> > string that, among others, can contain non-letters (colons, brackets,
> > etc.) - basically everything the user is able to input via keyboard.
> >
> > Does this use case meet SOLR technical possibilities even if that
> > means a big efficiency cost?
>
> Because you want to do substring matches, things are somewhat more
> complicated than if you wanted to do a full exact-string-only query.
>
> First I'll tackle the full exact query idea, because the info is also
> important for substrings:
>
> If the class in the fieldType is "solr.StrField" then the input will be
> indexed exactly as it is sent, all characters preserved, and all
> characters needing to be in the query.
>
> On the query side, you would need to escape any special characters in
> the query string -- spaces, colons, and several other characters.
> Escaping is done with the backslash.  If you are manually constructing
> URL parameters for an HTTP request, you would also need to be aware of
> URL encoding.  Some Solr libraries (like SolrJ) are capable of handling
> all the URL encoding for you.
>
> Matching *substrings* with StrField would involve either a regular
> expression query (with .* before and after) or a wildcard query, which
> Erick described in his reply.
>
> An alternate way to do substring matches is the NGram or EdgeNGram
> filters, and not using wildcards or regex.  This method will increase
> your index size, possibly by a large amount.  To use this method, you'd
> need to switch back to solr.TextField, use the keyword tokenizer, and
> then follow that with the appropriate NGram filter.  Depending on your
> exact needs, you might only do the NGram filter on the index side, or
> you might need it on both index and query analysis.  Escaping special
> characters on the query side would still be required.
>
> The full list of characters that require escaping is at the end of this
> page:
>
> http://lucene.apache.org/core/6_4_0/queryparser/org/apache/
> lucene/queryparser/classic/package-summary.html?is-external=true#Escaping_
> Special_Characters
>
> Note that it shows && and || as special characters, even though these
> are in fact two characters each.  Typically even a single instance of
> these characters requires escaping.  Solr will also need spaces to be
> escaped.
>
> Thanks,
> Shawn
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

2017-02-02 Thread Mikhail Khludnev
David,
I hardly get the way which IDs are assigned, but beware that repeating
uniqueKey
value causes deleting former occurrence. In case of block join index it
corrupts block structure: parent can't be deleted and left children orphans
(.. so touching, I'm sorry). Just make sure that number of deleted docs is
0 at first.

On Thu, Feb 2, 2017 at 6:20 PM, David Kramer 
wrote:

> Thanks, for responding. Mikhail.  There are no deleted documents.  Since
> I’m fairly new to Solr, one of the things I’ve been paranoid about is I
> have no way of validating my schema.xml, or know whether Solr is even using
> it (I have evidence it’s not, more below). So for each test, I’ve wiped out
> the index, recreated, and reimported.
>
> Back to whether my schema.xml is being used, I mentioned that I had to
> come up with a compound UUID field of the first character of the docType
> plus the ID, and we put “uuid” (was id) in our
> schema.xml.  Then I deleted and recreated the index and restarted Solr.  In
> order to verify it was working, I created an import file that had unique
> IDs but UUIDs which were duplicates of existing records, and it imported
> the new records even though the UUIDs existed in the database already.  I’m
> not sure if Solr should have produced an error or not. I’ll research that,
> but I mention that here in case it’s relevant.
>
> Thanks.
>
> On 2/2/17, 6:10 AM, "Mikhail Khludnev"  wrote:
>
> David,
>
> Can you make sure your index doesn't have deleted docs? This  can be
> seen
> in SolrAdmiun.
> And can you merge index to avoid having them in the index?
>
> On Thu, Feb 2, 2017 at 12:29 AM, David Kramer <
> david.kra...@shoebuy.com>
> wrote:
>
> >
> >
> > Some background:
> > · The data involved is catalog data, with three nested
> objects:
> > Products, Items, and Skus, in that order. We have a docType field on
> each
> > record as a differentiator.
> > · The "id" field in our data is unique within datatype, but
> not
> > across datatypes. We added a "uuid" field in our program that
> generates the
> > Solr import file that is the id prefixed by the first letter of the
> > docType, like P12345. That makes the uuid field unique, and we have
> that as
> > the uniqueKey in our schema.xml.
> > · We are trying to retrieve the parent Product, and all
> children
> > documents. As such, we are using the ChildDocTransformerFactory
> > ([child...]) to retrieve the children along with the parent. We have
> not
> > yet solved the problem of getting items within SKUs as nested
> documents in
> > the results, and we will have to figure that out at some point, but
> for now
> > we get them flattened
> > · We are building out the proof of concept for this. This is
> all
> > new work, so we are free to change a lot.
> > · This is Solr 6.0.0, and we are importing in JSON format,
> if that
> > matters
> > · I submitted this question to StackOverflow > stackoverflow.com/questions/41969353/solr-querying-nested-
> documents-with-
> > childdoctransformerfactory-get-parent-quer> but haven’t gotten any
> > answers yet.
> >
> >
> > Our data looks like this (I've removed some fields for simplicity):
> >
> > {
> >
> >   "id": 739063,
> >
> >   "docType": "Product",
> >
> >   "uuid": "P739063",
> >
> >   "_childDocuments_": [
> >
> > {
> >
> >   "id": 1537378,
> >
> >   "price": 25.45,
> >
> >   "color": "Blush",
> >
> >   "docType": "Item",
> >
> >   "productId": 739063,
> >
> >   "uuid": "I1537378",
> >
> >   "_childDocuments_": [
> >
> > {
> >
> >   "id": 12799578,
> >
> >   "size": "10",
> >
> >   "width": "W",
> >
> >   "docType": "Sku",
> >
> >   "itemId": 1537378,
> >
> >   "uuid": "S12799578"
> >
> > }
> >
> >   ]
> >
> > }
> >
> > }
> >
> >
> >
> > The query to fetch all Products and their children nested inside
> them is
> > q=docType:Product=title,id,docType,[child
> > parentFilter=docType:Product]. When I run that query, all is well,
> and it
> > returns the first 10 rows. However, if I fetch more rows by adding,
> say
> > =500, we get the error Parent query yields document which is not
> > matched by parents filter, docID=XXX.
> >
> > When we first saw that error, we discovered our id field was not
> unique
> > across document types, so we added the uuid field as mentioned
> above, which
> > is. we also added in our schema.xml file, wiped the core, recreated
> it, and
> > restarted Solr just to make sure it was in effect. We have double

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Thanks Shawn. Yes I did index some docs after moving to 6.4.0. The release
notes did not mention anything about format being changed so I thought it
would be backward compatible. Yeah my only recourse is to re-index data.
Apart from that it was weird problems overall with 6.4.0. I was excited
about using the unified highlighter but the zookeeper flakiness and
constant disconnections of solr and sometimes not electing a leader for
some collections made me rollback.

Anyway thanks for promptly responding, will be more careful form next time.

Thanks

Ravi Kiran Bhaskar



On Thu, Feb 2, 2017 at 9:41 AM, Shawn Heisey  wrote:

> On 2/2/2017 7:23 AM, Ravi Solr wrote:
> > When i try to rollback from 6.4.0 to my original version of 6.0.1 it now
> > throws another issue. Now I cant go to 6.4.0 nor can I roll back to 6.0.1
> >
> > Could not load codec 'Lucene62'.  Did you forget to add
> > lucene-backward-codecs.jar?
> > at org.apache.lucene.index.SegmentInfos.readCodec(
> SegmentInfos.java:429)
> > at
> > org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:349)
> > at
> > org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)
> >
> > Hope this doesnt cost me dearly. Any ideas at least on how to rollback
> > safely.
>
> This sounds like you did some indexing after the upgrade, or possibly
> some index optimizing, so the parts of the index that were written (or
> merged) by the newer version are now in a format that the older version
> cannot use.  Perhaps the merge policy was changed, causing Solr to do
> some automatic merges once it started up.  I am not aware of anything in
> Solr that would write new segments without indexing input or a merge
> policy change.
>
> As far as I know, there is no straightforward way to go backwards with
> the index format.  If you want to downgrade and don't have a backup of
> your indexes from before the upgrade, you'll probably need to wipe the
> index directory and completely reindex.
>
> Solr will always use the newest default index format for new segments
> when you upgrade.  Contrary to many user expectations, setting
> luceneMatchVersion will *NOT* affect the index format, only the behavior
> of components that do field analysis.
>
> Downgrading the index format would involve writing a custom Lucene
> program that changes the active index format to the older version, then
> runs a forceMerge on the index.  It would be completely separate from
> Solr, and definitely not straightforward.
>
> Thanks,
> Shawn
>
>


Re: Solr 6.4 new SynonymGraphFilter help for multi-word synonyms

2017-02-02 Thread Cliff Dickinson
Steve and Shawn, thanks for your replies/explanations!

I eagerly await the completion of the Solr JIRA ticket referenced above in
a future release.  Many thanks for addressing this challenge that has had
me banging my head against my desk off and on for the last couple years!

Cliff

On Thu, Feb 2, 2017 at 1:01 PM, Steve Rowe  wrote:

> Hi Cliff,
>
> The Solr query parsers (standard/“Lucene” and e/dismax anyway) have a
> problem that prevents SynonymGraphFilter from working: the text fed to your
> query analyzer is first split on whitespace.  So e.g. a query containing
> “United States” will never match multi-word synonym “United States”->”US”,
> since the analyzer will fist see “United” and then, separately, “States”.
>
> I fixed the whitespace splitting problem in the classic Lucene query
> parser in .  (Note
> that this is *not* the same as Solr’s standard/“Lucene” query parser, which
> is actually a fork of Lucene’s query parser with added functionality.)
>
> There is a Solr JIRA I’m working on to fix the whitespace splitting
> problem: .  I hope to
> get it committed in time for inclusion in Solr 6.5.
>
> --
> Steve
> www.lucidworks.com
>
> > On Feb 2, 2017, at 9:50 AM, Shawn Heisey  wrote:
> >
> > On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
> >> The SynonymGraphFilter API documentation contains the following
> statement
> >> at the end:
> >>
> >> "To get fully correct positional queries when your synonym replacements
> are
> >> multiple tokens, you should instead apply synonyms using this
> TokenFilter
> >> at query time and translate the resulting graph to a TermAutomatonQuery
> >> e.g. using TokenStreamToTermAutomatonQuery."
> >
> > Lucene is a programming API for search.  That documentation is intended
> > for people who are writing Lucene programs.  Those users would be
> > constructing query objects in their own code, so they would most likely
> > know exactly which object needs to be changed to TermAutomatonQuery.
> >
> > Solr is a Lucene program ... and an immensely complicated one.  Many
> > Lucene improvements require changes in the end program for full
> > support.  I suspect that Solr's capability has not been updated to use
> > this new feature in Lucene.  I cannot say for sure, I hope someone who
> > is familiar with this Lucene change and Solr internals can comment.
> >
> > Thanks,
> > Shawn
> >
>
>


Re: Solr 6.4 new SynonymGraphFilter help for multi-word synonyms

2017-02-02 Thread Steve Rowe
Hi Cliff,

The Solr query parsers (standard/“Lucene” and e/dismax anyway) have a problem 
that prevents SynonymGraphFilter from working: the text fed to your query 
analyzer is first split on whitespace.  So e.g. a query containing “United 
States” will never match multi-word synonym “United States”->”US”, since the 
analyzer will fist see “United” and then, separately, “States”.

I fixed the whitespace splitting problem in the classic Lucene query parser in 
.  (Note that this is *not* 
the same as Solr’s standard/“Lucene” query parser, which is actually a fork of 
Lucene’s query parser with added functionality.)

There is a Solr JIRA I’m working on to fix the whitespace splitting problem: 
.  I hope to get it committed 
in time for inclusion in Solr 6.5.

--
Steve
www.lucidworks.com

> On Feb 2, 2017, at 9:50 AM, Shawn Heisey  wrote:
> 
> On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
>> The SynonymGraphFilter API documentation contains the following statement
>> at the end:
>> 
>> "To get fully correct positional queries when your synonym replacements are
>> multiple tokens, you should instead apply synonyms using this TokenFilter
>> at query time and translate the resulting graph to a TermAutomatonQuery
>> e.g. using TokenStreamToTermAutomatonQuery."
> 
> Lucene is a programming API for search.  That documentation is intended
> for people who are writing Lucene programs.  Those users would be
> constructing query objects in their own code, so they would most likely
> know exactly which object needs to be changed to TermAutomatonQuery.
> 
> Solr is a Lucene program ... and an immensely complicated one.  Many
> Lucene improvements require changes in the end program for full
> support.  I suspect that Solr's capability has not been updated to use
> this new feature in Lucene.  I cannot say for sure, I hope someone who
> is familiar with this Lucene change and Solr internals can comment.
> 
> Thanks,
> Shawn
> 



Solr JOIN - Self join within core not working

2017-02-02 Thread Muhammad Zahid Iqbal
Hi,

I have a Solr instance setup with one core named as Book (contains total
100 records and a field isprimary:1 has 50 records)

When I querying, bring me accurate results (50), its simple I know:

http://localhost:7161/solr/Book/select?q=%3A=IsPrimary%3A1=10=xml=true

But When I query like this why its showing all records(100), I know I am in
the same core but why its behaving strange?:

http://localhost:7161/solr/Book/select?q=%3A=%7B!join+fromIndex%3DBook+from%3DGroupID+to%3DGroupID%7D%2BIsPrimary%3A1=10=xml=true

If Solr JOIN works like INNER QUERY this should not be the case.

Any valuable feedback will be appreciated!


Re: How long for autoAddReplica?

2017-02-02 Thread Kevin Risden
>
> so migrating by replacing nodes is going to be a bother.


Not sure what you mean by migrating and replacing nodes, but these two new
actions on the Collections API as of Solr 6.2 may be of use:

   -
   
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-REPLACENODE:MoveAllReplicasinaNodetoAnother
   -
   
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DELETENODE:DeleteReplicasinaNode



Kevin Risden

On Thu, Feb 2, 2017 at 11:46 AM, Erick Erickson 
wrote:

> bq: I don’t see a way to add replicas through the UI, so migrating by
> replacing nodes is going to be a bother
>
> There's a lot of improvements in the admin UI for SolrCloud that I'd
> love to see. Drag/drop replicas would be really cool for instance.
>
> At present though using
> ADDREPLICA/wait-for-new-replica-to-be-active/DELETEREPLICA through the
> collections API is what's available.
>
> Best,
> Erick
>
> On Thu, Feb 2, 2017 at 8:37 AM, Walter Underwood 
> wrote:
> > Oh, missed that limitation.
> >
> > Seems like something that would be very handy in all installations. I
> don’t see a way to add replicas through the UI, so migrating by replacing
> nodes is going to be a bother.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> >> On Feb 2, 2017, at 12:25 AM, Hendrik Haddorp 
> wrote:
> >>
> >> Hi,
> >>
> >> are you using HDFS? According to the documentation the feature should
> be only available if you are using HDFS. For me it did however also fail on
> that. See the thread "Solr on HDFS: AutoAddReplica does not add a replica"
> from about two weeks ago.
> >>
> >> regards,
> >> Hendrik
> >>
> >> On 02.02.2017 07:21, Walter Underwood wrote:
> >>> I added a new node an shut down a node with a shard replica on it. It
> has been an hour and I don’t see any activity toward making a new replica.
> >>>
> >>> The new node and the one I shut down are both 6.4. The rest of the
> 16-node cluster is 6.2.1.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>
> >>>
> >>
> >
>


Re: How long for autoAddReplica?

2017-02-02 Thread Erick Erickson
bq: I don’t see a way to add replicas through the UI, so migrating by
replacing nodes is going to be a bother

There's a lot of improvements in the admin UI for SolrCloud that I'd
love to see. Drag/drop replicas would be really cool for instance.

At present though using
ADDREPLICA/wait-for-new-replica-to-be-active/DELETEREPLICA through the
collections API is what's available.

Best,
Erick

On Thu, Feb 2, 2017 at 8:37 AM, Walter Underwood  wrote:
> Oh, missed that limitation.
>
> Seems like something that would be very handy in all installations. I don’t 
> see a way to add replicas through the UI, so migrating by replacing nodes is 
> going to be a bother.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Feb 2, 2017, at 12:25 AM, Hendrik Haddorp  wrote:
>>
>> Hi,
>>
>> are you using HDFS? According to the documentation the feature should be 
>> only available if you are using HDFS. For me it did however also fail on 
>> that. See the thread "Solr on HDFS: AutoAddReplica does not add a replica" 
>> from about two weeks ago.
>>
>> regards,
>> Hendrik
>>
>> On 02.02.2017 07:21, Walter Underwood wrote:
>>> I added a new node an shut down a node with a shard replica on it. It has 
>>> been an hour and I don’t see any activity toward making a new replica.
>>>
>>> The new node and the one I shut down are both 6.4. The rest of the 16-node 
>>> cluster is 6.2.1.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>>
>>
>


Re: How long for autoAddReplica?

2017-02-02 Thread Walter Underwood
Oh, missed that limitation.

Seems like something that would be very handy in all installations. I don’t see 
a way to add replicas through the UI, so migrating by replacing nodes is going 
to be a bother.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Feb 2, 2017, at 12:25 AM, Hendrik Haddorp  wrote:
> 
> Hi,
> 
> are you using HDFS? According to the documentation the feature should be only 
> available if you are using HDFS. For me it did however also fail on that. See 
> the thread "Solr on HDFS: AutoAddReplica does not add a replica" from about 
> two weeks ago.
> 
> regards,
> Hendrik
> 
> On 02.02.2017 07:21, Walter Underwood wrote:
>> I added a new node an shut down a node with a shard replica on it. It has 
>> been an hour and I don’t see any activity toward making a new replica.
>> 
>> The new node and the one I shut down are both 6.4. The rest of the 16-node 
>> cluster is 6.2.1.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>> 
> 



Re: SQL-like queries (with percent character) - matching an exact substring, with parts of words

2017-02-02 Thread Shawn Heisey
On 2/2/2017 8:15 AM, Maciej Ł. PCSS wrote:
> regardless of the value of such a use-case, there is another thing
> that stays unknown for me.
>
> Does SOLR support a simple and silly 'exact substring match'? I mean,
> is it possible to search for (actually filter by) a raw substring
> without tokenization and without any kind of processing/simplifying
> the searched information? By a 'raw substring' I mean a character
> string that, among others, can contain non-letters (colons, brackets,
> etc.) - basically everything the user is able to input via keyboard.
>
> Does this use case meet SOLR technical possibilities even if that
> means a big efficiency cost? 

Because you want to do substring matches, things are somewhat more
complicated than if you wanted to do a full exact-string-only query.

First I'll tackle the full exact query idea, because the info is also
important for substrings:

If the class in the fieldType is "solr.StrField" then the input will be
indexed exactly as it is sent, all characters preserved, and all
characters needing to be in the query.

On the query side, you would need to escape any special characters in
the query string -- spaces, colons, and several other characters. 
Escaping is done with the backslash.  If you are manually constructing
URL parameters for an HTTP request, you would also need to be aware of
URL encoding.  Some Solr libraries (like SolrJ) are capable of handling
all the URL encoding for you.

Matching *substrings* with StrField would involve either a regular
expression query (with .* before and after) or a wildcard query, which
Erick described in his reply.

An alternate way to do substring matches is the NGram or EdgeNGram
filters, and not using wildcards or regex.  This method will increase
your index size, possibly by a large amount.  To use this method, you'd
need to switch back to solr.TextField, use the keyword tokenizer, and
then follow that with the appropriate NGram filter.  Depending on your
exact needs, you might only do the NGram filter on the index side, or
you might need it on both index and query analysis.  Escaping special
characters on the query side would still be required.

The full list of characters that require escaping is at the end of this
page:

http://lucene.apache.org/core/6_4_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html?is-external=true#Escaping_Special_Characters

Note that it shows && and || as special characters, even though these
are in fact two characters each.  Typically even a single instance of
these characters requires escaping.  Solr will also need spaces to be
escaped.

Thanks,
Shawn



Re: SQL-like queries (with percent character) - matching an exact substring, with parts of words

2017-02-02 Thread Erick Erickson
Well, the *whatever* syntax will work.
(that's asterisk-whatever-asterisk if funky bolding happens). You'd
use it on a "string" field (unanalyzed, case sensitive) or perhaps on
some field with KeywordTokenizerFactory possibly followed by
LowercaseFilterFactory if you wanted case-insensitive matches. I think
you have to enable leading wildcards as well.

There's some trickiness getting all this past the query _parser_
though at query time and URL-encoding the odd characters may be
required. There's  been some recent work done to get spaces through
the query arsing step, but in any case you can escape the spaces with
a backslash.

So yes, it's technically possible. Query times will be poor with lots
of data though. Whether "good enough" or not is application
dependent

Best,
Erick

On Thu, Feb 2, 2017 at 7:15 AM, Maciej Ł. PCSS  wrote:
> Hi Erick, All,
>
> regardless of the value of such a use-case, there is another thing that
> stays unknown for me.
>
> Does SOLR support a simple and silly 'exact substring match'? I mean, is it
> possible to search for (actually filter by) a raw substring without
> tokenization and without any kind of processing/simplifying the searched
> information? By a 'raw substring' I mean a character string that, among
> others, can contain non-letters (colons, brackets, etc.) - basically
> everything the user is able to input via keyboard.
>
> Does this use case meet SOLR technical possibilities even if that means a
> big efficiency cost?
>
> Regards
> Maciej
>
>
> W dniu 30.01.2017 o 17:12, Erick Erickson pisze:
>>
>> Well, the usual Solr solution to leading and trailing wildcards is to
>> ngram the field. You can get the entire field (incuding spaces) to be
>> analyzed as a whole by using KeywordTokenizer. Sometimes you wind up
>> using a copyField to support this and search against one or the other
>> if necessary.
>>
>> You can do this with KeywordTokenizer and '*a bcd ef*", but that'll be
>> slow for the exact same reason the SQL query is slow: It has to
>> examine every value in every document to find terms that match then
>> search on those.
>>
>> There's some index size cost here so you'll have to test.
>>
>> Really go back to your use-case to see if this is _really_ necessary
>> though. Often people think it is because that's the only way they've
>> been able to search at all in SQL and it can turn out that there are
>> other ways to solve it. IOW, this could be an XY problem.
>>
>> Best,
>> Erick
>>
>> On Mon, Jan 30, 2017 at 1:52 AM, Maciej Ł. PCSS 
>> wrote:
>>>
>>> Hi All,
>>>
>>> What solution have you applied in your implementations?
>>>
>>> Regards
>>> Maciej
>>>
>>>
>>> W dniu 24.01.2017 o 14:10, Maciej Ł. PCSS pisze:

 Dear SOLR users,

 please point me to the right solution of my problem. I'm using SOLR to
 implement a Google-like search in my application and this scenario is
 working fine.

 However, in specific use-cases I need to filter documents that include a
 specific substring in a given field. It's about an SQL-like query
 similar to
 this:

 SELECT *  FROM table WHERE someField = '%c def g%'

 I expect to match documents having someField ='abc def ghi'. That means
 I
 expect match parts of words.

 As I understand SOLR, as a reversed-index, does work with tokens rather
 that character strings and thereby looks for whole words (not
 substrings).

 Is there any solution for such an issue?

 Regards
 Maciej Łabędzki
>>>
>>>
>


Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

2017-02-02 Thread David Kramer
Thanks, for responding. Mikhail.  There are no deleted documents.  Since I’m 
fairly new to Solr, one of the things I’ve been paranoid about is I have no way 
of validating my schema.xml, or know whether Solr is even using it (I have 
evidence it’s not, more below). So for each test, I’ve wiped out the index, 
recreated, and reimported. 

Back to whether my schema.xml is being used, I mentioned that I had to come up 
with a compound UUID field of the first character of the docType plus the ID, 
and we put “uuid” (was id) in our schema.xml.  Then I 
deleted and recreated the index and restarted Solr.  In order to verify it was 
working, I created an import file that had unique IDs but UUIDs which were 
duplicates of existing records, and it imported the new records even though the 
UUIDs existed in the database already.  I’m not sure if Solr should have 
produced an error or not. I’ll research that, but I mention that here in case 
it’s relevant.

Thanks.

On 2/2/17, 6:10 AM, "Mikhail Khludnev"  wrote:

David,

Can you make sure your index doesn't have deleted docs? This  can be seen
in SolrAdmiun.
And can you merge index to avoid having them in the index?

On Thu, Feb 2, 2017 at 12:29 AM, David Kramer 
wrote:

>
>
> Some background:
> · The data involved is catalog data, with three nested objects:
> Products, Items, and Skus, in that order. We have a docType field on each
> record as a differentiator.
> · The "id" field in our data is unique within datatype, but not
> across datatypes. We added a "uuid" field in our program that generates 
the
> Solr import file that is the id prefixed by the first letter of the
> docType, like P12345. That makes the uuid field unique, and we have that 
as
> the uniqueKey in our schema.xml.
> · We are trying to retrieve the parent Product, and all children
> documents. As such, we are using the ChildDocTransformerFactory
> ([child...]) to retrieve the children along with the parent. We have not
> yet solved the problem of getting items within SKUs as nested documents in
> the results, and we will have to figure that out at some point, but for 
now
> we get them flattened
> · We are building out the proof of concept for this. This is all
> new work, so we are free to change a lot.
> · This is Solr 6.0.0, and we are importing in JSON format, if that
> matters
> · I submitted this question to StackOverflow stackoverflow.com/questions/41969353/solr-querying-nested-documents-with-
> childdoctransformerfactory-get-parent-quer> but haven’t gotten any
> answers yet.
>
>
> Our data looks like this (I've removed some fields for simplicity):
>
> {
>
>   "id": 739063,
>
>   "docType": "Product",
>
>   "uuid": "P739063",
>
>   "_childDocuments_": [
>
> {
>
>   "id": 1537378,
>
>   "price": 25.45,
>
>   "color": "Blush",
>
>   "docType": "Item",
>
>   "productId": 739063,
>
>   "uuid": "I1537378",
>
>   "_childDocuments_": [
>
> {
>
>   "id": 12799578,
>
>   "size": "10",
>
>   "width": "W",
>
>   "docType": "Sku",
>
>   "itemId": 1537378,
>
>   "uuid": "S12799578"
>
> }
>
>   ]
>
> }
>
> }
>
>
>
> The query to fetch all Products and their children nested inside them is
> q=docType:Product=title,id,docType,[child
> parentFilter=docType:Product]. When I run that query, all is well, and it
> returns the first 10 rows. However, if I fetch more rows by adding, say
> =500, we get the error Parent query yields document which is not
> matched by parents filter, docID=XXX.
>
> When we first saw that error, we discovered our id field was not unique
> across document types, so we added the uuid field as mentioned above, 
which
> is. we also added in our schema.xml file, wiped the core, recreated it, 
and
> restarted Solr just to make sure it was in effect. We have double checked
> and are sure that the uuid fields are unique.
>
>
>
> In all the search results for that error that I've found, the OP did not
> have a field that could differentiate the different document types, but as
> you see we do. Since both the query and the parentFilter are searching for
> docType:Product I don't see how either could possibly return anything but
> parents. We've also tried adding childFilter=docType:Item and
> childFilter=docType:Sku but that did not help.  I also tried using title:*
> for the filter since only products have titles.
>
>
>
> Is there anything 

Re: SQL-like queries (with percent character) - matching an exact substring, with parts of words

2017-02-02 Thread Maciej Ł. PCSS

Hi Erick, All,

regardless of the value of such a use-case, there is another thing that 
stays unknown for me.


Does SOLR support a simple and silly 'exact substring match'? I mean, is 
it possible to search for (actually filter by) a raw substring without 
tokenization and without any kind of processing/simplifying the searched 
information? By a 'raw substring' I mean a character string that, among 
others, can contain non-letters (colons, brackets, etc.) - basically 
everything the user is able to input via keyboard.


Does this use case meet SOLR technical possibilities even if that means 
a big efficiency cost?


Regards
Maciej


W dniu 30.01.2017 o 17:12, Erick Erickson pisze:

Well, the usual Solr solution to leading and trailing wildcards is to
ngram the field. You can get the entire field (incuding spaces) to be
analyzed as a whole by using KeywordTokenizer. Sometimes you wind up
using a copyField to support this and search against one or the other
if necessary.

You can do this with KeywordTokenizer and '*a bcd ef*", but that'll be
slow for the exact same reason the SQL query is slow: It has to
examine every value in every document to find terms that match then
search on those.

There's some index size cost here so you'll have to test.

Really go back to your use-case to see if this is _really_ necessary
though. Often people think it is because that's the only way they've
been able to search at all in SQL and it can turn out that there are
other ways to solve it. IOW, this could be an XY problem.

Best,
Erick

On Mon, Jan 30, 2017 at 1:52 AM, Maciej Ł. PCSS  wrote:

Hi All,

What solution have you applied in your implementations?

Regards
Maciej


W dniu 24.01.2017 o 14:10, Maciej Ł. PCSS pisze:

Dear SOLR users,

please point me to the right solution of my problem. I'm using SOLR to
implement a Google-like search in my application and this scenario is
working fine.

However, in specific use-cases I need to filter documents that include a
specific substring in a given field. It's about an SQL-like query similar to
this:

SELECT *  FROM table WHERE someField = '%c def g%'

I expect to match documents having someField ='abc def ghi'. That means I
expect match parts of words.

As I understand SOLR, as a reversed-index, does work with tokens rather
that character strings and thereby looks for whole words (not substrings).

Is there any solution for such an issue?

Regards
Maciej Łabędzki






Re: Collection will not replicate

2017-02-02 Thread Erick Erickson
Yes. If indexing went through you'd lose docs so indexing will fail.
Querying will fail too unless you set shards.tolerant.

You really wouldn't want your docs lost is the reasoning.

On Feb 2, 2017 6:56 AM, "tedsolr"  wrote:

> Can I assume that without a leader the shard will not respond to write
> requests? I can search on the collection. If I can't update docs or add any
> new docs then this becomes an emergency.
>
>
> Erick Erickson wrote
> > It's worth a try to take down your entire cluster. Bring one machine
> > back up at a time. There _may_ be something like a 3 minute wait
> > before each of the replicas on that machine come up, the leader
> > election process has a 180 second delay before the replicas on that
> > node take over leadership to wait for the last known good leader to
> > come up.
> >
> > Continue bringing one node up at a time and wait patiently  until all
> > the replicas on it are green and until you have a leader for each
> > shard elected. Bringing up the rest of the Solr nodes should be
> > quicker then.
> >
> > Be sure to sequence things so you have known good Solr nodes come up
> > first for the shard that's wonky. By that I mean that the first node
> > you bring up for the leaderless shard should be the one with the best
> > chance of having a totally OK index.
> >
> >
> > Let's claim that the above does bring up a leader for each shard. If
> > you still have a replica that refuses to come up, use the
> > DELETEREPLICA command to remove it. Just for insurance, I'd take the
> > Solr node down after the DELETEREPLICA and remove the entire core
> > directory for the replica that didn't come up. Then restart the node
> > and use the ADDREPLICA collections API command to put it back.
> >
> > If none of that works, you could try hand-editing the state.json file
> > and _make_ one of the shards a leader (I'd do this with the Solr nodes
> > down), but that's not for the faint of heart.
> >
> > Best,
> > Erick
> >
> > On Wed, Feb 1, 2017 at 1:57 PM, Jeff Wartes 
>
> > jwartes@
>
> >  wrote:
> >> Sounds similar to a thread last year:
> >> http://lucene.472066.n3.nabble.com/Node-not-
> recovering-leader-elections-not-occuring-tp4287819p4287866.html
> >>
> >>
> >>
> >> On 2/1/17, 7:49 AM, "tedsolr" 
>
> > tsmith@
>
> >  wrote:
> >>
> >> I have version 5.2.1. Short of an upgrade, are there any remedies?
> >>
> >>
> >> Erick Erickson wrote
> >> > What version of Solr? since 5.4 there's been a FORCELEADER
> >> colelctions
> >> > API call that might help.
> >> >
> >> > I'd run it with the newly added replicas offline. you only want it
> >> to
> >> > have good replicas to choose from.
> >> >
> >> > Best,
> >> > Erick
> >> >
> >> > On Wed, Feb 1, 2017 at 6:48 AM, tedsolr 
> >>
> >> > tsmith@
> >>
> >> >  wrote:
> >> >> Update! I did find an error:
> >> >>
> >> >> 2017-02-01 09:23:22.673 ERROR org.apache.solr.common.
> SolrException
> >> >> :org.apache.solr.common.SolrException: Error getting leader from
> >> zk for
> >> >> shard shard1
> >> >> 
> >> >> Caused by: org.apache.solr.common.SolrException: Could not get
> >> leader
> >> >> props
> >> >> at
> >> >>
> >> org.apache.solr.cloud.ZkController.getLeaderProps(
> ZkController.java:1040)
> >> >> at
> >> >>
> >> org.apache.solr.cloud.ZkController.getLeaderProps(
> ZkController.java:1004)
> >> >> at
> >> >>
> >> org.apache.solr.cloud.ZkController.getLeader(ZkController.java:960)
> >> >> ... 14 more
> >> >> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> >> >> KeeperErrorCode = NoNode for /collections/colname/leaders/shard1
> >> >> at
> >> >>
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> >> >>
> >> >> When I view the cluster status I see that this shard does not
> have
> >> a
> >> >> leader.
> >> >> So it appears I need to force the leader designation to the
> >> "active"
> >> >> replica. How do I do that?
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >>
> >> http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-
> tp4318260p4318265.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-
> tp4318260p4318283.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
>
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Collection-will-not-replicate-tp4318260p4318479.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Collection will not replicate

2017-02-02 Thread tedsolr
Can I assume that without a leader the shard will not respond to write
requests? I can search on the collection. If I can't update docs or add any
new docs then this becomes an emergency.


Erick Erickson wrote
> It's worth a try to take down your entire cluster. Bring one machine
> back up at a time. There _may_ be something like a 3 minute wait
> before each of the replicas on that machine come up, the leader
> election process has a 180 second delay before the replicas on that
> node take over leadership to wait for the last known good leader to
> come up.
> 
> Continue bringing one node up at a time and wait patiently  until all
> the replicas on it are green and until you have a leader for each
> shard elected. Bringing up the rest of the Solr nodes should be
> quicker then.
> 
> Be sure to sequence things so you have known good Solr nodes come up
> first for the shard that's wonky. By that I mean that the first node
> you bring up for the leaderless shard should be the one with the best
> chance of having a totally OK index.
> 
> 
> Let's claim that the above does bring up a leader for each shard. If
> you still have a replica that refuses to come up, use the
> DELETEREPLICA command to remove it. Just for insurance, I'd take the
> Solr node down after the DELETEREPLICA and remove the entire core
> directory for the replica that didn't come up. Then restart the node
> and use the ADDREPLICA collections API command to put it back.
> 
> If none of that works, you could try hand-editing the state.json file
> and _make_ one of the shards a leader (I'd do this with the Solr nodes
> down), but that's not for the faint of heart.
> 
> Best,
> Erick
> 
> On Wed, Feb 1, 2017 at 1:57 PM, Jeff Wartes 

> jwartes@

>  wrote:
>> Sounds similar to a thread last year:
>> http://lucene.472066.n3.nabble.com/Node-not-recovering-leader-elections-not-occuring-tp4287819p4287866.html
>>
>>
>>
>> On 2/1/17, 7:49 AM, "tedsolr" 

> tsmith@

>  wrote:
>>
>> I have version 5.2.1. Short of an upgrade, are there any remedies?
>>
>>
>> Erick Erickson wrote
>> > What version of Solr? since 5.4 there's been a FORCELEADER
>> colelctions
>> > API call that might help.
>> >
>> > I'd run it with the newly added replicas offline. you only want it
>> to
>> > have good replicas to choose from.
>> >
>> > Best,
>> > Erick
>> >
>> > On Wed, Feb 1, 2017 at 6:48 AM, tedsolr 
>>
>> > tsmith@
>>
>> >  wrote:
>> >> Update! I did find an error:
>> >>
>> >> 2017-02-01 09:23:22.673 ERROR org.apache.solr.common.SolrException
>> >> :org.apache.solr.common.SolrException: Error getting leader from
>> zk for
>> >> shard shard1
>> >> 
>> >> Caused by: org.apache.solr.common.SolrException: Could not get
>> leader
>> >> props
>> >> at
>> >>
>> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1040)
>> >> at
>> >>
>> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1004)
>> >> at
>> >>
>> org.apache.solr.cloud.ZkController.getLeader(ZkController.java:960)
>> >> ... 14 more
>> >> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> >> KeeperErrorCode = NoNode for /collections/colname/leaders/shard1
>> >> at
>> >>
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>> >>
>> >> When I view the cluster status I see that this shard does not have
>> a
>> >> leader.
>> >> So it appears I need to force the leader designation to the
>> "active"
>> >> replica. How do I do that?
>> >>
>> >>
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-tp4318260p4318265.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-tp4318260p4318283.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-tp4318260p4318479.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 6 Facet range query over streaming API

2017-02-02 Thread yauza
While migrating from Solr legacy faceting to Solr streaming API, how to
implement the below parameters:

 = true
 = {!key=age_ranges}age
 = 0
 = 100
 = 10
 = {!key=price_ranges}price
 = 0
 = 1000
 = 50
 = a_field
 = 1


How can I implement these parameters using the new facet streaming with
parallel functions. I understand rollup() functionality has to be used.But
the documentation is limited and I am not able to understand the underlying
functionality. Can someone help me with this?

Ref Sample Query being currently used:

q=(score:[*+TO+*])+AND+(field:378/content)+AND+(crawl_date:[2015-01-01T18:30:00.000Z+TO+2017-02-01T18:30:00.000Z])
=score=project_score=publish_date=0
=1000=10=true_score.facet.range.start=0.0
_score.facet.range.end=1.0_score.facet.range.gap=0.01_date.facet.range.start=1970-01-01T00:00:00Z
_date.facet.range.end=2017-02-01T14:09:26.763Z_date.facet.range.gap=%2B1MONTH=0=project_score
={!key%3D378+facet.prefix%3D378/content}field=Tags=patent_publication
=score=1000=1




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-6-Facet-range-query-over-streaming-API-tp4318477.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 6.4 new SynonymGraphFilter help for multi-word synonyms

2017-02-02 Thread Shawn Heisey
On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
> The SynonymGraphFilter API documentation contains the following statement
> at the end:
>
> "To get fully correct positional queries when your synonym replacements are
> multiple tokens, you should instead apply synonyms using this TokenFilter
> at query time and translate the resulting graph to a TermAutomatonQuery
> e.g. using TokenStreamToTermAutomatonQuery."

Lucene is a programming API for search.  That documentation is intended
for people who are writing Lucene programs.  Those users would be
constructing query objects in their own code, so they would most likely
know exactly which object needs to be changed to TermAutomatonQuery.

Solr is a Lucene program ... and an immensely complicated one.  Many
Lucene improvements require changes in the end program for full
support.  I suspect that Solr's capability has not been updated to use
this new feature in Lucene.  I cannot say for sure, I hope someone who
is familiar with this Lucene change and Solr internals can comment.

Thanks,
Shawn



Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Shawn Heisey
On 2/2/2017 7:23 AM, Ravi Solr wrote:
> When i try to rollback from 6.4.0 to my original version of 6.0.1 it now
> throws another issue. Now I cant go to 6.4.0 nor can I roll back to 6.0.1
>
> Could not load codec 'Lucene62'.  Did you forget to add
> lucene-backward-codecs.jar?
> at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:429)
> at
> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:349)
> at
> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)
>
> Hope this doesnt cost me dearly. Any ideas at least on how to rollback
> safely.

This sounds like you did some indexing after the upgrade, or possibly
some index optimizing, so the parts of the index that were written (or
merged) by the newer version are now in a format that the older version
cannot use.  Perhaps the merge policy was changed, causing Solr to do
some automatic merges once it started up.  I am not aware of anything in
Solr that would write new segments without indexing input or a merge
policy change.

As far as I know, there is no straightforward way to go backwards with
the index format.  If you want to downgrade and don't have a backup of
your indexes from before the upgrade, you'll probably need to wipe the
index directory and completely reindex.

Solr will always use the newest default index format for new segments
when you upgrade.  Contrary to many user expectations, setting
luceneMatchVersion will *NOT* affect the index format, only the behavior
of components that do field analysis.

Downgrading the index format would involve writing a custom Lucene
program that changes the active index format to the older version, then
runs a forceMerge on the index.  It would be completely separate from
Solr, and definitely not straightforward.

Thanks,
Shawn



Solr 6.4 new SynonymGraphFilter help for multi-word synonyms

2017-02-02 Thread Cliff Dickinson
I've been eagerly awaiting the release of the new SynonymGraphFilter in
Solr 6.4.  We have the need to support multi-word synonyms, which were
always problematic with the old SynonymFilterFactory.  I've upgraded to
Solr 6.4 and replaced the old filter with the new one, but am not seeing
the results that I had hoped for yet.  I suspect my configuration is
lacking something important.

I'm starting with the simple example in the SynonymGraphFilterFactory API
documentation:

<fieldType name="text_synonym" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            format="solr" ignoreCase="false" expand="true"/>
  </analyzer>
</fieldType>

An example entry in the synonyms.txt file is:

booster, representative of athletics interest

My problem with the old filter has always been that if I run a query for
"booster", I get results containing any of the following words: booster,
representative, athletics, interest.  This is way more results than I
want.  A document that contains only athletics, but none of the other words
in the synonym, is returned.  What I really want are documents that contain
"booster" or the full synonym phrase of "representative of athletics
interest".  How could I accomplish this?

The SynonymGraphFilter API documentation contains the following statement
at the end:

"To get fully correct positional queries when your synonym replacements are
multiple tokens, you should instead apply synonyms using this TokenFilter
at query time and translate the resulting graph to a TermAutomatonQuery
e.g. using TokenStreamToTermAutomatonQuery."

How do I use TokenStreamToTermAutomatonQuery, or can this not be configured
in Solr, but only by writing code against Lucene?  Would this even address
my issue?

I've found synonyms to be very frustrating in Solr and am hoping this new
filter will be a big improvement.  Thanks in advance for the help!


Re: solr query help

2017-02-02 Thread Shawn Heisey
On 2/2/2017 6:16 AM, deepak.gha...@mediawide.com wrote:
> I am writting query for getting response from specific index content first.
> eg.
> http://192.168.200.14:8983/solr/mypgmee/select?q=*blood*=id:(*/939/* OR 
> **)=id=json=true
>  
> In above query I am getting response, Means suppose I Get 4 result for course 
> "939" out of 10. It works fine by releanvence. 
> But Now I want "939" course result first then other result.So Please let me 
> how can I do that.

Are those asterisks in your actual query, or did you add some of them
for emphasis?  I'm guessing that at least some of them are added for
emphasis.  Asterisks in a query are wildcard characters, and wildcard
queries frequently do not behave like people expect them to.

We need to see the actual query, without any extra characters.  If
asterisks have been added, then we will have an incorrect understanding
about what the query says.

Attempting to guess what you are actually doing, here is what I THINK
you might have meant:

q=blood
fq=id:(939 OR *)

If I'm right about that filter query, I do not know what you were trying
to do.  If you were trying to query for 939 or any other value, use this
instead:

fq=id:[* TO *]

This is a range query that will match all documents where the id field
contains SOMETHING.  Documents where it is not present will not be
matched.  If you do not actually want to *filter* on the id value, then
you can simply remove the fq parameter entirely.  Filter queries do not
contribute to scoring at all -- they *only* filter.

The last part of your message says you want documents with an id of 939
to be listed first.  I think you can do that with this:

bq=id:939^10

The bq parameter means "boost query".  The ^10 part of what I've written
is the boost value.
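
Note that bq is a dismax/edismax feature, so you will probably also need
defType=edismax on the request for it to take effect.  Putting it together
with the host and core from your message, the full URL would look something
like this:

http://192.168.200.14:8983/solr/mypgmee/select?q=blood&defType=edismax&bq=id:939^10&fl=id&wt=json&indent=true

Adjust the boost value until the 939 documents sort where you want them.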

Thanks,
Shawn



Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Thanks Hendrik. I am baffled as to why I did not hit this issue prior to
moving to 6.4.0.

On Thu, Feb 2, 2017 at 7:58 AM, Hendrik Haddorp 
wrote:

> Might be that your overseer queue overloaded. Similar to what is described
> here:
> https://support.lucidworks.com/hc/en-us/articles/203959903-
> Bringing-up-downed-Solr-servers-that-don-t-want-to-come-up
>
> If the overseer queue gets too long you get hit by this:
> https://github.com/Netflix/curator/wiki/Tech-Note-4
>
> Try to request the overseer status 
> (/solr/admin/collections?action=OVERSEERSTATUS).
> If that fails you likely hit that problem. If so you can also not use the
> ZooKeeper command line client anymore. You can now restart all your ZK
> nodes with an increases jute.maxbuffer value. Once ZK is restarted you can
> use the ZK command line client with the same jute.maxbuffer value and check
> how many entries /overseer/queue has in ZK. Normally there should be a few
> entries but if you see thousands then you should delete them. I used a few
> lines of Java code for that, again setting jute.maxbuffer to the same
> value. Once cleaned up restart the Solr nodes one by one and keep an eye on
> the overseer status.
>
>
> On 02.02.2017 10:52, Ravi Solr wrote:
>
>> Following up on my previous email, the intermittent server unavailability
>> seems to be linked to the interaction between Solr and Zookeeper. Can
>> somebody help me understand what this error means and how to recover from
>> it.
>>
>> 2017-02-02 09:44:24.648 ERROR
>> (recoveryExecutor-3-thread-16-processing-n:xx.xxx.xxx.xxx:1234_solr
>> x:clicktrack_shard1_replica4 s:shard1 c:clicktrack r:core_node3)
>> [c:clicktrack s:shard1 r:core_node3 x:clicktrack_shard1_replica4]
>> o.a.s.c.RecoveryStrategy Error while trying to recover.
>> core=clicktrack_shard1_replica4:org.apache.zookeeper.KeeperE
>> xception$SessionExpiredException:
>> KeeperErrorCode = Session expired for /overseer/queue/qn-
>>  at org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:127)
>>  at org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:51)
>>  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>>  at
>> org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkCl
>> ient.java:391)
>>  at
>> org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkCl
>> ient.java:388)
>>  at
>> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(Zk
>> CmdExecutor.java:60)
>>  at
>> org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388)
>>  at
>> org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:244)
>>  at org.apache.solr.cloud.ZkController.publish(ZkController.
>> java:1215)
>>  at org.apache.solr.cloud.ZkController.publish(ZkController.
>> java:1128)
>>  at org.apache.solr.cloud.ZkController.publish(ZkController.
>> java:1124)
>>  at
>> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoverySt
>> rategy.java:334)
>>  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.
>> java:222)
>>  at
>> com.codahale.metrics.InstrumentedExecutorService$Instrumente
>> dRunnable.run(InstrumentedExecutorService.java:176)
>>  at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>  at
>> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolE
>> xecutor.lambda$execute$0(ExecutorUtil.java:229)
>>  at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>>  at java.lang.Thread.run(Thread.java:745)
>>
>> Thanks
>>
>> Ravi Kiran Bhaskar
>>
>> On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr  wrote:
>>
>> Hello,
>>>   Yesterday I upgraded from 6.0.1 to 6.4.0, its been straight 12
>>> hours of debugging spree!! Can somebody kindly help me  out of this
>>> misery.
>>>
>>> I have a set has 8 single shard collections with 3 replicas. As soon as I
>>> updated the configs and started the servers one of my collection got
>>> stuck
>>> with no leader. I have restarted solr to no avail, I also tried to force
>>> a
>>> leader via collections API that dint work either. I also see that, from
>>> time to time multiple solr nodes go down all at the same time, only a
>>> restart resolves the issue.
>>>
>>> The error snippets are shown below
>>>
>>> 2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n:
>>> 10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1
>>> c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1
>>> x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying
>>> to recover. core=clicktrack_shard1_replica1:org.apache.solr.common.
>>> SolrException:
>>> No registered leader was found after waiting for 4000ms , collection:
>>> clicktrack slice: shard1
>>>
>>> 

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
When I try to roll back from 6.4.0 to my original version of 6.0.1, it now
throws another issue. Now I can't go to 6.4.0, nor can I roll back to 6.0.1:

Could not load codec 'Lucene62'.  Did you forget to add
lucene-backward-codecs.jar?
at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:429)
at
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:349)
at
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)

Hope this doesn't cost me dearly. Any ideas, at least, on how to roll back
safely?

Thanks

Ravi Kiran Bhaskar

On Thu, Feb 2, 2017 at 4:52 AM, Ravi Solr  wrote:

> Following up on my previous email, the intermittent server unavailability
> seems to be linked to the interaction between Solr and Zookeeper. Can
> somebody help me understand what this error means and how to recover from
> it.
>
> 2017-02-02 09:44:24.648 ERROR (recoveryExecutor-3-thread-16-
> processing-n:xx.xxx.xxx.xxx:1234_solr x:clicktrack_shard1_replica4
> s:shard1 c:clicktrack r:core_node3) [c:clicktrack s:shard1 r:core_node3
> x:clicktrack_shard1_replica4] o.a.s.c.RecoveryStrategy Error while trying
> to recover. core=clicktrack_shard1_replica4:org.apache.zookeeper.
> KeeperException$SessionExpiredException: KeeperErrorCode = Session
> expired for /overseer/queue/qn-
> at org.apache.zookeeper.KeeperException.create(
> KeeperException.java:127)
> at org.apache.zookeeper.KeeperException.create(
> KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> at org.apache.solr.common.cloud.SolrZkClient$9.execute(
> SolrZkClient.java:391)
> at org.apache.solr.common.cloud.SolrZkClient$9.execute(
> SolrZkClient.java:388)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(
> ZkCmdExecutor.java:60)
> at org.apache.solr.common.cloud.SolrZkClient.create(
> SolrZkClient.java:388)
> at org.apache.solr.cloud.DistributedQueue.offer(
> DistributedQueue.java:244)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:1215)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:1128)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:1124)
> at org.apache.solr.cloud.RecoveryStrategy.doRecovery(
> RecoveryStrategy.java:334)
> at org.apache.solr.cloud.RecoveryStrategy.run(
> RecoveryStrategy.java:222)
> at com.codahale.metrics.InstrumentedExecutorService$
> InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.solr.common.util.ExecutorUtil$
> MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> Thanks
>
> Ravi Kiran Bhaskar
>
> On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr  wrote:
>
>> Hello,
>>  Yesterday I upgraded from 6.0.1 to 6.4.0, its been straight 12
>> hours of debugging spree!! Can somebody kindly help me  out of this misery.
>>
>> I have a set has 8 single shard collections with 3 replicas. As soon as I
>> updated the configs and started the servers one of my collection got stuck
>> with no leader. I have restarted solr to no avail, I also tried to force a
>> leader via collections API that dint work either. I also see that, from
>> time to time multiple solr nodes go down all at the same time, only a
>> restart resolves the issue.
>>
>> The error snippets are shown below
>>
>> 2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n:
>> 10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1
>> c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1
>> x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying
>> to recover. 
>> core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException:
>> No registered leader was found after waiting for 4000ms , collection:
>> clicktrack slice: shard1
>>
>> solr.log.9:2017-02-02 01:43:41.336 INFO  (zkCallback-4-thread-29-proces
>> sing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A cluster
>> state change: [WatchedEvent state:SyncConnected type:NodeDataChanged
>> path:/collections/clicktrack/state.json] for collection [clicktrack] has
>> occurred - updating... (live nodes size: [1])
>> solr.log.9:2017-02-02 01:43:42.224 INFO  (zkCallback-4-thread-29-proces
>> sing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A cluster
>> state change: [WatchedEvent state:SyncConnected type:NodeDataChanged
>> path:/collections/clicktrack/state.json] for collection [clicktrack] has
>> occurred - updating... (live nodes size: [1])
>> solr.log.9:2017-02-02 01:43:43.767 INFO  

solr query help

2017-02-02 Thread deepak . ghayal


Hello Sir,
I am writing a query to get results from specific indexed content first.
eg.
http://192.168.200.14:8983/solr/mypgmee/select?q=*blood*=id:(*/939/* OR 
**)=id=json=true
 
In the above query I am getting a response. Suppose I get 4 results for course
"939" out of 10; it works fine by relevance.
But now I want the "939" course results first, then the other results. So please
let me know how I can do that.
 
 
Thanks & Regards,
Deepak G.
 

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Hendrik Haddorp
Might be that your overseer queue is overloaded. Similar to what is 
described here:

https://support.lucidworks.com/hc/en-us/articles/203959903-Bringing-up-downed-Solr-servers-that-don-t-want-to-come-up

If the overseer queue gets too long you get hit by this:
https://github.com/Netflix/curator/wiki/Tech-Note-4

Try to request the overseer status 
(/solr/admin/collections?action=OVERSEERSTATUS). If that fails you 
likely hit that problem. If so you can also not use the ZooKeeper 
command line client anymore. You can now restart all your ZK nodes with 
an increased jute.maxbuffer value. Once ZK is restarted you can use the 
ZK command line client with the same jute.maxbuffer value and check how 
many entries /overseer/queue has in ZK. Normally there should be a few 
entries but if you see thousands then you should delete them. I used a 
few lines of Java code for that, again setting jute.maxbuffer to the 
same value. Once cleaned up restart the Solr nodes one by one and keep 
an eye on the overseer status.
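
The cleanup itself only needs a few lines against the plain ZooKeeper
client. A rough sketch (not my exact code; adjust the connect string, and
run the JVM with the same -Djute.maxbuffer value as the ZK nodes):

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class ClearOverseerQueue {
    public static void main(String[] args) throws Exception {
        // connect string of the ZK ensemble; adjust host:port values
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, event -> {});
        List<String> entries = zk.getChildren("/overseer/queue", false);
        System.out.println("Deleting " + entries.size() + " overseer queue entries");
        for (String entry : entries) {
            zk.delete("/overseer/queue/" + entry, -1); // -1 matches any version
        }
        zk.close();
    }
}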


On 02.02.2017 10:52, Ravi Solr wrote:

Following up on my previous email, the intermittent server unavailability
seems to be linked to the interaction between Solr and Zookeeper. Can
somebody help me understand what this error means and how to recover from
it.

2017-02-02 09:44:24.648 ERROR
(recoveryExecutor-3-thread-16-processing-n:xx.xxx.xxx.xxx:1234_solr
x:clicktrack_shard1_replica4 s:shard1 c:clicktrack r:core_node3)
[c:clicktrack s:shard1 r:core_node3 x:clicktrack_shard1_replica4]
o.a.s.c.RecoveryStrategy Error while trying to recover.
core=clicktrack_shard1_replica4:org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /overseer/queue/qn-
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
 at
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:391)
 at
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:388)
 at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
 at
org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388)
 at
org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:244)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1215)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1128)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1124)
 at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:334)
 at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
 at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
 at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

Thanks

Ravi Kiran Bhaskar

On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr  wrote:


Hello,
  Yesterday I upgraded from 6.0.1 to 6.4.0, and it's been a straight 12-hour
debugging spree!! Can somebody kindly help me out of this misery.

I have a set of 8 single shard collections with 3 replicas. As soon as I
updated the configs and started the servers, one of my collections got stuck
with no leader. I have restarted Solr to no avail, and I also tried to force a
leader via the collections API, but that didn't work either. I also see that,
from time to time, multiple Solr nodes go down all at the same time, and only
a restart resolves the issue.

The error snippets are shown below

2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n:
10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1
c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1
x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying
to recover. 
core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException:
No registered leader was found after waiting for 4000ms , collection:
clicktrack slice: shard1

solr.log.9:2017-02-02 01:43:41.336 INFO  (zkCallback-4-thread-29-
processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
cluster state change: [WatchedEvent state:SyncConnected
type:NodeDataChanged path:/collections/clicktrack/state.json] for
collection [clicktrack] has occurred - updating... (live nodes size: [1])
solr.log.9:2017-02-02 01:43:42.224 INFO  (zkCallback-4-thread-29-
processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
cluster state change: [WatchedEvent 

Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

2017-02-02 Thread Mikhail Khludnev
David,

Can you make sure your index doesn't have deleted docs? This can be seen
in the Solr Admin UI.
And can you merge the index to avoid having them in it?
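
(The core's Overview page in the admin UI shows the "Deleted Docs" count.
An optimize rewrites the index without the deleted documents; with the host
and core name as placeholders, something like:

curl 'http://localhost:8983/solr/yourcore/update?optimize=true'

should do it.)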

On Thu, Feb 2, 2017 at 12:29 AM, David Kramer 
wrote:

>
>
> Some background:
> · The data involved is catalog data, with three nested objects:
> Products, Items, and Skus, in that order. We have a docType field on each
> record as a differentiator.
> · The "id" field in our data is unique within datatype, but not
> across datatypes. We added a "uuid" field in our program that generates the
> Solr import file that is the id prefixed by the first letter of the
> docType, like P12345. That makes the uuid field unique, and we have that as
> the uniqueKey in our schema.xml.
> · We are trying to retrieve the parent Product, and all children
> documents. As such, we are using the ChildDocTransformerFactory
> ([child...]) to retrieve the children along with the parent. We have not
> yet solved the problem of getting items within SKUs as nested documents in
> the results, and we will have to figure that out at some point, but for now
> we get them flattened
> · We are building out the proof of concept for this. This is all
> new work, so we are free to change a lot.
> · This is Solr 6.0.0, and we are importing in JSON format, if that
> matters
> · I submitted this question to StackOverflow at
> http://stackoverflow.com/questions/41969353/solr-querying-nested-documents-with-childdoctransformerfactory-get-parent-quer
> but haven’t gotten any answers yet.
>
>
> Our data looks like this (I've removed some fields for simplicity):
>
> {
>
>   "id": 739063,
>
>   "docType": "Product",
>
>   "uuid": "P739063",
>
>   "_childDocuments_": [
>
> {
>
>   "id": 1537378,
>
>   "price": 25.45,
>
>   "color": "Blush",
>
>   "docType": "Item",
>
>   "productId": 739063,
>
>   "uuid": "I1537378",
>
>   "_childDocuments_": [
>
> {
>
>   "id": 12799578,
>
>   "size": "10",
>
>   "width": "W",
>
>   "docType": "Sku",
>
>   "itemId": 1537378,
>
>   "uuid": "S12799578"
>
> }
>
>   ]
>
> }
>
> }
>
>
>
> The query to fetch all Products and their children nested inside them is
> q=docType:Product&fl=title,id,docType,[child
> parentFilter=docType:Product]. When I run that query, all is well, and it
> returns the first 10 rows. However, if I fetch more rows by adding, say
> rows=500, we get the error Parent query yields document which is not
> matched by parents filter, docID=XXX.
>
> When we first saw that error, we discovered our id field was not unique
> across document types, so we added the uuid field as mentioned above, which
> is. We also added <uniqueKey>uuid</uniqueKey> in our schema.xml file, wiped the core, recreated it, and
> restarted Solr just to make sure it was in effect. We have double checked
> and are sure that the uuid fields are unique.
>
>
>
> In all the search results for that error that I've found, the OP did not
> have a field that could differentiate the different document types, but as
> you see we do. Since both the query and the parentFilter are searching for
> docType:Product I don't see how either could possibly return anything but
> parents. We've also tried adding childFilter=docType:Item and
> childFilter=docType:Sku but that did not help.  I also tried using title:*
> for the filter since only products have titles.
>
>
>
> Is there anything else we can try?
>
> Any explanation of this?
>
> Is it possible that it's not using uuid as the unique identifier even
> though it's specified in the schema.xml, and would that even cause this?
>
> Thanks.
>
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How to train the model using user clicks when use ltr(learning to rank) module?

2017-02-02 Thread Koji Sekiguchi

Hi,

NLP4L[1] has not only a Learning-to-Rank module but also a module which calculates
a click model and converts it into pointwise annotation data.

NLP4L has a comprehensive manual[2], but you may want to read the "Click Log
Analysis" section[3] first to see if it suits your requirements.

Hope this helps. Thanks!

Koji
--
T: @kojisays

[1] https://github.com/NLP4L/nlp4l
[2] https://github.com/NLP4L/manuals
[3] https://github.com/NLP4L/manuals/blob/master/ltr/ltr_import.md

On 2017/01/05 17:02, Jeffery Yuan wrote:

Thanks very much for integrating machine learning to Solr.
https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md

In the Assemble training data part: the third column indicates the relative
importance or relevance of that doc
Could you please give more info about how to give a score based on what user
clicks?

I have read
https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html

But still have no clue how to translate the partial pairwise feedback to the
importance or relevance of that doc.


From a user's perspective, the steps such as setup the feature and model in

Solr is simple, but collecting the feedback data and train/update the model
is much more complex.

It would be great Solr can provide some detailed instruction or sample code
about how to translate the partial pairwise feedback and use it to train and
update model.

Thanks again for your help.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-train-the-model-using-user-clicks-when-use-ltr-learning-to-rank-module-tp4312462.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Following up on my previous email, the intermittent server unavailability
seems to be linked to the interaction between Solr and Zookeeper. Can
somebody help me understand what this error means and how to recover from
it.

2017-02-02 09:44:24.648 ERROR
(recoveryExecutor-3-thread-16-processing-n:xx.xxx.xxx.xxx:1234_solr
x:clicktrack_shard1_replica4 s:shard1 c:clicktrack r:core_node3)
[c:clicktrack s:shard1 r:core_node3 x:clicktrack_shard1_replica4]
o.a.s.c.RecoveryStrategy Error while trying to recover.
core=clicktrack_shard1_replica4:org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /overseer/queue/qn-
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:391)
at
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:388)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at
org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388)
at
org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:244)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1215)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1128)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1124)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:334)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Thanks

Ravi Kiran Bhaskar

On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr  wrote:

> Hello,
>  Yesterday I upgraded from 6.0.1 to 6.4.0, its been straight 12
> hours of debugging spree!! Can somebody kindly help me  out of this misery.
>
> I have a set has 8 single shard collections with 3 replicas. As soon as I
> updated the configs and started the servers one of my collection got stuck
> with no leader. I have restarted solr to no avail, I also tried to force a
> leader via collections API that dint work either. I also see that, from
> time to time multiple solr nodes go down all at the same time, only a
> restart resolves the issue.
>
> The error snippets are shown below
>
> 2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n:
> 10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1
> c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1
> x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying
> to recover. 
> core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException:
> No registered leader was found after waiting for 4000ms , collection:
> clicktrack slice: shard1
>
> solr.log.9:2017-02-02 01:43:41.336 INFO  (zkCallback-4-thread-29-
> processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
> cluster state change: [WatchedEvent state:SyncConnected
> type:NodeDataChanged path:/collections/clicktrack/state.json] for
> collection [clicktrack] has occurred - updating... (live nodes size: [1])
> solr.log.9:2017-02-02 01:43:42.224 INFO  (zkCallback-4-thread-29-
> processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
> cluster state change: [WatchedEvent state:SyncConnected
> type:NodeDataChanged path:/collections/clicktrack/state.json] for
> collection [clicktrack] has occurred - updating... (live nodes size: [1])
> solr.log.9:2017-02-02 01:43:43.767 INFO  (zkCallback-4-thread-23-
> processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
> cluster state change: [WatchedEvent state:SyncConnected
> type:NodeDataChanged path:/collections/clicktrack/state.json] for
> collection [clicktrack] has occurred - updating... (live nodes size: [1])
>
>
> Suspecting the worst I backed up the index and renamed the collection's
> data folder and restarted the servers, this time the collection got a
> proper leader. So is my index really corrupted ? Solr UI showed live nodes
> just like the logs but without any leader. Even with the leader issue
> somewhat alleviated after renaming the data folder and letting silr create
> a new data folder my servers did go down a couple of times.
>
> I am not all that well versed with zookeeper...any trick to make zookeeper
> pick 

Re: How long for autoAddReplica?

2017-02-02 Thread Hendrik Haddorp

Hi,

are you using HDFS? According to the documentation the feature should 
only be available if you are using HDFS. For me, however, it also failed 
there. See the thread "Solr on HDFS: AutoAddReplica does not add a 
replica" from about two weeks ago.


regards,
Hendrik

On 02.02.2017 07:21, Walter Underwood wrote:

I added a new node and shut down a node with a shard replica on it. It has been 
an hour and I don’t see any activity toward making a new replica.

The new node and the one I shut down are both 6.4. The rest of the 16-node 
cluster is 6.2.1.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)