Re: Can I express this nested query in JSON DSL?

2020-12-08 Thread Mikhail Khludnev
Hi, Mikhail.
Shouldn't be a big deal:
"bool": {
  "must": [
    "x",
    { "bool": { "should": ["y", "z"] } }
  ]
}
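
For instance (with x, y, z standing in for real queries), a complete request
body could be:

{
  "query": {
    "bool": {
      "must": [
        "x",
        { "bool": { "should": ["y", "z"] } }
      ]
    }
  }
}

Since the inner bool has only 'should' clauses, at least one of y or z has to
match for that clause to match, which preserves the 'x AND (y OR z)' semantics.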

On Tue, Dec 8, 2020 at 6:13 AM Mikhail Edoshin 
wrote:

> Hi,
>
> I'm more or less new to Solr. I need to run queries that use joins all
> over the place. (The idea is to index database records pretty much as
> they are and then query them in interesting ways and, most importantly,
> get the rank. Our dataset is not too large so the performance is great.)
>
> I managed to express the logic using the following approach. For
> example, I want to search people by their names or addresses:
>
>q=type:Person^=0 AND ({!edismax qf= v=$p0} OR {!join
>  v=$p1})
>p1={!edismax qf= v=$p0}
>p0=
>
> (Here 'type:Person' works as a filter so I zero its score.) This seems
> to work as expected and give the right results and ranking. It also
> seems to scale nicely for two levels of joins, although the queries
> become rather hard to follow in their raw form (I used a custom
> XML-to-query transformer to actually formulate more complex queries).
>
> So my question is: can I express an equivalent query using the
> query DSL? I know I can use 'bool' like this:
>
> {
>    "query": {
>       "bool" : {
>          "must" : [ ... ],
>          "should" : [ ... ]
>       }
>    }
> }
>
> But how do I actually go from 'x AND (y OR z)' to 'bool' in the query
> DSL? I seem to lose the nice compositional properties of the expression.
> Here, for example, the expression implies that at least 'y' or 'z' must
> match; I don't quite see how I can express this in the DSL.
>
> Kind regards,
> Mikhail
>


-- 
Sincerely yours
Mikhail Khludnev


Re: QTime lesser for facet.limit=-1 than facet.limit=5000/10000

2020-07-09 Thread Mikhail Khludnev
Hi,
Usually, limit=-1 works as a single pass-through with counts accumulating,
but limit>0 causes collecting a per-value docset, which might take
longer. There's a note about this effect in the uniqueBlock() description.
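
For illustration, a minimal pair of requests (reusing the field name from the
experiment below) that exercises the two code paths:

  /select?q=*:*&rows=0&facet=true&facet.field=abc_s&facet.limit=-1
  /select?q=*:*&rows=0&facet=true&facet.field=abc_s&facet.limit=5000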

On Wed, Jul 8, 2020 at 11:29 AM ana  wrote:

> Hi Team,
> Which is more optimized: facet.limit=-1 OR facet.limit=1/5/4?
> For a high Cardinality string field, with no cache enabled, no docValues
> enabled, after every RELOAD on Solr admin UI for each query with different
> facet.limit, why the QTime for "facet.limit=-1" is lesser as compared to
> that of a 'facet.limit=5000/1". What factors apart from those listed
> above matters in calculating QTime?
>
> My understanding is that facet.limit=-1 should have a higher response time, as
> per the Solr ref guide, compared to any other higher facet.limit specified.
>
> Experiment :
>
> field = abc_s
> cardinality:71520
> num of docs : count:52055449,
> total num of facets:70657
> appliedMethod: FC
> Test query :
> http://localhost:8983/solr/
> /select?facet.field=abc_s=on=*:*=0=true_s.facet.limit=-1
>
> facet.limit -1  100  5000   1 4
> 5
> QTime   983857   34295324  1006
>  1027
>
> Debug response for facet.limit=1 is attached
> facet_Response_1.txt
> <https://lucene.472066.n3.nabble.com/file/t495711/facet_Response_1.txt>
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Null pointer exception in QueryComponent.MergeDds method

2020-07-07 Thread Mikhail Khludnev
Still not clear regarding the fl param. Does the request enable the timeAllowed
param? Anyway, debugQuery=true should give a clue why "sort_values" are absent
in the shard response; note they should be supplied at
QueryComponent.doFieldSortValues(ResponseBuilder, SolrIndexSearcher).
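
For what it's worth, the quoted snippet below only guards against an empty
"sort_values" list; if a shard response carries no "sort_values" entry at all,
sortFieldValues is null and sortFieldValues.size() is exactly the kind of line
that throws the NPE. A hedged sketch of a guard (not the actual fix from any
Solr release):

    if (sortFieldValues == null
        || (sortFieldValues.size() == 0 && thisResponseIsPartial)) {
      continue; // nothing to merge from this shard response
    }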

On Tue, Jul 7, 2020 at 4:19 PM Jae Joo  wrote:

> 8.3.1
>
>   required="true" multiValued="false" docValues="true"/>
>   required="true" multiValued="false"/>
>
> the field "id" is for nested document.
>
>
>
>
> On Mon, Jul 6, 2020 at 4:17 PM Mikhail Khludnev  wrote:
>
> > Hi,
> > > What's the version? What's the uniqueKey? Is it stored? What's the fl param?
> >
> > On Mon, Jul 6, 2020 at 5:12 PM Jae Joo  wrote:
> >
> > > I am seeing the nullPointerException in the list below and I am
> > > looking for how to fix the exception.
> > >
> > > Thanks,
> > >
> > >
> > > NamedList sortFieldValues =
> > > (NamedList)(srsp.getSolrResponse().getResponse().get("sort_values"));
> > > if (sortFieldValues.size()==0 && // we bypass merging this response
> > > only if it's partial itself
> > > thisResponseIsPartial) { // but not the previous
> > one!!
> > >   continue; //fsv timeout yields empty sort_vlaues
> > > }
> > >
> > >
> > >
> > > 2020-07-06 12:45:47.001 ERROR (qtp745962066-636182) [c:]]
> > > o.a.s.h.RequestHandlerBase java.lang.NullPointerException
> > > at
> > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:914)
> > > at
> > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613)
> > > at
> > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592)
> > > at
> > >
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431)
> > > at
> > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198)
> > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576)
> > > at
> > > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
> > > at
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
> > > at
> > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
> > > at
> > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> > > at
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> > > at
> > >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Null pointer exception in QueryComponent.MergeDds method

2020-07-06 Thread Mikhail Khludnev
Hi,
What's the version? What's the uniqueKey? Is it stored? What's the fl param?

On Mon, Jul 6, 2020 at 5:12 PM Jae Joo  wrote:

> I am seeing the nullPointerException in the list below and I am
> looking for how to fix the exception.
>
> Thanks,
>
>
> NamedList sortFieldValues =
> (NamedList)(srsp.getSolrResponse().getResponse().get("sort_values"));
> if (sortFieldValues.size()==0 && // we bypass merging this response
> only if it's partial itself
> thisResponseIsPartial) { // but not the previous one!!
>   continue; //fsv timeout yields empty sort_vlaues
> }
>
>
>
> 2020-07-06 12:45:47.001 ERROR (qtp745962066-636182) [c:]]
> o.a.s.h.RequestHandlerBase java.lang.NullPointerException
> at
>
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:914)
> at
>
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613)
> at
>
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592)
> at
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431)
> at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
> at
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>


-- 
Sincerely yours
Mikhail Khludnev


Re: FunctionScoreQuery how to use it

2020-07-01 Thread Mikhail Khludnev
Hi, Vincenzo.

Discussed earlier
https://www.mail-archive.com/java-user@lucene.apache.org/msg50255.html
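
For a rough idea, here is a minimal sketch (field names are made up) of the
usual replacement in Lucene 8.x: instead of subclassing CustomScoreQuery, wrap
the query and combine its score with a DoubleValuesSource.

  import org.apache.lucene.index.Term;
  import org.apache.lucene.queries.function.FunctionScoreQuery;
  import org.apache.lucene.search.DoubleValuesSource;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;

  Query base = new TermQuery(new Term("title", "solr"));
  // multiply the base score by a numeric docValues field, per document
  Query scored = FunctionScoreQuery.boostByValue(
      base, DoubleValuesSource.fromLongField("popularity"));

More involved custom-score logic usually moves into a custom DoubleValuesSource
implementation passed to the FunctionScoreQuery constructor.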

On Wed, Jul 1, 2020 at 8:36 PM Vincenzo D'Amore  wrote:

> Hi all,
>
> I'm struggling with an old class that extends CustomScoreQuery.
> I was trying to port to solr 8.5.2 and I'm looking for an example on how to
> implement it using FunctionScoreQuery.
>
> Do you know if there are examples that explain how to port the code to the
> new implementation?
>
> --
> Vincenzo D'Amore
>


-- 
Sincerely yours
Mikhail Khludnev


Re: About timeAllowed when using LTR

2020-06-30 Thread Mikhail Khludnev
Hi, Dawn.

It might make sense. Feel free to raise a jira, and "patches are welcome!".


On Tue, Jun 30, 2020 at 10:33 AM Dawn  wrote:

> Hi:
>
> When using the LTR with the timeAllowed parameter enabled, an LTR feature
> query may call ExitableFilterAtomicReader.checkAndThrow timeout detection.
>
> If a timeout occurs at this point, the exception ExitingReaderException is
> thrown, resulting in a no-result return.
>
> Is it possible to accommodate this exception in LTR so that any result
> that the LTR has cleared will be returned instead of an empty one?
>
> This exception occurs in two places:
>
> 1. LTRScoringQuery.createWeight or createWeightsParallel. This is the
> loading stage; ending directly on timeout is acceptable.
>
> 2. ModelWeight.scorer. This is a stage that evaluates each doc and can
> catch the exception, end early, and return part of the result.



-- 
Sincerely yours
Mikhail Khludnev


Re: Prefix + Suffix Wildcards in Searches

2020-06-29 Thread Mikhail Khludnev
Hello, Chris.
I suppose index time analysis can yield these terms:
"paid","ms-reply-unpaid","ms-reply-paid", and thus let you avoid these
expensive wildcard queries. Here's why it's worth avoiding them:
https://www.slideshare.net/lucidworks/search-like-sql-mikhail-khludnev-epam
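
For illustration only, a hypothetical schema sketch (the tokenizer pattern is
an assumption; it depends on how the tag values are actually composed) that
indexes each tag as its own exact term:

  <fieldType name="tag_terms" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- split on commas/whitespace so each tag becomes a searchable term -->
      <tokenizer class="solr.PatternTokenizerFactory" pattern="[,\s]+"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With whole tags indexed as single terms, a clause like tag:*ms-reply-paid*
becomes a plain term query, tag:ms-reply-paid.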

On Mon, Jun 29, 2020 at 6:17 PM Chris Dempsey  wrote:

> Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*) but
> I'm looking into options for optimizing something like this:
>
> > fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR
> tag:*ms-reply-paid*
>
> It's probably not a surprise that we're seeing performance issues with
> something like this. My understanding is that using the wildcard on both
> ends forces a full-text index search. Something like the above can't take
> advantage of something like the ReverseWordFilter either. I believe
> constructing `n-grams` is an option (*at the expense of index size*) but is
> there anything I'm overlooking as a possible avenue to look into?
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Developing update processor/Query Parser

2020-06-25 Thread Mikhail Khludnev
Hello, Vincenzo.
Please see my earlier note (quoted below) about a dedicated component that
does nothing but hold a config.
Also, you may extract the config into a file and load it by
SolrResourceLoaderAware.
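
As a rough illustration of that idea (class and component names here are made
up, not from the thread), a do-nothing SearchComponent can hold the shared
settings, and both plugins can look it up through the core:

  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  // no-op component that only keeps its init args from solrconfig.xml
  public class SharedConfigComponent extends SearchComponent {
    private NamedList<?> args;
    @Override public void init(NamedList args) { this.args = args; }
    public NamedList<?> getArgs() { return args; }
    @Override public void prepare(ResponseBuilder rb) {}
    @Override public void process(ResponseBuilder rb) {}
    @Override public String getDescription() { return "shared config holder"; }
  }

Both QParserPlugin.createParser(...) and
UpdateRequestProcessorFactory.getInstance(...) receive a SolrQueryRequest, so
each side can call req.getCore().getSearchComponent("sharedConfig") and cast
the result.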

On Thu, Jun 25, 2020 at 2:06 PM Vincenzo D'Amore  wrote:

> Hi Mikhail, yup, I was trying to avoid putting logic in Solr.
> Just to be a little bit more specific, consider that if the update factory
> writes a field that has a size of 50.
> The QParser should be aware of the current size when writing a query.
>
> Is it possible to have in solrconfig.xml file a shared configuration?
>
> I mean a snippet of configuration shared between update processor factory
> and QParser.
>
>
> On Wed, Jun 24, 2020 at 10:33 PM Mikhail Khludnev  wrote:
>
> > Hello, Vincenzo.
> > Presumably you can introduce a component which just holds a config data,
> > and then this component might be lookedup from QParser and UpdateFactory.
> > Overall, it seems like embedding logic into Solr core, which rarely works
> > well.
> >
> > On Wed, Jun 24, 2020 at 8:00 PM Vincenzo D'Amore 
> > wrote:
> >
> > > Hi all,
> > >
> > > I've started to work on a couple of components very tight together.
> > > An update processor that writes few fields in the solr index and a
> Query
> > > Parser that, well, then reads such fields from the index.
> > >
> > > Such components share few configuration parameters together, I'm asking
> > if
> > > there is a pattern, a draft, a sample, some guidelines or best
> practices
> > > that explains how to properly save configuration parameters.
> > >
> > > The configuration is written into the solrconfig.xml file, for example:
> > >
> > >
> > >  
> > >x1
> > >x2
> > >  
> > >
> > >
> > > And then query parser :
> > >
> > >  > > class="com.example.query.MyCustomQueryParserPlugin" />
> > >
> > > I'm struggling because the change of configuration on the updated
> > processor
> > > has an impact on the query parser.
> > > For example the configuration info shared between those two components
> > can
> > > be overwritten during a core reload.
> > > Basically, during an update or a core reload, there is a query parser
> > that
> > > is serving requests while some other component is updating the index.
> > > So I suppose there should be a pattern, an approach, a common solution
> > when
> > > a piece of configuration has to be loaded at boot, or when the core is
> > > loaded.
> > > Or when, after an update a new searcher is created and a new query
> parser
> > > is created.
> > >
> > > Any suggestion is really appreciated.
> > >
> > > Best regards,
> > > Vincenzo
> > >
> > >
> > >
> > > --
> > > Vincenzo D'Amore
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>
>
> --
> Vincenzo D'Amore
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Unexpected results using Block Join Parent Query Parser

2020-06-25 Thread Mikhail Khludnev
Ok. My fault. Old sport, you know. When retrieving intermediate scopes, the
parents bitmask should include all enclosing scopes as well. It's a dark
side of the BJQ:
 {!parent which='class:(section OR composition)'}
I'm not sure what you are trying to achieve by specifying grandchildren as a
parent bitmask. Note, the algorithm assumes that the parents' bitmask has the
last doc in the segment set. I.e., the 'which' query supplied at runtime must
strictly correspond to the block structure indexed before.
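
Applied to the example document from this thread, that gives, e.g.:

 q={!parent which='class:(section OR composition)'}class:observation

which returns only the section with id 2 (the actual enclosing scope of the
observation), while the unrelated section with id 4 no longer matches.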

On Thu, Jun 25, 2020 at 12:05 PM Tor-Magne Stien Hagen  wrote:

> If I modify the query like this:
>
> {!parent which='class:instruction'}class:observation
>
> It still returns a result for the instruction document, even though the
> document with class instruction does not have any children...
>
> Tor-Magne Stien Hagen
>
> -Original Message-
> From: Mikhail Khludnev 
> Sent: Wednesday, June 24, 2020 2:14 PM
> To: solr-user 
> Subject: Re: Unexpected results using Block Join Parent Query Parser
>
> Jan, thanks for the clarification.
> Sure, you can use {!parent which=class:section} to return children which
> have a grandchild matching the subordinate query.
> Note: there's something about named scopes, which I didn't get into yet,
> but it might be relevant to the problem.
>
> On Wed, Jun 24, 2020 at 1:43 PM Jan Høydahl  wrote:
>
> > I guess the key question here is whether «parent» in BlockJoin is
> > strictly top-level parent/root, i.e. class:composition for the example
> > in this thread? Or can the {!parent} parser also be used to select the
> > «child» level in a child/grandchild relationship inside a block?
> >
> > Jan
> >
> > > 24. jun. 2020 kl. 11:36 skrev Tor-Magne Stien Hagen :
> > >
> > > Thanks for your answer,
> > >
> > > What kind of rules exists for the which clause? In other words, how
> > > can
> > you identify parents without using some sort of filtering?
> > >
> > > Tor-Magne Stien Hagen
> > >
> > > -Original Message-
> > > From: Mikhail Khludnev 
> > > Sent: Wednesday, June 24, 2020 10:01 AM
> > > To: solr-user 
> > > Subject: Re: Unexpected results using Block Join Parent Query Parser
> > >
> > > Hello,
> > >
> > > Please check warning box titled Using which
> > >
> > > https://lucene.apache.org/solr/guide/8_5/other-parsers.html#block-join-parent-query-parser
> > >
> > > On Wed, Jun 24, 2020 at 10:01 AM Tor-Magne Stien Hagen 
> > wrote:
> > >
> > >> Hi,
> > >>
> > >> I have indexed the following nested document in Solr:
> > >>
> > >> {
> > >>"id": "1",
> > >>"class": "composition",
> > >>"children": [
> > >>{
> > >>"id": "2",
> > >>"class": "section",
> > >>"children": [
> > >>{
> > >>"id": "3",
> > >>"class": "observation"
> > >>}
> > >>]
> > >>},
> > >>{
> > >>"id": "4",
> > >>"class": "section",
> > >>"children": [
> > >>{
> > >>"id": "5",
> > >>"class": "instruction"
> > >>    }
> > >>        ]
> > >>}
> > >>]
> > >> }
> > >>
> > >> Given the following query:
> > >>
> > >> {!parent which='id:4'}id:3
> > >>
> > >> I expect the result to be empty as document 3 is not a child
> > >> document of document 4.
> > >>
> > >> To reproduce, use the docker container provided here:
> > >> https://github.com/tormsh/Solr-Example
> > >>
> > >> Have I misunderstood something regarding the Block Join Parent
> > >> Query Parser?
> > >>
> > >> Tor-Magne Stien Hagen
> > >>
> > >>
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> >
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Developing update processor/Query Parser

2020-06-24 Thread Mikhail Khludnev
Hello, Vincenzo.
Presumably you can introduce a component which just holds a config data,
and then this component might be lookedup from QParser and UpdateFactory.
Overall, it seems like embedding logic into Solr core, which rarely works
well.

On Wed, Jun 24, 2020 at 8:00 PM Vincenzo D'Amore  wrote:

> Hi all,
>
> I've started to work on a couple of components very tight together.
> An update processor that writes few fields in the solr index and a Query
> Parser that, well, then reads such fields from the index.
>
> Such components share few configuration parameters together, I'm asking if
> there is a pattern, a draft, a sample, some guidelines or best practices
> that explains how to properly save configuration parameters.
>
> The configuration is written into the solrconfig.xml file, for example:
>
>
>  
>x1
>x2
>  
>
>
> And then query parser :
>
>  class="com.example.query.MyCustomQueryParserPlugin" />
>
> I'm struggling because the change of configuration on the updated processor
> has an impact on the query parser.
> For example the configuration info shared between those two components can
> be overwritten during a core reload.
> Basically, during an update or a core reload, there is a query parser that
> is serving requests while some other component is updating the index.
> So I suppose there should be a pattern, an approach, a common solution when
> a piece of configuration has to be loaded at boot, or when the core is
> loaded.
> Or when, after an update a new searcher is created and a new query parser
> is created.
>
> Any suggestion is really appreciated.
>
> Best regards,
> Vincenzo
>
>
>
> --
> Vincenzo D'Amore
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Unexpected results using Block Join Parent Query Parser

2020-06-24 Thread Mikhail Khludnev
Jan, thanks for the clarification.
Sure, you can use {!parent which=class:section} to return children which
have a grandchild matching the subordinate query.
Note: there's something about named scopes, which I didn't get into yet,
but it might be relevant to the problem.

On Wed, Jun 24, 2020 at 1:43 PM Jan Høydahl  wrote:

> I guess the key question here is whether «parent» in BlockJoin is strictly
> top-level parent/root, i.e. class:composition for the example in this
> thread? Or can the {!parent} parser also be used to select the «child» level in
> a child/grandchild relationship inside a block?
>
> Jan
>
> > 24. jun. 2020 kl. 11:36 skrev Tor-Magne Stien Hagen :
> >
> > Thanks for your answer,
> >
> > What kind of rules exists for the which clause? In other words, how can
> you identify parents without using some sort of filtering?
> >
> > Tor-Magne Stien Hagen
> >
> > -Original Message-
> > From: Mikhail Khludnev 
> > Sent: Wednesday, June 24, 2020 10:01 AM
> > To: solr-user 
> > Subject: Re: Unexpected results using Block Join Parent Query Parser
> >
> > Hello,
> >
> > Please check warning box titled Using which
> >
> > https://lucene.apache.org/solr/guide/8_5/other-parsers.html#block-join-parent-query-parser
> >
> > On Wed, Jun 24, 2020 at 10:01 AM Tor-Magne Stien Hagen 
> wrote:
> >
> >> Hi,
> >>
> >> I have indexed the following nested document in Solr:
> >>
> >> {
> >>"id": "1",
> >>"class": "composition",
> >>"children": [
> >>{
> >>"id": "2",
> >>"class": "section",
> >>"children": [
> >>{
> >>"id": "3",
> >>"class": "observation"
> >>}
> >>]
> >>},
> >>{
> >>"id": "4",
> >>"class": "section",
> >>"children": [
> >>{
> >>"id": "5",
> >>"class": "instruction"
> >>}
> >>]
> >>}
> >>]
> >> }
> >>
> >> Given the following query:
> >>
> >> {!parent which='id:4'}id:3
> >>
> >> I expect the result to be empty as document 3 is not a child document
> >> of document 4.
> >>
> >> To reproduce, use the docker container provided here:
> >> https://github.com/tormsh/Solr-Example
> >>
> >> Have I misunderstood something regarding the Block Join Parent Query
> >> Parser?
> >>
> >> Tor-Magne Stien Hagen
> >>
> >>
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Unexpected results using Block Join Parent Query Parser

2020-06-24 Thread Mikhail Khludnev
Hello,

Please check warning box titled Using which
https://lucene.apache.org/solr/guide/8_5/other-parsers.html#block-join-parent-query-parser

On Wed, Jun 24, 2020 at 10:01 AM Tor-Magne Stien Hagen  wrote:

> Hi,
>
> I have indexed the following nested document in Solr:
>
> {
> "id": "1",
> "class": "composition",
> "children": [
> {
> "id": "2",
> "class": "section",
> "children": [
> {
> "id": "3",
> "class": "observation"
> }
> ]
> },
> {
> "id": "4",
> "class": "section",
> "children": [
> {
> "id": "5",
> "class": "instruction"
> }
> ]
> }
> ]
> }
>
> Given the following query:
>
> {!parent which='id:4'}id:3
>
> I expect the result to be empty as document 3 is not a child document of
> document 4.
>
> To reproduce, use the docker container provided here:
> https://github.com/tormsh/Solr-Example
>
> Have I misunderstood something regarding the Block Join Parent Query
> Parser?
>
> Tor-Magne Stien Hagen
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: eDismax query syntax question

2020-06-15 Thread Mikhail Khludnev
Hello.
Not sure if it's useful or relevant, but I encountered another problem with
parentheses (braces) in eDisMax recently:
https://issues.apache.org/jira/browse/SOLR-14557.

On Mon, Jun 15, 2020 at 5:01 PM Webster Homer <
webster.ho...@milliporesigma.com> wrote:

> Markus,
> Thanks, for the reference, but that doesn't answer my question. If - is a
> special character, it's not consistently special. In my example
> "3-DIMETHYL" behaves quite differently than ")-PYRIMIDINE".  If I escape
> the closing parenthesis the following minus no longer behaves specially.
> The referred article does not even mention parenthesis, but it changes the
> behavior of the following "-" if it is escaped. In "3-DIMETHYL" the minus
> is not special.
>
> These all fix the problem:
> 1,3-DIMETHYL-5-(3-PHENYL-ALLYLIDENE\)-PYRIMIDINE-2,4,6-TRIONE
> 1,3-DIMETHYL-5-(3-PHENYL-ALLYLIDENE)\-PYRIMIDINE-2,4,6-TRIONE
> 1,3-DIMETHYL-5-\(3-PHENYL-ALLYLIDENE\)-PYRIMIDINE-2,4,6-TRIONE
>
> Only the minus following the parenthesis is treated as a NOT.
> Are parentheses special? They're not mentioned in the eDismax
> documentation.
>
> -Original Message-
> From: Markus Jelsma 
> Sent: Saturday, June 13, 2020 4:57 AM
> To: solr-user@lucene.apache.org
> Subject: RE: eDismax query syntax question
>
> Hello,
>
> These are special characters, if you don't need them, you must escape them.
>
> See top of the article:
>
> https://lucene.apache.org/solr/guide/8_5/the-extended-dismax-query-parser.html
>
> Markus
>
>
>
>
> -Original message-
> > From:Webster Homer 
> > Sent: Friday 12th June 2020 22:09
> > To: solr-user@lucene.apache.org
> > Subject: eDismax query syntax question
> >
> > Recently we found strange behavior in a query. We use eDismax as the
> query parser.
> >
> > This is the query term:
> > 1,3-DIMETHYL-5-(3-PHENYL-ALLYLIDENE)-PYRIMIDINE-2,4,6-TRIONE
> >
> > It should hit one document in our index. It does not. However, if you
> use the Dismax query parser it does match the record.
> >
> > The problem seems to involve the parenthesis and the dashes. If you
> > escape the dash after the parenthesis it matches
> > 1,3-DIMETHYL-5-(3-PHENYL-ALLYLIDENE)\-PYRIMIDINE-2,4,6-TRIONE
> >
> > I thought that eDismax and Dismax escaped all lucene special characters
> before passing the query to lucene. Although I also remember reading that +
> and - can have special significance in a query if preceded with white
> space. I can find very little documentation on either query parser in how
> they work.
> >
> > Is this expected behavior or is this a bug? If expected, where can I
> find documentation?
> >
> >
> >
> > This message and any attachment are confidential and may be privileged
> or otherwise protected from disclosure. If you are not the intended
> recipient, you must not copy this message or attachment or disclose the
> contents to any other person. If you have received this transmission in
> error, please notify the sender immediately and delete the message and any
> attachment from your system. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not accept liability for any omissions or errors in this
> message which may arise as a result of E-Mail-transmission or for damages
> resulting from any unauthorized changes of the content of this message and
> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not guarantee that this message is free of viruses and does
> not accept liability for any damages caused by any virus transmitted
> therewith.
> >
> >
> >
> > Click http://www.merckgroup.com/disclaimer to access the German,
> French, Spanish and Portuguese versions of this disclaimer.
> >
>
>
> This message and any attachment are confidential and may be privileged or
> otherwise protected from disclosure. If you are not the intended recipient,
> you must not copy this message or attachment or disclose the contents to
> any other person. If you have received this transmission in error, please
> notify the sender immediately and delete the message and any attachment
> from your system. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not accept liability for any omissions or errors in this
> message which may arise as a result of E-Mail-transmission or for damages
> resulting from any unauthorized changes of the content of this message and
> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not guarantee that this message is free of viruses and does
> not accept liability for any damages caused by any virus transmitted
> therewith.
>
>
>
> Click http://www.merckgroup.com/disclaimer to access the German, French,
> Spanish and Portuguese versions of this disclaimer.
>


-- 
Sincerely yours
Mikhail Khludnev


Re: using solr to extarct keywords from a long text?

2020-06-10 Thread Mikhail Khludnev
Hello, David.

From the code, I notice that MoreLikeThisHandler consumes the request body
when there's no ?q= and analyzes it to do what you are asking for. I see
that the ref guide obscures this feature.
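
For example, assuming the MoreLikeThisHandler is registered at /mlt and the
relevant field is named 'content' (both assumptions), something along these
lines posts the raw text instead of referencing an indexed document:

  curl -H 'Content-Type: text/plain' --data-binary @long-text.txt \
    'http://localhost:8983/solr/mycore/mlt?mlt.fl=content&mlt.interestingTerms=details&rows=5'

mlt.interestingTerms=details additionally returns the extracted terms with
their boosts, which is close to the keyword extraction you describe.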

On Wed, Jun 10, 2020 at 4:37 PM David Zimmermann 
wrote:

> Dear solr community
>
> I’m supposed to extract keywords from long texts. I do have a solr index
> with a lot of documents from the same domain as my texts. So, I was
> wondering if I can use solr to extract those keywords. Ideally I would want
> to use the TF-IDF basd “importantTerms” from the “more like this” function,
> but without indexing the text first. Is there a way to run a more like this
> query not based on a document id, but on a a text supplied by the query? Or
> is there another way to achieve my goal?
>
> I have also been looking into using the /stream handler, but the solr core
> is set up as standalone and not in cloud mode.
>
> Best
> David



-- 
Sincerely yours
Mikhail Khludnev


Re: index join without query criteria

2020-06-08 Thread Mikhail Khludnev
or probably -director_id:[* TO *]
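
If the join itself has to stay, a match-all subordinate query is another way
to avoid inventing a dummy criterion:

  fq={!join from=id fromIndex=movie_directors to=director_id}*:*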

On Mon, Jun 8, 2020 at 10:56 PM Hari Iyer  wrote:

> Hi,
>
> It appears that a query criteria is mandatory for a join. Taking this
> example from the documentation: fq={!join from=id fromIndex=movie_directors
> to=director_id}has_oscar:true. What if I want to find all movies that have
> a director (regardless of whether they have won an Oscar or not)? This
> query: fq={!join from=id fromIndex=movie_directors to=director_id} fails.
> Do I just have to make up a dummy criteria like fq={!join from=id
> fromIndex=movie_directors to=director_id}id:[* TO *]?
>
> Thanks,
> Hari.
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: which terms are used at the matched document?

2020-06-03 Thread Mikhail Khludnev
This is the matching term: "weight(*content:banka* in 7179)".

On Wed, Jun 3, 2020 at 5:15 PM Serkan KAZANCI  wrote:

> Dobry den Mikhail,
>
> So I searched for "banka" which means "bank" at my language. Below is
> highlighted fragments of a matched document. You can see from mark tags
> that "Bankalar", "banka", "bankaya", "bankalar" terms exist in document,
>
>
> "highlighting":{
> "/var/www/vhosts/deneme.biz/httpdocs/kho3/ibb/files/7d-2000-4267.htm
> ":{
>   "content":["Anlamda Bankalar Arası Mevduat
> Sayılamayacağı ) \n\n • TÜRKİYEDEKİ BİR BANKANIN
> YURTDIŞINDAKİ BANKAYA PARA YATIRMASI ( Banka ve
> Sigorta ",
> "anlamında kurulmuş bir banka olarak
> değerlendirilmesine ve davacı Banka tarafından yurt dışındaki
> bankaya yatırılan mevduatın da bankalar arası
> mevduat ",
> "anlamında kurulmuş bir banka olarak
> değerlendirilmesine ve davacı Banka tarafından yurt dışındaki
> bankaya yatırılan mevduatın da bankalar arası
> mevduat "]},
>
>
>
> Below is debug-explain part of the response about the same document, how
> or where should I read the variations matched term "banka" ? ("Bankalar",
> "bankaya" )
>
>
> "explain":{
>   "/var/www/vhosts/deneme.biz/httpdocs/kho3/ibb/files/7d-2000-4267.htm
> ":{
> "match":true,
> "value":2.6295655,
> "description":"max of:",
> "details":[{
> "match":true,
> "value":2.6295655,
> "description":"weight(content:banka in 7179)
> [SchemaSimilarity], result of:",
> "details":[{
> "match":true,
> "value":2.6295655,
> "description":"score(freq=58.0), computed as boost * idf *
> tf from:",
> "details":[{
> "match":true,
> "value":2.6807382,
> "description":"idf, computed as log(1 + (N - n + 0.5)
> / (n + 0.5)) from:",
> "details":[{
> "match":true,
> "value":3361,
> "description":"n, number of documents containing
> term"},
>   {
> "match":true,
> "value":49063,
> "description":"N, total number of documents with
> field"}]},
>   {
> "match":true,
> "value":0.980911,
> "description":"tf, computed as freq / (freq + k1 * (1
> - b + b * dl / avgdl)) from:",
> "details":[{
> "match":true,
> "value":58.0,
> "description":"freq, occurrences of term within
> document"},
>   {
> "match":true,
> "value":1.2,
> "description":"k1, term saturation parameter"},
>   {
> "match":true,
> "value":0.75,
> "description":"b, length normalization parameter"},
>   {
> "match":true,
> "value":664.0,
> "description":"dl, length of field (approximate)"},
>   {
> "match":true,
> "value":721.1222,
> "description":"avgdl, average length of
> field"}]}]}]}]},
>
>
> -Original Message-
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: Wednesday, June 3, 2020 4:39 PM
> To: solr-user
> Subject: Re: which terms are used at the matched document?
>
> Hi,
> debugQuery response contains matched terms as well. It's just a little bit
> hard to read.
>
> On Wed, Jun 3, 2020 at 3:55 PM Serkan KAZANCI 
> wrote:
>
> > Hi,
> >
> >
> >
> > Is it possible to retrieve the terms that are used to match the document?
> > (Keyword term itself, stemmed versions of term, term matched from
> > synonyms.txt)
> >
> >
> >
> > Example:  search keyword "heaven"
> >
> >
> >
> > Found in document1 via "heavens" and "heaven", found in document2 via
> > "heavenly" , found in document3 via "paradise" (because of synonyms.txt)
> >
> >
> >
> > I looked into debug mode but I believe it returns information about the
> > ranking calculation.
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Serkan
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: which terms are used at the matched document?

2020-06-03 Thread Mikhail Khludnev
Hi,
debugQuery response contains matched terms as well. It's just a little bit
hard to read.

On Wed, Jun 3, 2020 at 3:55 PM Serkan KAZANCI  wrote:

> Hi,
>
>
>
> Is it possible to retrieve the terms that are used to match the document?
> (Keyword term itself, stemmed versions of term, term matched from
> synonyms.txt)
>
>
>
> Example:  search keyword "heaven"
>
>
>
> Found in document1 via "heavens" and "heaven", found in document2 via
> "heavenly" , found in document3 via "paradise" (because of synonyms.txt)
>
>
>
> I looked into debug mode but I believe it returns information about the
> ranking calculation.
>
>
>
> Thanks,
>
>
>
> Serkan
>
>
>
>
>
>
>
>
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Lucene query to Solr query

2020-05-31 Thread Mikhail Khludnev
There's nothing like this now. Presumably one might visit the queries and
generate Query DSL JSON, but it might be a challenging problem.

On Sun, May 31, 2020 at 3:42 AM gnandre  wrote:

> I think this question here in this thread is similar to my question.
>
> https://lucene.472066.n3.nabble.com/Lucene-Query-to-Solr-query-td493751.html
>
>
> As suggested in that thread, I do not want to use toString method for
> Lucene query to pass it to the q param in SolrQuery.
>
> I am looking for a function that accepts org.apache.lucene.search.Query and
> returns org.apache.solr.client.solrj.SolrQuery. Is that possible?
>
> On Sat, May 30, 2020 at 8:08 AM Erick Erickson 
> wrote:
>
> > edismas is quite different from straight Lucene.
> >
> > Try attaching =query to the input and
> > you’ll see the difference.
> >
> > Best,
> > Erick
> >
> > > On May 30, 2020, at 12:32 AM, gnandre  wrote:
> > >
> > > Hi,
> > >
> > > I have following query which works fine as a lucene query:
> > > +(topics:132)^0.02607211 (topics:146)^0.008187325
> > > -asset_id:doc:en:index.html
> > >
> > > But, it does not work if I use it as a solr query with lucene as
> defType.
> > >
> > > For it to work, I need to convert it like following:
> > > q=+((topics:132)^0.02607211 (topics:146)^0.008187325
> > > +(-(asset_id:doc\:en\:index.html))=edismax=OR
> > >
> > > Why does it not work as is? AFAIK syntax given in the first query is
> > > supported by edismax.
> >
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Query takes more time in Solr 8.5.1 compare to 6.1.0 version

2020-05-23 Thread Mikhail Khludnev
> > > Please refer to the below details for query, logs and thread dump (generated
> > > from Solr Admin while executing the query).
> > >
> > > Query :
> https://drive.google.com/file/d/1bavCqwHfJxoKHFzdOEt-mSG8N0fCHE-w/view
> > >
> > > Logs and Thread dump stack trace
> > > Solr 8.5.1 :
> https://drive.google.com/file/d/149IgaMdLomTjkngKHrwd80OSEa1eJbBF/view
> > > Solr 6.1.0 :
> https://drive.google.com/file/d/13v1u__fM8nHfyvA0Mnj30IhdffW6xhwQ/view
> > >
> > > To analyse further, we found that if we remove the grouping field or
> > > reduce the no. of ids in the query, it executes fast. Did anything change in
> > > the 8.5.1 version compared to 6.1.0? In 6.1.0, even for a large no. of ids
> > > along with grouping, it works faster.
> > >
> > > Can someone please help to isolate this issue.
> > >
> > > Regards,
> > > Jay Harkhani.
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Use Subquery Parameters to filter main query

2020-05-22 Thread Mikhail Khludnev
Hello, Rodrigo.
I don't fully understand your question, but the only thing you can do is
group.q=members:6; using something from the subquery in the main query
is not possible.
Please clarify your question.

On Fri, May 22, 2020 at 12:21 AM rantonana 
wrote:

> Hello, I need to do the following:
> I have a main query who define a subquery called group with  "fields":
> "*,group:[subquery]",
> the group document has a lot of fields, but I want to filter the main query
> based on one of them.
> ex:
> {
> PID:1,
> type:doc,
>  "group":{"numFound":1,"start":0,"docs":[
> {
> members:[1,2,3]
> }]
> },
> {
> PID:2,
> type:doc,
>  "group":{"numFound":1,"start":0,"docs":[
> {
> members:[4,5,6]
> }]
> }
>
> in the example, I want to filter type documents where members field has the
> 6 value.
>
> thanks
>
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Fetch related documents from Custom Function

2020-05-18 Thread Mikhail Khludnev
Hello,
It sounds like either classic denormalization or the (a little bit slow and
cumbersome) [subquery] result transformer.
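
A rough sketch of the [subquery] route (the field names here are assumptions
based on the description, not an actual schema):

  q=type:project&fl=*,manager:[subquery]
  &manager.q={!terms f=employee_id v=$row.project_manager_id}
  &manager.fl=employee_name&manager.rows=1

Each returned project document then carries a nested 'manager' result holding
the employee's name.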

On Mon, May 18, 2020 at 4:04 PM mganeshs  wrote:

> Is there a easy possibility of reading the few field from related documents
> from Custom function ?
>
> For ex, Project document contains, project id, project name, Project
> manager
> id  ( which is nothing but employee id ). & Employee document contains
> field
> ( Employee id, Employee name ). Now while querying the Project documents,
> in
> a custom function want to pass project manager id, and would like to read
> employee document of that Project manager and return employee name of that
> project manager.
>
> WE can do Join, but for various reason, for me Join won't work. So would
> like to read the employee document from the custom function. As Custom
> function is getting executed inside SOLR, what's the easy to read the other
> documents in SOLR, instead of establishing new connection via solrj and
> read
> it.
>
> Thanks in advance.
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solrcloud 6.6 becomes nuts

2020-05-17 Thread Mikhail Khludnev
Hello, Dominique.
What did it log? Which exception?
Do you have a chance to review a heap dump? What consumed the whole heap?

On Sun, May 17, 2020 at 11:05 AM Dominique Bejean 
wrote:

> Hi,
>
> I have a six node Solrcoud that suddenly has its six nodes failed with OOM
> at the same time.
> This can happen even when the Solrcloud is not under heavy load and there
> is no indexing.
>
> I do not see any raison for this to happen. Here are the description of the
> issue. Thank you for your suggestions and advices.
>
>
> One or two hours before the nodes stop with OOM, we see this scenario on
> all six nodes during the same five minutes time frame :
> * a little bit more young gc : from one each second (duration<0.05secs) to
> one each two or three seconds (duration <0.15 sec)
> * full gc start occurs each 5sec with 0 bytes reclaimed
> * young gc start reclaim less bytes
> * long full gc start reclaim bytes but with less and less reclaimed bytes
> * then no more young GC
> Here are GC graphs : https://www.eolya.fr/solr_issue_gc.png
>
>
> Just before the problem occurs :
> * there is no more requests per seconds
> * no update/commit/merge
> * CPU usage and load are low
> * disk I/O are low
> After the problem starts, requests become longer and longer but still no
> increase of CPU usage or disk I/O
>
>
> During last issue, we dumped the threads on one node just before OOM but
> unfortunately, more than one hour after the problem starts.
> 85% of threads (more than 3000) are BLOCKED and related to log4j
> Solr either try to log slow query or try to log problems in requesthandler
> at org.apache.solr.common.SolrException.log(SolrException.java:148)
> at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:204)
>
> This high count of BLOCKED threads is more a consequence than a cause. We
> will dump threads each minute until the next issue.
>
>
> About Solr environment :
> * Solr 6.6
> * Java Oracle 1.8.0_112 25.112-b15
>
> * 1 collection with 10 millions small documents
> * 3 shards x 2 replicas
> * 3.5 millions docs per core
> * 90 Gb index size per core
>
> * Server with 6 processors and 90 Gb of RAM
> * Swappiness set to 1, nearly no swap used
> * 4Gb Heap used nearly between 25 to 60% before young GC and one full GC (3
> seconds) each 15 to 30 minutes when all is fine.
>
> * Default JVM settings with CMS GC
> * JMX enabled
> * Average Request per seconds in pic on one core : 170, but during the last
> issue the Average Request per seconds was 30 !!!
> * Average Time per seconds : < 30 ms
>
> About updates :
> * Very few add/updates in general
> * Some deleteByQuery (nearly 2000 per day) but not before the problem
> occurs
> * autocommit maxTime:15000ms
>
> About queries :
> * Queries are standard queries or suggesters
> * Queries generate facets but there is no fields with very high number of
> unique values
> * No grouping
> * High usage of function query for relevance computing
>
>
> Thank you.
>
> Dominique
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Filtering large amount of values

2020-05-17 Thread Mikhail Khludnev
On Sun, May 17, 2020 at 4:57 PM Rudenko, Artur 
wrote:

> Hi Mikhail,
>
> Thank you for the help, with you suggestion we actually managed to improve
> the results.
>
> We now get and store the docValues in this method instead of inside
> collect() method:
>
> @Override
> protected void doSetNextReader(LeafReaderContext context) throws
> IOException {
> super.doSetNextReader(context);
> sortedDocValues = DocValues.getSorted(context.reader(),
> FileFilterPostQuery.this.metaField);
> }
>
> We see a big improvement. Is this the most efficient way?
>
Who knows...

> Since it's a post filter, we have to return "false" in the getCache method. Is
> there a way to implement it with cache?
>
If getCache()==true, this query will be used as a standalone query, ignoring
the filterCollector. In this case the retrieved docs will be cached.


> Thanks,
> Artur Rudenko
>
> -Original Message-
> From: Mikhail Khludnev 
> Sent: Thursday, May 14, 2020 2:57 PM
> To: solr-user 
> Subject: Re: Filtering large amount of values
>
> Hi, Artur.
>
> Please, don't tell me that you obtain docValues per every doc? It's deadly
> slow see https://issues.apache.org/jira/browse/LUCENE-9328 for related
> problem.
> Make sure you obtain them once per segment, when leaf reader is injected.
> Recently there are some new method(s) for {!terms} I'm wondering if any of
> them might solve the problem.
>
> On Thu, May 14, 2020 at 2:36 PM Rudenko, Artur 
> wrote:
>
> > Hi,
> > We have a requirement of implementing a boolean filter with up to 500k
> > values.
> >
> > We took the approach of post filter.
> >
> > Our environment has 7 servers of 128gb ram and 64cpus each server. We
> > have 20-40m very large documents. Each solr instance has 64 shards
> > with 2 replicas and JVM memory xms and xmx set to 31GB.
> >
> > We are seeing that using single post filter with 1000 on 20m documents
> > takes about 4.5 seconds.
> >
> > Logic in our collect method:
> > numericDocValues =
> > reader.getNumericDocValues(FileFilterPostQuery.this.metaField);
> >
> > if (numericDocValues != null &&
> > numericDocValues.advanceExact(docNumber)) {
> > longVal = numericDocValues.longValue();
> > } else {
> > return;
> > }
> > }
> >
> > if (numericValuesSet.contains(longVal)) {
> > super.collect(docNumber);
> > }
> >
> >
> > Is it the best we can get?
> >
> >
> > Thanks,
> > Artur Rudenko
> >
> >
> > This electronic message may contain proprietary and confidential
> > information of Verint Systems Inc., its affiliates and/or
> > subsidiaries. The information is intended to be for the use of the
> > individual(s) or
> > entity(ies) named above. If you are not the intended recipient (or
> > authorized to receive this e-mail for the intended recipient), you may
> > not use, copy, disclose or distribute to anyone this message or any
> > information contained in this message. If you have received this
> > electronic message in error, please notify us by replying to this e-mail.
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>
> This electronic message may contain proprietary and confidential
> information of Verint Systems Inc., its affiliates and/or subsidiaries. The
> information is intended to be for the use of the individual(s) or
> entity(ies) named above. If you are not the intended recipient (or
> authorized to receive this e-mail for the intended recipient), you may not
> use, copy, disclose or distribute to anyone this message or any information
> contained in this message. If you have received this electronic message in
> error, please notify us by replying to this e-mail.
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-16 Thread Mikhail Khludnev
It seems this thread is doing heavy work, mind the bottom line.

202.8013ms
124.8008ms
qtp153245266-156 (156)
org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer.<init>(BM25Similarity.java:219)
org.apache.lucene.search.similarities.BM25Similarity.scorer(BM25Similarity.java:192)
org.apache.lucene.search.similarities.PerFieldSimilarityWrapper.scorer(PerFieldSimilarityWrapper.java:47)
org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:74)
org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:205)
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:63)
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:231)
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:531)
org.apache.solr.search.grouping.distributed.command.TopGroupsFieldCommand.postCollect(TopGroupsFieldCommand.java:178)
org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:168)
org.apache.solr.handler.component.QueryComponent.doProcessGroupedDistributedSearchSecondPhase(QueryComponent.java:1403)
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:387)
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)


It seems like it ranks groups by query score, which is a doubtful thing to do.

From the log, here's how to recognize a query running 25 sec: "QTime=25063"


The query itself, q=+msg_id:(10519539+10519540+10523575+10523576+ ..., is
not what search engines are made for. They are designed for short
queries.

You may

1. leverage the {!terms} query parser, which might handle such a long terms
list more efficiently (see the sketch after this list)

2. make sure you don't enable unnecessary grouping features, e.g. the group
ranking in the stack above makes no sense for this kind of query
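
For point 1, a hedged sketch (ids shortened from the original query):

  fq={!terms f=msg_id}10519539,10519540,10523575,10523576

The {!terms} parser takes a comma-separated list and builds the filter without
parsing each id as a separate query clause.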


It's worth to revamp an overall approach in favor of query time
{!join} or index time join see {!parent}/nested docs.



On Sat, May 16, 2020 at 1:46 PM vishal patel 
wrote:

> Thanks for reply.
>
> I have taken a thread dump at the time of query execution. I do not know
> the thread name so send the All threads. I have also send the logs so you
> can get idea.
>
> Thread Dump All Stack Trace:
> https://drive.google.com/file/d/1N4rVXJoaAwNvPIY2aw57gKA9mb4vRTMR/view
> Solr 8.3 shard 1 log:
> https://drive.google.com/file/d/1h5d_eZfQvYET7JKzbNKZwhZ_RmaX7hWf/view
> Solr 8.3 shard 2 log:
> https://drive.google.com/file/d/19CRflzQ7n5BZBNaaC7EFszgzKKlPfIVl/view
>
> I have some questions regarding the thread dump
> - How can I know the my thread name from thread dump? can I get from the
> log?
> - When do I take a thread dump? on query execution or after query
> execution?
>
> Note: I got a thread name from log and checked in thread dump on query
> execution time and after query executed. Both time thread stack trace got
> different.
>
> If any other things are required then let me know I will send.
>
> Regards,
> Vishal Patel
> 
> From: Mikhail Khludnev 
> Sent: Saturday, May 16, 2020 2:23 PM
> To: solr-user 
> Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
>
> Can you check Thread Dump in Solr Admin while Solr 8.3 crunches query for
> 34 seconds? Please share the deepest thread stack. This might give a clue
> what's going on there.
>
> On Sat, May 16, 2020 at 11:46 AM vishal patel <
> vishalpatel200...@outlook.com>
> wrote:
>
> > Any one is looking my issue? Please help me.
> >
> > Sent from Outlook
> > 
> > From: vishal patel 
> > Sent: Friday, May 15, 2020 3:06 PM
> > To: solr-user@lucene.apache.org 
> > Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
> >
> > I have result of query debug for both version so It will helpful.
> >
> > Solr 6.1 query debug URL
> > https://drive.google.com/file/d/1ixqpgAXsVLDZA-aUobJLrMOOefZX2NL1/view
> > Solr 8.3.1 query debug URL
> > https://drive.google.com/file/d/1MOKVE-iPZFuzRnDZhY9V6OsAKFT38U5r/view
> >
> > I indexed same data in both version.
> >
> > I found score=1.0 in result of Solr 8.3.0 and score=0.016147947 in result
> > of Solr 8.6.1. Is there any impact of score in query execution? why is
> > score=1.0 in result of Solr 8.3.0?
> >
> > Regards,
> > Vishal Patel
> > 
> > From: vishal patel 
> > Sent: Thursday, May 14, 2020 7:39 PM
> To: solr-user@lucene.apache.org

Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-16 Thread Mikhail Khludnev
Can you check Thread Dump in Solr Admin while Solr 8.3 crunches query for
34 seconds? Please share the deepest thread stack. This might give a clue
what's going on there.

On Sat, May 16, 2020 at 11:46 AM vishal patel 
wrote:

> Any one is looking my issue? Please help me.
>
> Sent from Outlook
> 
> From: vishal patel 
> Sent: Friday, May 15, 2020 3:06 PM
> To: solr-user@lucene.apache.org 
> Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
>
> I have result of query debug for both version so It will helpful.
>
> Solr 6.1 query debug URL
> https://drive.google.com/file/d/1ixqpgAXsVLDZA-aUobJLrMOOefZX2NL1/view
> Solr 8.3.1 query debug URL
> https://drive.google.com/file/d/1MOKVE-iPZFuzRnDZhY9V6OsAKFT38U5r/view
>
> I indexed same data in both version.
>
> I found score=1.0 in result of Solr 8.3.0 and score=0.016147947 in result
> of Solr 8.6.1. Is there any impact of score in query execution? why is
> score=1.0 in result of Solr 8.3.0?
>
> Regards,
> Vishal Patel
> 
> From: vishal patel 
> Sent: Thursday, May 14, 2020 7:39 PM
> To: solr-user@lucene.apache.org 
> Subject: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
>
> I am upgrading Solr 6.1.0 to Solr 8.3.0 or Solr 8.5.1.
>
> I get performance issue for query execution in Solr 8.3.0 or Solr 8.5.1
> when values of one field is large in query and group field is apply.
>
> My Solr URL :
> https://drive.google.com/file/d/1UqFE8I6M451Z1wWAu5_C1dzqYEOGjuH2/view
> My Solr config and schema :
> https://drive.google.com/drive/folders/1pJBxL0OOwAJSEC5uK_87ikaHEVGdDEEn
>
> It takes 34 seconds in Solr 8.3.0 or Solr 8.5.1. Same URL takes 1.5
> seconds in Solr 6.1.0.
>
> Is there any changes or issue related to grouping in Solr 8.3.0 or 8.5.1?
>
>
> Regards,
> Vishal Patel
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Filtering large amount of values

2020-05-14 Thread Mikhail Khludnev
Hi, Artur.

Please don't tell me that you obtain docValues for every doc? It's deadly
slow; see https://issues.apache.org/jira/browse/LUCENE-9328 for a related
problem.
Make sure you obtain them once per segment, when the leaf reader is injected.
Recently some new method(s) were added for {!terms}; I'm wondering if any of
them might solve the problem.

On Thu, May 14, 2020 at 2:36 PM Rudenko, Artur 
wrote:

> Hi,
> We have a requirement of implementing a boolean filter with up to 500k
> values.
>
> We took the approach of post filter.
>
> Our environment has 7 servers of 128gb ram and 64cpus each server. We have
> 20-40m very large documents. Each solr instance has 64 shards with 2
> replicas and JVM memory xms and xmx set to 31GB.
>
> We are seeing that using single post filter with 1000 on 20m documents
> takes about 4.5 seconds.
>
> Logic in our collect method:
> numericDocValues =
> reader.getNumericDocValues(FileFilterPostQuery.this.metaField);
>
> if (numericDocValues != null &&
> numericDocValues.advanceExact(docNumber)) {
> longVal = numericDocValues.longValue();
> } else {
> return;
> }
> }
>
> if (numericValuesSet.contains(longVal)) {
> super.collect(docNumber);
> }
>
>
> Is it the best we can get?
>
>
> Thanks,
> Artur Rudenko
>
>
> This electronic message may contain proprietary and confidential
> information of Verint Systems Inc., its affiliates and/or subsidiaries. The
> information is intended to be for the use of the individual(s) or
> entity(ies) named above. If you are not the intended recipient (or
> authorized to receive this e-mail for the intended recipient), you may not
> use, copy, disclose or distribute to anyone this message or any information
> contained in this message. If you have received this electronic message in
> error, please notify us by replying to this e-mail.
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr 8.5.1 query timeAllowed exceeded throws exception

2020-05-13 Thread Mikhail Khludnev
While the java.lang.NullPointerException seems odd, the overall system behavior
seems sane: an overloaded system might not accept incoming connections, and
that triggers an exception on the client side.
Please add more details, like server-side logs; so far it's not clear.

On Wed, May 13, 2020 at 1:37 AM Phill Campbell
 wrote:

> Upon examining the Solr source code it appears that it was unable to even
> make a connection in the time allowed.
> While the error message was a bit confusing, I do understand what it means.
>
>
> > On May 12, 2020, at 2:08 PM, Phill Campbell
>  wrote:
> >
> >
> >
> > org.apache.solr.client.solrj.SolrServerException: Time allowed to handle
> this request exceeded:…
> >   at
> org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:345)
> >   at
> org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1143)
> >   at
> org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:906)
> >   at
> org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:838)
> >   at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> >   at
> org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1035)
> > ...
> >   at javax.swing.SwingWorker$1.call(SwingWorker.java:295)
> >   at
> java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java)
> >   at javax.swing.SwingWorker.run(SwingWorker.java:334)
> >   at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >   at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >   at java.lang.Thread.run(Thread.java:748)
> > Caused by:
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://10.156.112.50:10001/solr/BTS:
> java.lang.NullPointerException
> >
> >   at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:665)
> >   at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265)
> >   at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
> >   at
> org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:368)
> >   at
> org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:296)
> >
> >
> > The timeAllowed is set to 8 seconds. I am using a StopWatch to verify
> that the round trip was greater than 8 seconds.
> >
> > Documentation states:
> >
> > timeAllowed Parameter
> > This parameter specifies the amount of time, in milliseconds, allowed
> for a search to complete. If this time expires before the search is
> complete, any partial results will be returned, but values such as
> numFound, facet counts, and result stats may not be accurate for the entire
> result set. In case of expiration, if omitHeader isn’t set to true the
> response header contains a special flag called partialResults.
> >
> > I do not believe I should be getting an exception.
> >
> > I am load testing so I am intentionally putting pressure on the system.
> >
> > Is this the correct behavior to throw an exception?
> >
> > Regards.
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Data Import Handler - Concurrent Entity Importing

2020-05-05 Thread Mikhail Khludnev
Hello, James.

DataImportHandler has a lock that prevents concurrent execution. If you need
to run several imports in parallel on the same core, duplicate the
"/dataimport" handler definition in solrconfig.xml; each handler can then run
its own import, so they proceed in parallel. Regarding the schema question, I
prefer the latter (separate type/status/value fields), but mileage may vary.
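
For illustration, a sketch with two independent handlers (the config file
names are made up):

  <requestHandler name="/dataimport-a"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config-a.xml</str>
    </lst>
  </requestHandler>
  <requestHandler name="/dataimport-b"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config-b.xml</str>
    </lst>
  </requestHandler>

Each handler can then be given its own full-import command, and the imports
run concurrently.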

--
Mikhail.

On Tue, May 5, 2020 at 6:39 PM James Greene 
wrote:

> Hello, I'm new to the group here so please excuse me if I do not have the
> etiquette down yet.
>
> Is it possible to have multiple entities (customer configurable, up to 40
> atm) in a DIH configuration to be imported at once?  Right now I have
> multiple root entities in my configuration but they get indexes
> sequentially and this means the entities that are last are always delayed
> hitting the index.
>
> I'm trying to migrate an existing setup (solr 6.6) that utilizes a
> different collection for each "entity type" into a single collection (solr
> 8.4) to get around some of the hurdles faced when needing to have searches
> that require multiple block joins and currently does not work going cross
> core.
>
> I'm also wondering if it is better to fully qualify a field name or use two
> different fields for performing the "same" search.  i.e:
>
>
> {
> type_A_status; Active
> type_A_value: Test
> }
> vs
> {
> type: A
> status: Active
> value: Test
> }
>


-- 
Sincerely yours
Mikhail Khludnev


Re: off-heap OOM

2020-05-01 Thread Mikhail Khludnev
I don't know exactly, but couldn't it be hitting a host-wide limit on the
number of threads?

On Fri, May 1, 2020 at 11:02 AM Raji N  wrote:

> Thanks for your  reply . Sure will take a look at the docker host log.  But
> even when we got "unable to create new native thread" error , the heap dump
> taken within hour before (we have hourly heap generation) the OOM did not
> have more than 150 to 160 threads. So it doesn't look like it happens due
> to running out of threads. Rather suspecting it happens because there is no
> native memory?.
>
> Thanks,
> Raji
>
> On Fri, May 1, 2020 at 12:13 AM Mikhail Khludnev  wrote:
>
> > > java.lang.OutOfMemoryError: unable to create new native thread
> > Usually mean code flaw, but there is a workaround to trigger heap GC.
> > It happens when app creates threads instead of proper pooling, and no GC
> > occurs, so java Thread objects hangs in heap in stopped state, but every
> of
> > them holds a native thread handler; and system run out of native threads
> > sooner or later. So, in this case reducing heap size, frees native thread
> > and app is able to recycle them. But you are right, it's rather better to
> > disable it.
> > Also, check docker host log, there's a specific error message for java
> > under docker.
> >
> > On Fri, May 1, 2020 at 3:55 AM Raji N  wrote:
> >
> > > It used to occur every 3 days ,we reduced heap and it started
> > > occurring every 5 days .  From the logs we can't get much. Some times
> we
> > > see "unable to create  new native thread" in the logs and many times no
> > > exceptions .
> > > When it says "unable to create native thread" error , we got below
> > > exceptions as we use cdcr. To eliminate cdcr from this issue , we
> > disabled
> > > CDCR also. But we still get OOM.
> > >
> > >  WARN  (cdcr-update-log-synchronizer-93-thread-1) [   ]
> > > o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
> > >
> > > java.lang.OutOfMemoryError: unable to create new native thread
> > >
> > >at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
> > >
> > >at java.lang.Thread.start(Thread.java:717)
> ~[?:1.8.0_211]
> > >
> > >at
> > >
> > >
> >
> org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96)
> > > ~[httpclient-4.5.3.jar:4.5.3]
> > >
> > >at
> > >
> > >
> >
> org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219)
> > > ~[httpclient-4.5.3.jar:4.5.3]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:200)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957)
> > > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > > - nknize - 2018-12-07 14:47:53]
> > >
> > >at
> > >
> > >
> >
> org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139)
> > > [solr-core-7.6.0

Re: off-heap OOM

2020-05-01 Thread Mikhail Khludnev
> java.lang.OutOfMemoryError: unable to create new native thread
This usually means a code flaw, but there is a workaround: trigger a heap GC.
It happens when an app creates threads instead of pooling them properly and
no GC occurs, so java Thread objects hang around the heap in a stopped state,
but each of them still holds a native thread handle; the system runs out of
native threads sooner or later. In that case reducing the heap size frees
native threads and the app is able to recycle them. But you are right, it's
rather better to disable it.
Also, check the docker host log; there's a specific error message for java
under docker.
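
For what it's worth, the trace you posted shows each
HttpSolrClient.Builder().build() call starting an IdleConnectionEvictor
thread, which only goes away on close(). A sketch of the reuse pattern that
avoids leaking such helper threads (URL and collection name are made up):

  // build the client once and reuse it, closing it deterministically,
  // instead of building a new client on every scheduled run
  try (HttpSolrClient client = new HttpSolrClient.Builder(
      "http://localhost:8983/solr/collection1").build()) {
    // ... issue requests with the same instance ...
  }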

On Fri, May 1, 2020 at 3:55 AM Raji N  wrote:

> It used to occur every 3 days ,we reduced heap and it started
> occurring every 5 days .  From the logs we can't get much. Some times we
> see "unable to create  new native thread" in the logs and many times no
> exceptions .
> When it says "unable to create native thread" error , we got below
> exceptions as we use cdcr. To eliminate cdcr from this issue , we disabled
> CDCR also. But we still get OOM.
>
>  WARN  (cdcr-update-log-synchronizer-93-thread-1) [   ]
> o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
>
> java.lang.OutOfMemoryError: unable to create new native thread
>
>at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
>
>at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_211]
>
>at
>
> org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96)
> ~[httpclient-4.5.3.jar:4.5.3]
>
>at
>
> org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219)
> ~[httpclient-4.5.3.jar:4.5.3]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:200)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
>
>at
>
> org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139)
> [solr-core-7.6.0.jar:7.6.0-SNAPSHOT
> 34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30
> 14:02:46]
>
>at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [?:1.8.0_211]
>
>at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> [?:1.8.0_211]
>
>at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> [?:1.8.0_211]
>
>at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> [?:1.8.0_211]
>
>        at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_211]
>
>at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_211]
>
> Thanks,
> Raji
> On Thu, Apr 30, 2020 at 12:24 AM Mikhail Khludnev  wrote:
>
> > Raji, how that "OOM for solr occur in every 5 days." exactly looks like?
> > What is the error message? Where it's occurring exactly?
> >
> > On Thu, Apr 30, 2020 at 1:30 AM Raji N  wrote:
> >
> > > Thanks so much Jan. Will try your suggestions , yes we are also running
> > > solr inside docker.
> > >
> > > Thanks,
> > > Raji
> > >
> > > On Wed, Apr 29, 2020 at 1:46 PM Jan Høydahl 
> > wrote:
> > >
> > > > I have seen th

Re: off-heap OOM

2020-04-30 Thread Mikhail Khludnev
Raji, what exactly does "OOM for solr occur in every 5 days" look like?
What is the error message? Where exactly does it occur?

On Thu, Apr 30, 2020 at 1:30 AM Raji N  wrote:

> Thanks so much Jan. Will try your suggestions , yes we are also running
> solr inside docker.
>
> Thanks,
> Raji
>
> On Wed, Apr 29, 2020 at 1:46 PM Jan Høydahl  wrote:
>
> > I have seen the same, but only in Docker.
> > I think it does not relate to Solr’s off-heap usage for filters and other
> > data structures, but rather how Docker treats memory-mapped files as
> > virtual memory.
> > As you know, when using MMapDirectoryFactory, you actually let Linux
> > handle the loading and unloading of the index files, and Solr will access
> > them as if they were in a huge virtual memory pool. Naturally the index
> > files grow large, and there is something strange going on in the way
> Docker
> > handles this, leading to OOM, not for Java heap but for the process.
> >
> > I have no definitive answer, but so far my research has found a few
> > possible settings
> >
> > Set env.var MALLOC_ARENA_MAX=2
> > Try to limit -XX:MaxDirectMemorySize
> > Lower mem swappiness in Docker (--memory-swappiness 0)
> > More generic insight into java mem allocation in Docker:
> > https://dzone.com/articles/native-memory-allocation-in-examples
> >
> > Have not yet found a silver bullet, so very interested in this thread.
> >
> > Jan
> >
> > > 29. apr. 2020 kl. 19:26 skrev Raji N :
> > >
> > > Thank you for your reply.  When OOM happens somehow it doesn't generate
> > > dump file. So we have hourly heaps running to diagnose this issue. Heap
> > is
> > > around 700MB and threads around 150. But 29GB of native memory is used
> > up,
> > > it is consumed by java.io.DirectBufferR (27GB major consumption) and
> > > java.io.DirectByteBuffer  objects .
> > >
> > > We use solr 7.6.0 in solrcloud mode and OS is alpine . Java version
> > >
> > > java -version
> > >
> > > Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
> > >
> > > java version "1.8.0_211"
> > >
> > > Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
> > >
> > > Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
> > >
> > >
> > >
> > > Thanks much for taking a look at it.
> > >
> > > Raji
> > >
> > >
> > >
> > > On Wed, Apr 29, 2020 at 10:04 AM Shawn Heisey 
> > wrote:
> > >
> > >> On 4/29/2020 2:07 AM, Raji N wrote:
> > >>> Has anyone encountered off-heap OOM. We are thinking of reducing heap
> > >>> further and increasing the hardcommit interval . Any other
> > suggestions? .
> > >>> Please share your thoughts.
> > >>
> > >> It sounds like it's not heap memory that's running out.
> > >>
> > >> When the OutOfMemoryError is logged, it will also contain a message
> > >> mentioning which resource ran out.
> > >>
> > >> A common message that might be logged with the OOME is "Unable to
> create
> > >> native thread".  This type of error, if that's what's happening,
> > >> actually has nothing at all to do with memory, OOME is just how Java
> > >> happens to report it.
> > >>
> > >> You will need to know exactly which resource is running out before we
> > >> can offer any assistance.
> > >>
> > >> If the OOME is logged, the message you're looking for will be in the
> > >> solr log, not the tiny special log that is created when Solr is killed
> > >> by an OOME.  What version of Solr are you running, and what OS is it
> > >> running on?
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
> >
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Using QT param with /select

2020-03-10 Thread Mikhail Khludnev
Hello, Atita.

> My question here is that on Solr 6.2.6 to enable using 'qt' param I need to
> do handleSelect=false

Can you elaborate on that? What exactly happens? Also, please clarify
whether you are using SolrCloud or standalone.


On Mon, Mar 2, 2020 at 7:37 PM Atita Arora  wrote:

> Hi,
>
> I am working on improving the search app which is using 'qt' param heavily
> to redirect requests to different handlers based on the parameters as
> provided by the user.
>
> Also for A B testing of different configurations, we have used qt param to
> send request to different handlers.
> My question here is that on Solr 6.2.6 to enable using 'qt' param I need to
> do handleSelect=false but it is the default request handler on solr
> administration UI and used as the default endpoint in all the integration
> tests.
>
> It may sound weird but is there a way I can retain both the
> functionalities?
> No code changes to integration test code and making qt param work again.
>
> Big thanks for any pointers !!
>
> Sincerely,
> Atita
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Possible performance bug - JSON facet - numBuckets:true

2020-03-10 Thread Mikhail Khludnev
Hello, Artur.

Thanks for your interest.
Perhaps we can amend the docs to mention this effect. In the long term it can
be optimized by adding a proper condition. Patches for either are welcome.

On Wed, Feb 12, 2020 at 10:48 PM Rudenko, Artur 
wrote:

> Hello everyone,
> I'm am currently investigating a performance issue in our environment and
> it looks like we found a performance bug.
> Our environment:
> 20M large PARENT documents and 800M nested small CHILD documents.
> The system inserts about 400K PARENT documents and 16M CHILD documents per
> day. (Currently we stopped the calls insertion to investigate the
> performance issue)
> This is a solr cloud 8.3 environment with 7 servers (64 VCPU 128 GB RAM
> each, 24GB allocated to Solr) with single collection (32 shards and
> replication factor 2).
>
> The below query runs in about 14-16 seconds (we have to use limit:-1 due
> to a business case - cardinality is 1K values).
>
> fq=channel:345133
> =content_type:PARENT
> =Meta_is_organizationIds:(344996998 344594999 34501 total of
> int 562 values)
> =*:*
> ={
> "Chart_01_Bins":{
> type:terms,
> field:groupIds,
> mincount:1,
> limit:-1,
> numBuckets:true,
> missing:false,
> refine:true,
> facet:{
>
> min_score_avg:"avg(min_score)",
>
> max_score_avg:"avg(max_score)",
>
> avg_score_avg:"avg(avg_score)"
> }
> },
> "Chart_01_FIELD_NOT_EXISTS":{
> type:query,
> q:"-groupIds:[* TO *]",
> facet:{
>
> min_score_avg:"avg(min_score)",
>
> max_score_avg:"avg(max_score)",
>
> avg_score_avg:"avg(avg_score)"
> }
> }
> }
> =0
>
> Also, when the facet is simplified, it takes about 4-6 seconds
>
> fq=channel:345133
> =content_type:PARENT
> =Meta_is_organizationIds:(344996998 344594999 34501 total of
> int 562 values)
> =*:*
> ={
> "Chart_01_Bins":{
> type:terms,
> field:groupIds,
> mincount:1,
> limit:-1,
> numBuckets:true,
> missing:false,
> refine:true
> }
> }
> =0
>
> Schema relevant fields:
>
> <field name="channel" required="true" multiValued="false" />
>
> <field name="content_type" required="true" multiValued="false" />
>
> <field name="Meta_is_organizationIds" required="false" multiValued="true" />
>
> <field name="min_score" required="false" multiValued="false" />
> <field name="max_score" required="false" multiValued="false" />
> <field name="avg_score" required="false" multiValued="false" />
>
> <field name="groupIds" multiValued="true" />
>
>
>
> I noticed that when we set numBuckets:false, the result returns faster
> (1.5-3.5 seconds less) - that sounds like a performance bug:
> The limit is -1, which means all bucks, so adding about significant time
> to the overall time just to get number of buckets when we will get all of
> them anyway doesn't seems to be right.
>
> Any thoughts?
>
>
> Thanks
> Artur Rudenko
>
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Ordering in Nested Document

2020-02-24 Thread Mikhail Khludnev
You may try. The content type should be exactly the same across parents and
child-free documents. It may work now.
Earlier, mixing blocks and child-free documents in one index wasn't supported.
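
If it still doesn't, the usual workaround (the child stub mentioned in the
earlier reply below) is to give every otherwise child-free parent a
placeholder child, so all documents form uniform blocks. A sketch, with
made-up ids and a hypothetical childStub type:

  <add>
    <doc>
      <field name="id">5</field>
      <field name="title">Solr adds block join support</field>
      <field name="content_type">parentDocument</field>
      <doc>
        <field name="id">5_stub</field>
        <field name="content_type">childStub</field>
      </doc>
    </doc>
  </add>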

On Mon, Feb 24, 2020 at 2:57 AM Gajendra Dadheech 
wrote:

> That extra s was intentional, should have added a better name.
>
> So ideally we shouldn't have childfree and blocks together while indexing?
> Or in the whole index they shouldn't be together, i.e. We should have
> atleast one child doc for all if any of doc has one?
>
> On Mon, Feb 24, 2020 at 4:24 PM Mikhail Khludnev  wrote:
>
> > Hello, Gajendra.
> > Pics doesn't come through mailing list.
> > May it caused by unnecessary s  *s*
> > parentDocument?
> > At least earlier mixing childfrees and blocks wasn't allowed, and caused
> > some troubles. Usually, child stub used to keep childfrees in the index.
> >
> > On Mon, Feb 24, 2020 at 2:22 AM Gajendra Dadheech 
> > wrote:
> >
> > > Hi
> > >
> > > i want to ingest below documents, where there is a mix of nested and
> > > un-nested documents:
> > > 
> > >   
> > >   5
> > >   5
> > >   5Solr adds block join support
> > >   sparentDocument
> > >   
> > >  
> > >   1
> > >   1
> > >   Solr adds block join support
> > >   parentDocument
> > >   
> > >   2
> > >   1
> > >   SolrCloud supports it too!
> > >   childDocument
> > >   
> > >   
> > >   
> > >   3
> > >   3
> > >   New Lucene and Solr release is out
> > >   parentDocument
> > >   
> > > 4
> > > 4
> > > Lots of new features
> > >     childDocument
> > >   
> > >   
> > > 
> > >
> > >
> > > Output of block join query after ingesting above docs:
> > > [image: image.png]
> > >
> > > So doc id 5 is getting linked to doc id 1. Is this expected behavior, I
> > > believ Id-5 should be a different document tree.
> > >
> > > Shall I Ingest them in some order ?
> > >
> > >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Ordering in Nested Document

2020-02-24 Thread Mikhail Khludnev
Hello, Gajendra.
Pics don't come through the mailing list.
Might it be caused by the unnecessary "s" in
sparentDocument?
At least earlier, mixing child-free documents and blocks wasn't allowed and
caused some trouble. Usually a child stub is used to keep child-free
documents in the index.

On Mon, Feb 24, 2020 at 2:22 AM Gajendra Dadheech 
wrote:

> Hi
>
> i want to ingest below documents, where there is a mix of nested and
> un-nested documents:
> 
>   
>   5
>   5
>   5Solr adds block join support
>   sparentDocument
>   
>  
>   1
>   1
>   Solr adds block join support
>   parentDocument
>   
>   2
>   1
>   SolrCloud supports it too!
>   childDocument
>   
>   
>   
>   3
>   3
>   New Lucene and Solr release is out
>   parentDocument
>   
> 4
> 4
> Lots of new features
> childDocument
>   
>   
> 
>
>
> Output of block join query after ingesting above docs:
> [image: image.png]
>
> So doc id 5 is getting linked to doc id 1. Is this expected behavior, I
> believ Id-5 should be a different document tree.
>
> Shall I Ingest them in some order ?
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Best Practises around relevance tuning per query

2020-02-18 Thread Mikhail Khludnev
Note, a {!terms} query is more efficient for a long list of ids. I'd try to
group ids by boost and cache the long id lists. Something like:
q=filter({!terms f=id}1,3,5)^=100  filter({!terms f=id}2,4,6)^=-1
This lets the heavy terms lists be reused between queries.
Another idea: extract the boost scores into a separate core/index (strictly a
single shard in SolrCloud, so far) and use {!join score=sum} to bring the
ranks into the main index. That lets the smaller core be updated faster,
although it might require some hack to decouple updates and cache
invalidation.
Also, Solr has in-place updates, which can update a column of boosts so you
can score by that column.
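
A sketch of the join variant (the boosts core and its doc_id/rank fields are
hypothetical): each document in the boosts core carries {doc_id, rank}, and
the join sums the ranks onto the matching main-core documents:

  q={!join fromIndex=boosts from=doc_id to=id score=sum v=$rq}&rq={!func}rank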

On Tue, Feb 18, 2020 at 9:27 PM Ashwin Ramesh 
wrote:

> ping on this :)
>
> On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh  wrote:
>
> > Hi,
> >
> > We are in the process of applying a scoring model to our search results.
> > In particular, we would like to add scores for documents per query and
> user
> > context.
> >
> > For example, we want to have a score from 500 to 1 for the top 500
> > documents for the query “dog” for users who speak US English.
> >
> > We believe it becomes infeasible to store these scores in Solr because we
> > want to update the scores regularly, and the number of scores increases
> > rapidly with increased user attributes.
> >
> > One solution we explored was to store these scores in a secondary data
> > store, and use this at Solr query time with a boost function such as:
> >
> > `bf=mul(termfreq(id,’ID-1'),500) mul(termfreq(id,'ID-2'),499) …
> > mul(termfreq(id,'ID-500'),1)`
> >
> > We have over a hundred thousand documents in one Solr collection, and
> > about fifty million in another Solr collection. We have some queries for
> > which roughly 80% of the results match, although this is an edge case. We
> > wanted to know the worst case performance, so we tested with such a
> query.
> > For both of these collections we found the a message similar to the
> > following in the Solr cloud logs (tested on a laptop):
> >
> > Elapsed time: 5020. Exceeded allowed search time: 5000 ms.
> >
> > We then tried using the following boost, which seemed simpler:
> >
> > `boost=if(query($qq), 10, 1)=id:(ID-1 OR ID-2 OR … OR ID-500)`
> >
> > We then saw the following in the Solr cloud logs:
> >
> > `The request took too long to iterate over terms.`
> >
> > All responses above took over 5000 milliseconds to return.
> >
> > We are considering Solr’s re-ranker, but I don’t know how we would use
> > this without pushing all the query-context-document scores to Solr.
> >
> >
> > The alternative solution that we are currently considering involves
> > invoking multiple solr queries.
> >
> > This means we would make a request to solr to fetch the top N results
> (id,
> > score) for the query. E.g. q=dog, fq=featureA:foo, fq=featureB=bar,
> limit=N.
> >
> > Another request would be made using a filter query with a set of doc ids
> > that we know are high value for the user’s query. E.g. q=*:*,
> > fq=featureA:foo, fq=featureB:bar, fq=id:(d1, d2, d3), limit=N.
> >
> > We would then do a reranking phase in our service layer.
> >
> > Do you have any suggestions for known patterns of how we can store and
> > retrieve scores per user context and query?
> >
> > Regards,
> > Ash & Spirit.
> >
>

-- 
Sincerely yours
Mikhail Khludnev


Re: A question about solr filter cache

2020-02-17 Thread Mikhail Khludnev
Hello,
The former:
https://github.com/apache/lucene-solr/blob/188f620208012ba1d726b743c5934abf01988d57/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L84
More efficient sets (roaring and/or Elias-Fano, iirc) are present in Lucene,
but they are not yet used in Solr.
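
The idea, as an illustrative sketch (not the actual Solr code): docs are
collected into a small int array first and spill into a bitmap only past a
threshold, so a sparse entry costs about 4 bytes per matching doc rather than
maxDoc/8 bytes:

  import org.apache.lucene.util.FixedBitSet;

  class SparseThenDenseCollector {
    private final int[] scratch;  // sized ~maxDoc/64, like Solr's smallSetSize
    private final int maxDoc;
    private int pos;
    private FixedBitSet bits;     // allocated only once the set turns dense

    SparseThenDenseCollector(int maxDoc) {
      this.maxDoc = maxDoc;
      this.scratch = new int[(maxDoc >> 6) + 5];
    }

    void collect(int doc) {
      if (pos < scratch.length) {
        scratch[pos++] = doc;     // sparse case: plain int array
      } else {
        if (bits == null) {
          bits = new FixedBitSet(maxDoc);
        }
        bits.set(doc);            // dense case: one bit per doc in the index
      }
    }
  }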

On Mon, Feb 17, 2020 at 1:13 AM Hongxu Ma  wrote:

> Hi
> I want to know the internal of solr filter cache, especially its memory
> usage.
>
> I googled some pages:
> https://teaspoon-consulting.com/articles/solr-cache-tuning.html
> https://lucene.472066.n3.nabble.com/Solr-Filter-Cache-Size-td4120912.html
> (Erick Erickson's answer)
>
> All of them said its structure is: fq => a bitmap (total doc number bits),
> but I think it's not so simple, reason:
> Given total doc number is 1 billion, each filter cache entry will use
> nearly 1GB(10/8 bit), it's too big and very easy to make solr OOM
> (I have a 1 billion doc cluster, looks it works well)
>
> And I also checked solr node, but cannot find the details (only saw using
> DocSets structure)
>
> So far, I guess:
>
>   *   degenerate into an doc id array/list when the bitmap is sparse
>   *   using some compressed bitmap, e.g. roaring bitmaps
>
> which one is correct? or another answer, thanks you very much!
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Slow quires and facets

2020-02-15 Thread Mikhail Khludnev
I don't have any particular idea. It's probably worth starting by studying
the debugQuery=true output. There are caveats:
- it takes a while
- it's worth limiting the request to a few shards
- it used to produce incorrect JSON, and worked only with wt=xml
At least it lets you sneak a look at the longest part of the computation.
A few other thoughts: an output with 1K entries doesn't seem like a regular
search engine response; usually results are scrolled with limit/offset.
Anyway, it looks like an analytical job for Spark.
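
For example, something along these lines (shard names are placeholders, and
json.facet={...} stands for your facet request):

  /select?q=*:*&fq=channel:345133&json.facet={...}&rows=0
         &debugQuery=true&wt=xml&shards=shard1,shard2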


On Wed, Feb 12, 2020 at 11:32 PM Rudenko, Artur 
wrote:

> Hello everyone,
> I'm am currently investigating a performance issue in our environment:
> 20M large PARENT documents and 800M nested small CHILD documents.
> The system inserts about 400K PARENT documents and 16M CHILD documents per
> day. (Currently we stopped the calls insertion to investigate the
> performance issue)
> This is a solr cloud 8.3 environment with 7 servers (64 VCPU 128 GB RAM
> each, 24GB allocated to Solr) with single collection (32 shards and
> replication factor 2).
>
> We experience generally slow queries (about 4-7 seconds) and facet times.
> The below query runs in about 14-16 seconds (we have to use limit:-1 due to
> a business case - cardinality is 1K values).
>
> fq=channel:345133
> =content_type:PARENT
> =Meta_is_organizationIds:(344996998 344594999 34501 total of
> int 562 values)
> =*:*
> ={
> "Chart_01_Bins":{
> type:terms,
> field:groupIds,
> mincount:1,
> limit:-1,
> numBuckets:true,
> missing:false,
> refine:true,
> facet:{
>
> min_score_avg:"avg(min_score)",
>
> max_score_avg:"avg(max_score)",
>
> avg_score_avg:"avg(avg_score)"
> }
> },
> "Chart_01_FIELD_NOT_EXISTS":{
> type:query,
> q:"-groupIds:[* TO *]",
> facet:{
>
> min_score_avg:"avg(min_score)",
>
> max_score_avg:"avg(max_score)",
>
> avg_score_avg:"avg(avg_score)"
> }
> }
> }
> =0
>
> Also, when the facet is simplified, it takes about 4-6 seconds
>
> fq=channel:345133
> =content_type:PARENT
> =Meta_is_organizationIds:(344996998 344594999 34501 total of
> int 562 values)
> =*:*
> ={
> "Chart_01_Bins":{
> type:terms,
> field:groupIds,
> mincount:1,
> limit:-1,
> numBuckets:true,
> missing:false,
> refine:true
> }
> }
> =0
>
> Schema relevant fields:
>
> <field name="channel" required="true" multiValued="false" />
>
> <field name="content_type" required="true" multiValued="false" />
>
> <field name="Meta_is_organizationIds" required="false" multiValued="true" />
>
> <field name="min_score" required="false" multiValued="false" />
> <field name="max_score" required="false" multiValued="false" />
> <field name="avg_score" required="false" multiValued="false" />
>
> <field name="groupIds" multiValued="true" />
>
>
> Any suggestions how to proceed with the investigation?
>
> Right now we are trying to figure out if using single shard on each
> machine will help.
> Artur Rudenko
> Analytics Developer
> Customer Engagement Solutions, VERINT
> T +972.74.747.2536 | M +972.52.425.4686
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Mikhail Khludnev
Karl, what would you do if that homegrown implementation stalls in GC or
knocks Solr over?

On Thu, Feb 6, 2020 at 1:04 PM Karl Stoney
 wrote:

> Spoke too soon, looks like it memory leaks.  After about 1.3m the old gc
> times went through the root and solr was almost unresponsive, had to
> abort.  We're going to write our own implementation to copy data from one
> core to another that runs outside of solr.
>
> On 06/02/2020, 09:57, "Karl Stoney"  wrote:
>
> I cannot believe how much of a difference that cursorMark and sort
> order made.
> Previously it died about 800k docs, now we're at 1.2m without any
> slowdown.
>
> Thank you so much
>
> On 06/02/2020, 08:14, "Mikhail Khludnev"  wrote:
>
> Hello, Karl.
> Please check these:
>
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flucene.apache.org%2Fsolr%2Fguide%2F6_6%2Fpagination-of-results.html%23constraints-when-using-cursorsdata=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7C31a2300d8a0e42a9e28f08d7aadc92c7%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C0%7C637165736641024457sdata=pNw8x6YUBTtXst60oMAe8UqWvUtakYvoJ9%2FKn7R8ETo%3Dreserved=0
>
>
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flucene.apache.org%2Fsolr%2Fguide%2F6_6%2Fuploading-structured-data-store-data-with-the-data-import-handler.html%23solrentityprocessordata=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7C31a2300d8a0e42a9e28f08d7aadc92c7%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C0%7C637165736641024457sdata=572w%2Br7QtZ8eHORG5UVrE3yE3SZaUXsuqFpRuwE80sw%3Dreserved=0
>  cursorMark="true"
> Good luck.
>
>
> On Wed, Feb 5, 2020 at 10:06 PM Karl Stoney
>  wrote:
>
> > Hey All,
> > I'm trying to implement a simplistic reindex strategy to copy
> all of the
> > data out of one collection, into another, on a single node (no
> distributed
> > queries).
> >
> > It's approx 4 million documents, with an index size of 26gig.
> Based on
> > your experience, I'm wondering what people feel sensible values
> for the
> > SolrEntityProcessor are (to give me a sensible starting point,
> to save me
> > iterating over loads of them).
> >
> > This is where I'm at right now.  I know `rows` would increase
> memory
> > pressure but speed up the copy, I can't really find anywhere
> online where
> > people have benchmarked different values for rows and the
> default (50)
> > seems quite low.
> >
> > 
> > 
> > >  query="*:*"
> >  rows="100"
> >  fl="*,old_version:_version_"
> >  wt="javabin"
> >  url="
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2F127.0.0.1%2Fsolr%2Fat-ukdata=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7C31a2300d8a0e42a9e28f08d7aadc92c7%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C0%7C637165736641024457sdata=e9BfXappFygVqSlweYXJdsxf5TXtlrL%2BwHop7PrOsJQ%3Dreserved=0
> ">
> >
> > 
> > 
> >
> > Any suggestions are welcome.
> > Thanks
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Mikhail Khludnev
Egor, would you mind sharing some best practices regarding cursorMark in
SolrEntityProcessor?

On Thu, Feb 6, 2020 at 1:04 PM Karl Stoney
 wrote:

> Spoke too soon, looks like it memory leaks.  After about 1.3m the old gc
> times went through the root and solr was almost unresponsive, had to
> abort.  We're going to write our own implementation to copy data from one
> core to another that runs outside of solr.
>
> On 06/02/2020, 09:57, "Karl Stoney"  wrote:
>
> I cannot believe how much of a difference that cursorMark and sort
> order made.
> Previously it died about 800k docs, now we're at 1.2m without any
> slowdown.
>
> Thank you so much
>
> On 06/02/2020, 08:14, "Mikhail Khludnev"  wrote:
>
> Hello, Karl.
> Please check these:
>
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flucene.apache.org%2Fsolr%2Fguide%2F6_6%2Fpagination-of-results.html%23constraints-when-using-cursorsdata=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7C31a2300d8a0e42a9e28f08d7aadc92c7%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C0%7C637165736641024457sdata=pNw8x6YUBTtXst60oMAe8UqWvUtakYvoJ9%2FKn7R8ETo%3Dreserved=0
>
>
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flucene.apache.org%2Fsolr%2Fguide%2F6_6%2Fuploading-structured-data-store-data-with-the-data-import-handler.html%23solrentityprocessordata=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7C31a2300d8a0e42a9e28f08d7aadc92c7%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C0%7C637165736641024457sdata=572w%2Br7QtZ8eHORG5UVrE3yE3SZaUXsuqFpRuwE80sw%3Dreserved=0
>  cursorMark="true"
> Good luck.
>
>
> On Wed, Feb 5, 2020 at 10:06 PM Karl Stoney
>  wrote:
>
> > Hey All,
> > I'm trying to implement a simplistic reindex strategy to copy
> all of the
> > data out of one collection, into another, on a single node (no
> distributed
> > queries).
> >
> > It's approx 4 million documents, with an index size of 26gig.
> Based on
> > your experience, I'm wondering what people feel sensible values
> for the
> > SolrEntityProcessor are (to give me a sensible starting point,
> to save me
> > iterating over loads of them).
> >
> > This is where I'm at right now.  I know `rows` would increase
> memory
> > pressure but speed up the copy, I can't really find anywhere
> online where
> > people have benchmarked different values for rows and the
> default (50)
> > seems quite low.
> >
> > 
> > 
> > >  query="*:*"
> >  rows="100"
> >  fl="*,old_version:_version_"
> >  wt="javabin"
> >  url="
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2F127.0.0.1%2Fsolr%2Fat-ukdata=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7C31a2300d8a0e42a9e28f08d7aadc92c7%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C0%7C637165736641024457sdata=e9BfXappFygVqSlweYXJdsxf5TXtlrL%2BwHop7PrOsJQ%3Dreserved=0
> ">
> >
> > 
> > 
> >
> > Any suggestions are welcome.
> > Thanks
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Mikhail Khludnev
Hello, Karl.
Please check these:
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#constraints-when-using-cursors

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#solrentityprocessor
i.e. add cursorMark="true" (plus a sort on the uniqueKey) to the entity.
Good luck.
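
Applied to the config you posted (quoted below), roughly; a sketch where
sort="id asc" assumes id is the uniqueKey:

  <dataConfig>
    <document>
      <entity processor="SolrEntityProcessor"
              query="*:*"
              cursorMark="true"
              sort="id asc"
              rows="100"
              fl="*,old_version:_version_"
              wt="javabin"
              url="http://127.0.0.1/solr/at-uk"/>
    </document>
  </dataConfig>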


On Wed, Feb 5, 2020 at 10:06 PM Karl Stoney
 wrote:

> Hey All,
> I'm trying to implement a simplistic reindex strategy to copy all of the
> data out of one collection, into another, on a single node (no distributed
> queries).
>
> It's approx 4 million documents, with an index size of 26gig.  Based on
> your experience, I'm wondering what people feel sensible values for the
> SolrEntityProcessor are (to give me a sensible starting point, to save me
> iterating over loads of them).
>
> This is where I'm at right now.  I know `rows` would increase memory
> pressure but speed up the copy, I can't really find anywhere online where
> people have benchmarked different values for rows and the default (50)
> seems quite low.
>
> <dataConfig>
>   <document>
>     <entity processor="SolrEntityProcessor"
>             query="*:*"
>             rows="100"
>             fl="*,old_version:_version_"
>             wt="javabin"
>             url="http://127.0.0.1/solr/at-uk">
>     </entity>
>   </document>
> </dataConfig>
>
> Any suggestions are welcome.
> Thanks
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Number of requested rows

2020-02-05 Thread Mikhail Khludnev
Hi, Emir.

Please check the callers of org.apache.lucene.search.HitQueue.HitQueue(int,
boolean); you may find the alternative usage you are probably looking for.
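
For illustration, given an IndexSearcher searcher and a Query query (a sketch
against the Lucene 8.x API): the heap is sized from the requested rows, not
from the actual hit count.

  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.search.TopScoreDocCollector;

  // rows=1000000 pre-allocates a million-slot HitQueue up front,
  // even if only 1000 documents actually match
  TopScoreDocCollector collector =
      TopScoreDocCollector.create(1_000_000, Integer.MAX_VALUE);
  searcher.search(query, collector);
  TopDocs top = collector.topDocs();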

On Wed, Feb 5, 2020 at 3:01 PM Emir Arnautović 
wrote:

> Hi Mikhail,
> I was thinking in that direction. Do you know where it is in the codebase
> or which structure is used - I am guessing some array of objects?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 5 Feb 2020, at 12:54, Mikhail Khludnev  wrote:
> >
> > Absolutely. Searcher didn't know number of hits a priory. It eagerly
> > allocate results heap before collecting results. The only cap I'm aware
> of
> > is maxDocs.
> >
> > On Wed, Feb 5, 2020 at 2:42 PM Emir Arnautović <
> emir.arnauto...@sematext.com>
> > wrote:
> >
> >> Hi,
> >> Does somebody know if requested number of rows is used internally to set
> >> some temp structures? In other words will query with rows=100 be
> more
> >> expensive than query with rows=1000 if number of hits is 1000?
> >>
> >> Thanks,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Number of requested rows

2020-02-05 Thread Mikhail Khludnev
Absolutely. The searcher doesn't know the number of hits a priori. It eagerly
allocates the results heap before collecting results. The only cap I'm aware
of is maxDoc.

On Wed, Feb 5, 2020 at 2:42 PM Emir Arnautović 
wrote:

> Hi,
> Does somebody know if requested number of rows is used internally to set
> some temp structures? In other words will query with rows=100 be more
> expensive than query with rows=1000 if number of hits is 1000?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Exact search in Solr

2020-02-04 Thread Mikhail Khludnev
Hello, Łukasz.
The latter, for sure.
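
For reference, a sketch of solution 2, i.e. tokenization only with no
lowercasing or stemming (the type/field names and the choice of
WhitespaceTokenizer are assumptions; pick the tokenizer that matches your
notion of a token):

  <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>
  <field name="content_exact" type="text_exact" indexed="true" stored="false"/>

With that, q=content_exact:secret matches "secret" but not "secrets".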

On Tue, Feb 4, 2020 at 12:44 PM Antczak, Lukasz
 wrote:

> Hi, Solr experts!
>
> I would like to learn from you if there is a better solution for doing
> 'exact search' in Solr.
> Exact search means no analysis for the text other then tokenization. Query
> "secret" gives back only documents containing exactly "secret" not
> "secrets", "secrection", etc.  Text that needs to be searched is content of
> some articles.
>
> Solution 1. - index whole text as string, use regex for searching.
> Solution 2. - index text with just tokenization, no lowercase, stemming,
> etc.
>
> Which solution will be faster? Any other clever ideas to be evaluated?
>
> Regards
> Łukasz Antczak
> --
> *Łukasz Antczak*
> Senior IT Professional
> GS Data Frontiers Team <http://go.roche.com/bigs>
>
> *Planned absences:*
> *Roche Polska Sp. z o.o.*
> ADMD Group Services - Business Intelligence Team
> HQ: ul. Domaniewska 39B, 02-672 Warszawa
> Office: ul. Abpa Baraniaka 88D, 61-131 Poznań
>
> Mobile: +48 519 515 010
> mailto: lukasz.antc...@roche.com
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Early termination in Lucene 8

2020-01-30 Thread Mikhail Khludnev
Hi, Wei.
Feel free to follow up on https://issues.apache.org/jira/browse/SOLR-13289

On Thu, Jan 23, 2020 at 11:14 PM Wei  wrote:

> Thanks Mikhail.  Do you know of any example on query parser with WAND?
>
> On Thu, Jan 23, 2020 at 1:02 AM Mikhail Khludnev  wrote:
>
> > If one creates query parser wrapping queries with WAND it just produce
> > incomplete docset (I guess), which will be passed to facet component and
> > produce fewer counts.
> >
> > On Thu, Jan 23, 2020 at 2:11 AM Wei  wrote:
> >
> > > Hi,
> > >
> > > I am excited to see Lucene 8 introduced BlockMax WAND as a major speed
> > > improvement https://issues.apache.org/jira/browse/LUCENE-8135.  My
> > > question
> > > is, how does it integrate with facet request,  when the numFound won't
> be
> > > exact? I did some search but haven't found any documentation on this.
> Any
> > > pointer is greatly appreciated.
> > >
> > > Best,
> > > Wei
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr fact response strange behaviour

2020-01-29 Thread Mikhail Khludnev
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_201]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_201]
> at
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> [tomcat-embed-core-9.0.17.jar:9.0.17]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
>
> -Original Message-
> From: Jason Gerlowski 
> Sent: Wednesday, January 29, 2020 5:40 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr fact response strange behaviour
>
> Hey Adi,
>
> There was a separate JIRA for this on the SolrJ objects it sounds like
> you're using: SOLR-13780.  That JIRA was fixed, apparently in 8.3, so I'm
> surprised you're still seeing the issue.  If you include the full
> stacktrace and a snippet of code to reproduce, I'm curious to take a look.
>
> That won't help you in the short term though.  For that, yes, you'll have
> to use ((Number)count).longValue() in the interim.
>
> Best,
>
> Jason
>
> On Tue, Jan 28, 2020 at 2:20 AM Kaminski, Adi 
> wrote:
> >
> > Thanks Mikhail  !
> >
> > In issue comments that you have shared it seems that Yonik S doesn't
> agree it's a defect...so probably will remain opened for a while.
> >
> >
> >
> > So meanwhile, is it recommended to perform casting
> ((Number)count).longValue()  to our relevant logic ?
> >
> >
> >
> > Thanks,
> > Adi
> >
> >
> >
> > -Original Message-
> > From: Mikhail Khludnev 
> > Sent: Tuesday, January 28, 2020 9:14 AM
> > To: solr-user 
> > Subject: Re: Solr fact response strange behaviour
> >
> >
> >
> > https://issues.apache.org/jira/browse/SOLR-11775
> >
> >
> >
> > On Tue, Jan 28, 2020 at 10:00 AM Kaminski, Adi
> > mailto:adi.kamin...@verint.com>>
> >
> > wrote:
> >
> >
> >
> > > Is it existing issue and tracked for future fix consideration ?
> >
> > >
> >
> > > What's the suggestion as W/A until fix - to case every related
> >
> > > response with ((Number)count).longValue() ?
> >
> > >
> >
> > > -Original Message-
> >
> > > From: Mikhail Khludnev mailto:m...@apache.org>>
> >
> > > Sent: Tuesday, January 28, 2020 8:53 AM
> >
> > > To: solr-user
> > > mailto:solr-user@lucene.apache.org>>
> >
> > > Subject: Re: Solr fact response strange behaviour
> >
> > >
> >
> > > I suppose there's an issue, which no one ever took a look.
> >
> > >
> >
> > > https://lucene.472066.n3.nabble.com/JSON-facets-count-a-long-or-an-i
> > > nt
> >
> > > eger-in-cloud-and-non-cloud-modes-td4265291.html
> >
> > >
> >
> > >
> >
> > > On Mon, Jan 27, 2020 at 11:47 PM Kaminski, Adi
> >
> > > mailto:adi.kamin...@verint.com>>
> >
> > > wrote:
> >
> > >
> >
> > > > SolrJ client is used of SolrCloud of Solr 8.3 version for JSON
> >
> > > > Facets requests...any idea why not consistent ?
> >
> > > >
> >
> > > > Sent from Workspace ONE Boxer
> >
> > > >
> >
> > > > On Jan 27, 2020 22:13, Mikhail Khludnev  m...@apache.org>> wrote:
> >
> > > > Hello,
> >
> > > > It might be different between SolrCloud and standalone mode. No
> > > > data
> >
> > > > enough to make a conclusion.
> >
> > > >
> >
> > > > On Mon, Jan 27, 2020 at 5:40 PM Rudenko, Artur
> >
> > > > mailto:artur.rude...@verint.com>>
> >
> > > > wrote:
> >
> > > >
> >
> > > > > I'm trying to parse facet response, but sometimes the count
> >
> > > > > returns as Long type and sometimes as Integer type(on different
> >
> > > > > environments), The error is:
> >
> > > > > "java.lang.ClassCastException: java.lang.Integer cannot be cast
> > > > > to
> >
> > > > > java.lang.Long"
> >
> > > > >
> >
> > > > > Can you please explain why this happenes? Why it not consistent?
> >
> > > > >
> >
> > > > > I know the workaround to use Number class and longValue method
> > > > > but
> >
> > > > > I want to to the root cause before using t

Re: Query Regarding SOLR cross collection join

2020-01-29 Thread Mikhail Khludnev
It's time to enforce and document the field type constraints:
https://issues.apache.org/jira/browse/SOLR-14230.

On Mon, Jan 27, 2020 at 4:12 PM Doss  wrote:

> @ Alessandro Benedetti , Thanks for your input!
>
> @ Mikhail Khludnev , I made docValues="true" for from & to and did a index
> rotation, now the score join works perfectly!  Saw 7x performance increase.
> Thanks!
>
>
> On Thu, Jan 23, 2020 at 9:53 PM Mikhail Khludnev  wrote:
>
> > On Wed, Jan 22, 2020 at 4:27 PM Doss  wrote:
> >
> > > HI,
> > >
> > > SOLR version 8.3.1 (10 nodes), zookeeper ensemble (3 nodes)
> > >
> > > Read somewhere that the score join parser will be faster, but for me it
> > > produces no results. I am using string type fields for from and to.
> > >
> >
> > That's odd. Can you try to enable docValues on from side and reindex
> small
> > portion of data just to check if it works.
> >
> >
> > >
> > >
> > > Thanks!
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr fact response strange behaviour

2020-01-27 Thread Mikhail Khludnev
https://issues.apache.org/jira/browse/SOLR-11775

On Tue, Jan 28, 2020 at 10:00 AM Kaminski, Adi 
wrote:

> Is it existing issue and tracked for future fix consideration ?
>
> What's the suggestion as W/A until fix - to case every related response
> with ((Number)count).longValue() ?
>
> -Original Message-----
> From: Mikhail Khludnev 
> Sent: Tuesday, January 28, 2020 8:53 AM
> To: solr-user 
> Subject: Re: Solr fact response strange behaviour
>
> I suppose there's an issue, which no one ever took a look.
>
> https://lucene.472066.n3.nabble.com/JSON-facets-count-a-long-or-an-integer-in-cloud-and-non-cloud-modes-td4265291.html
>
>
> On Mon, Jan 27, 2020 at 11:47 PM Kaminski, Adi 
> wrote:
>
> > SolrJ client is used of SolrCloud of Solr 8.3 version for JSON Facets
> > requests...any idea why not consistent ?
> >
> > Sent from Workspace ONE Boxer
> >
> > On Jan 27, 2020 22:13, Mikhail Khludnev  wrote:
> > Hello,
> > It might be different between SolrCloud and standalone mode. No data
> > enough to make a conclusion.
> >
> > On Mon, Jan 27, 2020 at 5:40 PM Rudenko, Artur
> > 
> > wrote:
> >
> > > I'm trying to parse facet response, but sometimes the count returns
> > > as Long type and sometimes as Integer type(on different
> > > environments), The error is:
> > > "java.lang.ClassCastException: java.lang.Integer cannot be cast to
> > > java.lang.Long"
> > >
> > > Can you please explain why this happenes? Why it not consistent?
> > >
> > > I know the workaround to use Number class and longValue method but I
> > > want to to the root cause before using this workaround
> > >
> > > Artur Rudenko
> > >
> > >
> > >
> > in
> > > error, please notify us by replying to this e-mail.
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
> >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr fact response strange behaviour

2020-01-27 Thread Mikhail Khludnev
I suppose there's an issue which no one has ever taken a look at.
https://lucene.472066.n3.nabble.com/JSON-facets-count-a-long-or-an-integer-in-cloud-and-non-cloud-modes-td4265291.html


On Mon, Jan 27, 2020 at 11:47 PM Kaminski, Adi 
wrote:

> SolrJ client is used of SolrCloud of Solr 8.3 version for JSON Facets
> requests...any idea why not consistent ?
>
> Sent from Workspace ONE Boxer
>
> On Jan 27, 2020 22:13, Mikhail Khludnev  wrote:
> Hello,
> It might be different between SolrCloud and standalone mode. No data enough
> to make a conclusion.
>
> On Mon, Jan 27, 2020 at 5:40 PM Rudenko, Artur 
> wrote:
>
> > I'm trying to parse facet response, but sometimes the count returns as
> > Long type and sometimes as Integer type(on different environments), The
> > error is:
> > "java.lang.ClassCastException: java.lang.Integer cannot be cast to
> > java.lang.Long"
> >
> > Can you please explain why this happenes? Why it not consistent?
> >
> > I know the workaround to use Number class and longValue method but I want
> > to to the root cause before using this workaround
> >
> > Artur Rudenko
> >
> >
> >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr fact response strange behaviour

2020-01-27 Thread Mikhail Khludnev
Hello,
It might differ between SolrCloud and standalone mode. There isn't enough data
to draw a conclusion.

On Mon, Jan 27, 2020 at 5:40 PM Rudenko, Artur 
wrote:

> I'm trying to parse the facet response, but sometimes the count returns as
> Long type and sometimes as Integer type (on different environments). The
> error is:
> "java.lang.ClassCastException: java.lang.Integer cannot be cast to
> java.lang.Long"
>
> Can you please explain why this happens? Why is it not consistent?
>
> I know the workaround of using the Number class and its longValue method,
> but I want to get to the root cause before using this workaround.
>
> Artur Rudenko
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
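
For reference, the Number/longValue workaround discussed in this thread is a
one-line change on the client side. A minimal sketch, assuming a SolrJ-style
response where the facet count has already been pulled into an Object (the
variable names are illustrative):

    // The count may deserialize as Integer or Long depending on the
    // environment, so never cast to Long directly.
    Object raw = bucket.get("count");           // Integer on some nodes, Long on others
    long count = ((Number) raw).longValue();    // widens safely in both cases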


Re: Solr 8.0 Json Facets are slow - need help

2020-01-23 Thread Mikhail Khludnev
Hello, Kumar.

I don't know. The 3 / 84 ratio seems reasonable. The only unknown part of the
equation is that {!simpleFilter}. Anyway, a profiler/sampler might give an
exact answer.

On Fri, Jan 24, 2020 at 8:55 AM kumar gaurav  wrote:

> HI Mikhail
>
> Can you please see above debug log and help ?
>
> Thanks
>
>
> On Thu, Jan 23, 2020 at 12:05 AM kumar gaurav  wrote:
>
> > Also
> >
> > it doesn't look like the box is slow, because for the following query the
> > prepare time is 3 ms but the facet time is 84 ms on the same box. Don't
> > know why the prepare time was huge for that example :( .
> >
> > debug:
> > {
> >
> >- rawquerystring:
> >"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
> >- querystring:
> >"{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
> >- parsedquery:
> >"AllParentsAware(ToParentBlockJoinQuery (+(+docType:sku +store_873:1)
> #color_refine:Blue #size_refine:L))"
> >,
> >- parsedquery_toString:
> >"ToParentBlockJoinQuery (+(+docType:sku +store_873:1)
> #color_refine:Blue #size_refine:L)"
> >,
> >- explain:
> >{
> >   - 1729659: "
> >   2.0 = Score based on 2 child docs in range from 5103808 to
> 5104159, best match:
> > 2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
> > 1.0 = store_873:1 0.0 = match on required clause,
> product of:
> > 0.0 = # clause
> > 1.0 = weight(color_refine:Blue in 4059732)
> [DisabledStatisticsSimilarity], result of:
> >   1.0 = score(freq=1.0), product of:
> > 1.0 = idf(docFreq, docCount)
> >   1 = docFreq, number of documents containing term
> >   1 = docCount, total number of documents with field
> > 1.0 = tf(freq=1.0), with freq of:
> >   1.0 = freq, occurrences of term within document
> > 1.0 = fieldNorm 0.0 = match on required clause,
> product of:
> > 0.0 = # clause
> > 1.0 = weight(size_refine:L in 4059732)
> [DisabledStatisticsSimilarity], result of:
> >   1.0 = score(freq=1.0), product of:
> > 1.0 = idf(docFreq, docCount)
> >   1 = docFreq, number of documents containing term
> >   1 = docCount, total number of documents with field
> > 1.0 = tf(freq=1.0), with freq of:
> >   1.0 = freq, occurrences of term within document
> > 1.0 = fieldNorm ",
> >   - 1730320: "
> >   2.0 = Score based on 1 child docs in range from 5099889 to
> 5100070, best match:
> > 2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
> > 1.0 = store_873:1 0.0 = match on required clause,
> product of:
> > 0.0 = # clause
> > 1.0 = weight(color_refine:Blue in 4055914)
> [DisabledStatisticsSimilarity], result of:
> >   1.0 = score(freq=1.0), product of:
> > 1.0 = idf(docFreq, docCount)
> >   1 = docFreq, number of documents containing term
> >   1 = docCount, total number of documents with field
> > 1.0 = tf(freq=1.0), with freq of:
> >   1.0 = freq, occurrences of term within document
> > 1.0 = fieldNorm 0.0 = match on required clause,
> product of:
> > 0.0 = # clause
> > 1.0 = weight(size_refine:L in 4055914)
> [DisabledStatisticsSimilarity], result of:
> >   1.0 = score(freq=1.0), product of:
> > 1.0 = idf(docFreq, docCount)
> >   1 = docFreq, number of documents containing term
> >   1 = docCount, total number of documents with field
> > 1.0 = tf(freq=1.0), with freq of:
> >   1.0 = freq, occurrences of term within document
> > 1.0 = fieldNorm ",
> >   - 1730721: "
> >   2.0 = Score based on 4 child docs in range from 5097552 to
> 5097808, best match:
> > 2.0 = sum of: 2.0 = sum of:   1.0 = docType:sku
> > 1.0 = store_873:1 0.0 = match on required clause,
> product of:
> > 0.0 = # clause
> > 1.0 = weight(color_refine:Blue in 4053487)
> [DisabledStatisticsSimilarity], result of:
> >   1.0 = score(freq=1.0), product of:
> > 1.0 = idf(docFreq, docCount)
> >   1 = docFreq, number of documents containing term
> >   1 = docCount, total number of documents with field
> > 1.0 = tf(freq=1.0), with freq of:
> >   1.0 = freq, occurrences of term within document
> > 1.0 = fieldNorm 0.0 = match on required clause,
> product of:
> > 0.0 = # clause
> > 1.0 = weight(size_refine:L in 4053487)
> [DisabledStatisticsSimilarity], result of:
> >   1.0 = score(freq=1.0), product of:
> > 1.0 = idf(docFreq, docCount)
> >

Re: Early termination in Lucene 8

2020-01-23 Thread Mikhail Khludnev
Never heard of that.

On Thu, Jan 23, 2020 at 11:14 PM Wei  wrote:

> Thanks Mikhail.  Do you know of any example on query parser with WAND?
>
> On Thu, Jan 23, 2020 at 1:02 AM Mikhail Khludnev  wrote:
>
> > If one creates a query parser wrapping queries with WAND, it just produces
> > an incomplete docset (I guess), which will be passed to the facet component
> > and produce lower counts.
> >
> > On Thu, Jan 23, 2020 at 2:11 AM Wei  wrote:
> >
> > > Hi,
> > >
> > > I am excited to see Lucene 8 introduced BlockMax WAND as a major speed
> > > improvement https://issues.apache.org/jira/browse/LUCENE-8135.  My
> > > question
> > > is, how does it integrate with facet request,  when the numFound won't
> be
> > > exact? I did some search but haven't found any documentation on this.
> Any
> > > pointer is greatly appreciated.
> > >
> > > Best,
> > > Wei
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Query Regarding SOLR cross collection join

2020-01-23 Thread Mikhail Khludnev
On Wed, Jan 22, 2020 at 4:27 PM Doss  wrote:

> HI,
>
> SOLR version 8.3.1 (10 nodes), zookeeper ensemble (3 nodes)
>
> Read somewhere that the score join parser will be faster, but for me it
> produces no results. I am using string type fields for from and to.
>

That's odd. Can you try enabling docValues on the from side and reindexing a
small portion of data, just to check whether it works?


>
>
> Thanks!
>


-- 
Sincerely yours
Mikhail Khludnev
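
Spelled out, the suggestion is a one-attribute schema change on the from side
plus a reindex of a sample. A sketch with illustrative names (the real field,
type, and collection names come from your setup):

    <!-- from-side schema: the join key with docValues enabled -->
    <field name="from_id" type="string" indexed="true" stored="true" docValues="true"/>

    then re-test the score join:
    q={!join from=from_id to=to_id fromIndex=from_collection score=max}name:test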


Re: Early termination in Lucene 8

2020-01-23 Thread Mikhail Khludnev
If one creates a query parser wrapping queries with WAND, it just produces an
incomplete docset (I guess), which will be passed to the facet component and
produce lower counts.

On Thu, Jan 23, 2020 at 2:11 AM Wei  wrote:

> Hi,
>
> I am excited to see Lucene 8 introduced BlockMax WAND as a major speed
> improvement https://issues.apache.org/jira/browse/LUCENE-8135.  My
> question
> is, how does it integrate with facet request,  when the numFound won't be
> exact? I did some search but haven't found any documentation on this. Any
> pointer is greatly appreciated.
>
> Best,
> Wei
>


-- 
Sincerely yours
Mikhail Khludnev
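
At the Lucene level the trade-off is explicit in the collector API: once the
hit-count threshold is reached, BlockMax WAND may skip non-competitive
documents and the total hit count degrades to a lower bound. A minimal Lucene 8
sketch (the searcher and query are assumed to exist already):

    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TopScoreDocCollector;

    // Rank the top 10 docs, but count hits exactly only up to ~1000.
    TopScoreDocCollector collector = TopScoreDocCollector.create(10, 1000);
    searcher.search(query, collector);

    TopDocs top = collector.topDocs();
    // relation == GREATER_THAN_OR_EQUAL_TO means the count ("numFound") is
    // inexact, which is why facets computed over such a docset come up short.
    System.out.println(top.totalHits.value + " " + top.totalHits.relation);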


Re: Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread Mikhail Khludnev
It's hard to predict whether it will be faster to read the docValues files or
to uninvert the field ad hoc and read it from the heap. Only a test can judge
that.

On Wed, Jan 22, 2020 at 11:08 PM kumar gaurav  wrote:

> HI Mikhail
>
> for example: a 6 GB index (parent-child documents),
> indexed at a 12-hour interval.
>
> I need to use uniqueBlock in a JSON facet for child faceting.
>
> Should I use docValues="true" for the _root_ field?
>
> Thanks .
>
> regards
> Kumar Gaurav
>
>
>
> On Thu, Jan 23, 2020 at 1:28 AM Mikhail Khludnev  wrote:
>
> > It depends on the environment.
> >
> > On Wed, Jan 22, 2020 at 9:31 PM kumar gaurav  wrote:
> >
> > > Hi Everyone
> > >
> > > Should I use docValues="true" for the _root_ field to improve nested child
> > > json.facet performance? I am using uniqueBlock().
> > >
> > >
> > > Thanks in advance .
> > >
> > > regards
> > > Kumar Gaurav
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Does it make sense docValues="true" for _root_ field for uniqueBlock()

2020-01-22 Thread Mikhail Khludnev
It depends on the environment.

On Wed, Jan 22, 2020 at 9:31 PM kumar gaurav  wrote:

> Hi Everyone
>
> Should I use docValues="true" for the _root_ field to improve nested child
> json.facet performance? I am using uniqueBlock().
>
>
> Thanks in advance .
>
> regards
> Kumar Gaurav
>


-- 
Sincerely yours
Mikhail Khludnev
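
If you do run that test, the change itself is small: add docValues to the
_root_ definition and fully reindex. A sketch, assuming the stock schema line
for _root_ (your schema may differ):

    <!-- before (typical default) -->
    <field name="_root_" type="string" indexed="true" stored="false"/>
    <!-- after: lets uniqueBlock(_root_) read docValues instead of uninverting on the heap -->
    <field name="_root_" type="string" indexed="true" stored="false" docValues="true"/>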


Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread Mikhail Khludnev
 -1,
> - facet:
> {
>- productsCount: "uniqueBlock(_root_)"
>},
> },
>  - material_refine:
>  {
> - domain:
> {
>- excludeTags: "rassortment,top,top2,top3,top4,",
>- filter:
>[
>   -
>   "{!filters param=$child.fq
> excludeTags=rmaterial_refine v=$sq}"
>   ,
>   -
>   "{!child of=$pq filters=$fq}docType:(product
> collection)",
>   ],
>},
> - type: "terms",
> - field: "material_refine",
> - limit: -1,
> - facet:
> {
>- productsCount: "uniqueBlock(_root_)"
>},
> },
>  - ageAppropriate_refine:
>  {
> - domain:
> {
>- excludeTags: "rassortment,top,top2,top3,top4,",
>- filter:
>[
>   -
>   "{!filters param=$child.fq
> excludeTags=rageAppropriate_refine v=$sq}"
>   ,
>   -
>   "{!child of=$pq filters=$fq}docType:(product
> collection)",
>   ],
>},
> - type: "terms",
> - field: "ageAppropriate_refine",
> - limit: -1,
> - facet:
> {
>- productsCount: "uniqueBlock(_root_)"
>},
> },
>  - price_refine:
>  {
> - domain:
> {
>- excludeTags: "rassortment,top,top2,top3,top4,",
>- filter:
>[
>   -
>   "{!filters param=$child.fq excludeTags=rprice_refine
> v=$sq}"
>   ,
>   -
>   "{!child of=$pq filters=$fq}docType:(product
> collection)",
>   ],
>},
> - type: "terms",
> - field: "price_refine",
> - limit: -1,
> - facet:
> {
>- productsCount: "uniqueBlock(_root_)"
>},
> },
>  - size_refine:
>  {
> - domain:
> {
>- excludeTags: "rassortment,top,top2,top3,top4,",
>- filter:
>[
>   -
>   "{!filters param=$child.fq excludeTags=rsize_refine
> v=$sq}"
>   ,
>   -
>   "{!child of=$pq filters=$fq}docType:(product
> collection)",
>   ],
>},
> - type: "terms",
> - field: "size_refine",
> - limit: -1,
> - facet:
> {
>- productsCount: "uniqueBlock(_root_)"
>},
> },
>  - inStoreOnline_refine:
>  {
> - domain:
> {
>- excludeTags: "rassortment,top,top2,top3,top4,",
>- filter:
>[
>   -
>   "{!filters param=$child.fq
> excludeTags=rinStoreOnline_refine v=$sq}"
>   ,
>   -
>   "{!child of=$pq filters=$fq}docType:(product
> collection)",
>   ],
>},
> - type: "terms",
> - field: "inStoreOnline_refine",
> - limit: -1,
> - facet:
> {
>- productsCount: "uniqueBlock(_root_)"
>},
> },
>  }
>   },
>- QParser: "BlockJoinParentQParser",
>- filter_queries:
>[
>   - "{!tag=top2}(*:* -pvgc:true)",
>   - "{!tag=top3}{!query v=$eligibleCollections}",
>   - "{!tag=top3}{!query v=$eligibleCollections}",
>   ],
>- parsed_filter_queries:
>[
>   - "MatchAllDocsQuery(*:*) -pvgc:true",
>   - "docType:product (+docType:collection +(eligibleToShow:[1 TO 1]))",
>   - "docType:product (+docType:collection +(eligibleToShow:[1 TO 1]))",
>   ],
>  

Re: Solr 8.0 Json Facets are slow - need help

2020-01-22 Thread Mikhail Khludnev
The screenshots didn't come through the list. That excerpt doesn't have any
informative numbers.

On Tue, Jan 21, 2020 at 5:18 PM kumar gaurav  wrote:

> Hi Mikhail
>
> Thanks for your reply . Please help me in this .
>
> Followings are the screenshot:-
>
> [image: image.png]
>
>
> [image: image.png]
>
>
> json facet debug Output:-
>
> json:
> {
>
>- facet:
>{
>   - color_refine:
>   {
>  - domain:
>  {
> - excludeTags: "rassortment,top,top2,top3,top4,",
> - filter:
> [
>-
>"{!filters param=$child.fq excludeTags=rcolor_refine v=$sq}"
>,
>- "{!child of=$pq filters=$fq}docType:(product collection)"
>,
>],
> },
>  - type: "terms",
>  - field: "color_refine",
>  - limit: -1,
>  - facet:
>  {
> - productsCount: "uniqueBlock(_root_)"
> },
>  },
>   - size_refine:
>   {
>  - domain:
>  {
> - excludeTags: "rassortment,top,top2,top3,top4,",
> - filter:
> [
>-
>"{!filters param=$child.fq excludeTags=rsize_refine v=$sq}"
>,
>- "{!child of=$pq filters=$fq}docType:(product collection)"
>,
>],
> },
>  - type: "terms",
>  - field: "size_refine",
>  - limit: -1,
>  - facet:
>  {
> - productsCount: "uniqueBlock(_root_)"
> },
>  },
>   }
>
> }
>
>
>
> regards
> Kumar Gaurav
>
>
> On Tue, Jan 21, 2020 at 5:25 PM Mikhail Khludnev  wrote:
>
>> Hi.
>> Can you share debugQuery=true output?
>>
>> On Tue, Jan 21, 2020 at 1:37 PM kumar gaurav  wrote:
>>
>> > HI
>> >
>> > i have a parent child query in which i have used json facet for child
>> > faceting like following.
>> >
>> > qt=/dismax
>> > matchAllQueryRef1=+(+({!query v=$cq}))
>> > sq=+{!lucene v=$matchAllQueryRef1}
>> > q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
>> > child.fq={!tag=rcolor_refine}filter({!term f=color_refine
>> > v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
>> > qcolor_refine1=Blue
>> > qcolor_refine2=Other clrs
>> > cq=+{!simpleFilter v=docType:sku}
>> > pq=docType:(product)
>> > facet=true
>> > facet.mincount=1
>> > facet.limit=-1
>> > facet.missing=false
>> > json.facet= {color_refine:{
>> > domain:{
>> > filter:["{!filters param=$child.fq excludeTags=rcolor_refine
>> > v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
>> >},
>> > type:terms,
>> > field:color_refine,
>> > limit:-1,
>> > facet:{productsCount:"uniqueBlock(_root_)"}}}
>> >
>> > schema :-
>> > > > multiValued="true" docValues="true"/>
>> >
>> > i have observed that json facets are slow . It is taking much time than
>> > expected .
>> > Can anyone please check this query specially child.fq and json.facet
>> part .
>> >
>> > Please help me in this .
>> >
>> > Thanks & regards
>> > Kumar Gaurav
>> >
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Solr 8.0 Json Facets are slow - need help

2020-01-21 Thread Mikhail Khludnev
Hi.
Can you share debugQuery=true output?

On Tue, Jan 21, 2020 at 1:37 PM kumar gaurav  wrote:

> HI
>
> i have a parent child query in which i have used json facet for child
> faceting like following.
>
> qt=/dismax
> matchAllQueryRef1=+(+({!query v=$cq}))
> sq=+{!lucene v=$matchAllQueryRef1}
> q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
> child.fq={!tag=rcolor_refine}filter({!term f=color_refine
> v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
> qcolor_refine1=Blue
> qcolor_refine2=Other clrs
> cq=+{!simpleFilter v=docType:sku}
> pq=docType:(product)
> facet=true
> facet.mincount=1
> facet.limit=-1
> facet.missing=false
> json.facet= {color_refine:{
> domain:{
> filter:["{!filters param=$child.fq excludeTags=rcolor_refine
> v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
>},
> type:terms,
> field:color_refine,
> limit:-1,
> facet:{productsCount:"uniqueBlock(_root_)"}}}
>
> schema :-
> <field name="color_refine" ... multiValued="true" docValues="true"/>
>
> I have observed that the JSON facets are slow; they take much more time than
> expected.
> Can anyone please check this query, especially the child.fq and json.facet parts?
>
> Please help me in this .
>
> Thanks & regards
> Kumar Gaurav
>


-- 
Sincerely yours
Mikhail Khludnev
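
For capturing the output Mikhail asks for, adding debug=true (or the narrower
debug=timing) to the same request reports prepare and process times per search
component, including the facet module. A sketch of the extra parameter only;
the collection name is a placeholder and all other parameters stay exactly as
listed above:

    curl "http://localhost:8983/solr/<collection>/select" \
      --data-urlencode 'q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}' \
      --data-urlencode 'debug=timing'
      # ...plus the cq/pq/child.fq/json.facet parameters shown above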


Re: How do I add multiple values for same field with DIH script?

2020-01-16 Thread Mikhail Khludnev
Hello.
What about putting Arrays.asList("foo", "bar") ?

On Thu, Jan 16, 2020 at 2:42 PM O. Klein  wrote:

> row.put('content_text', "hello");
> row.put('content_text', "this is a test");
> return row;
>
> will only return "this is a test"
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev
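
Inside a DIH script transformer, Mikhail's suggestion would look roughly like
this (a sketch: the field name is taken from the question, and java.util is
assumed reachable from the script engine, as it is under Nashorn/Rhino):

    function transformRow(row) {
      // One put() with a List keeps both values; calling put() twice with the
      // same key just overwrites the first value, as observed above.
      row.put('content_text', java.util.Arrays.asList('hello', 'this is a test'));
      return row;
    }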


Re: JOIN query

2020-01-08 Thread Mikhail Khludnev
Hi, Paresh.

I'm afraid the only way is to join them back in post-processing:
https://lucene.apache.org/solr/guide/6_6/transforming-result-documents.html#TransformingResultDocuments-_subquery_
Although I'm not sure it will work with these particular collections.

On Wed, Jan 8, 2020 at 3:42 PM Paresh  wrote:

> Hi,
>
> I have two collections: collection1 and collection2
> I have fields like -
> collection1: id, prop1, prop2, prop3
> collection2: id, col1, col2, col3
>
> I am doing a join query with collection1.prop1 = collection2.col1 on
> collection2.
>
> As a result, I can get any field from collection2 in 'fl'.
>
> Is there any way to get a field from collection1 while performing a query on
> collection2 joined with collection1?
>
>
> Regards,
> Paresh
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev
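
Wired up for the collections in the question, the [subquery] transformer looks
roughly like this (a sketch; note that col1 must be listed in fl so that
$row.col1 resolves):

    q=*:*
    fl=id,col1,col2,col3,parent:[subquery fromIndex=collection1]
    parent.q={!terms f=prop1 v=$row.col1}
    parent.fl=id,prop1,prop2,prop3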


Re: Performance of Bulk Importing TSV File in Solr 8

2020-01-02 Thread Mikhail Khludnev
Hello, Joseph.

This rate looks good to me, although if the node is idling and has plenty of
free RAM, you can split this file with unix tools and submit the partitions
for import in parallel.
The hanging connection seems like a bug.

On Thu, Jan 2, 2020 at 10:09 PM Joseph Lorenzini  wrote:

> Hi all,
>
> I have TSV file that contains 1.2 million rows. I want to bulk import this
> file into solr where each row becomes a solr document. The TSV has 24
> columns. I am using the streaming API like so:
>
> curl -v '
>
> http://localhost:8983/solr/example/update?stream.file=/opt/solr/results.tsv&separator=%09&escape=%5c&stream.contentType=text/csv;charset=utf-8&commit=true
> '
>
> The ingestion rate is 167,000 rows a minute and takes about 7.5 minutes to
> complete. I have a few questions.
>
> - is there a way to increase the performance of the ingestion rate? I am
> open to doing something other than bulk import of a TSV up to and including
> writing a small program. I am just not sure what that would look like at a
> high level.
> - if the file is a TSV, I noticed that solr never closes a HTTP connection
> with a 200 OK after all the documents are uploaded. The connection seems to
> be held open indefinitely. If however, i upload the same file as a CSV,
> then solr does close the http connection. Is this a bug?
>


-- 
Sincerely yours
Mikhail Khludnev
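
The split-and-parallelize idea could look like the sketch below. Assumptions:
the TSV carries no header row, so the 24 column names are passed explicitly
via fieldnames (col1..col3 stand in for the real names), and a single commit
is issued at the end:

    # cut the file into ~300k-line chunks
    split -l 300000 /opt/solr/results.tsv /opt/solr/part_

    # submit the chunks concurrently
    for f in /opt/solr/part_*; do
      curl "http://localhost:8983/solr/example/update?stream.file=$f&separator=%09&escape=%5c&stream.contentType=text/csv;charset=utf-8&header=false&fieldnames=col1,col2,col3" &
    done
    wait
    curl "http://localhost:8983/solr/example/update?commit=true"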


Re: Get Solr to notice new core without restarting?

2019-12-13 Thread Mikhail Khludnev
https://lucene.apache.org/solr/guide/8_2/coreadmin-api.html#coreadmin-create


On Fri, Dec 13, 2019 at 10:50 PM Mark H. Wood  wrote:

> I have a product which comes with several empty Solr cores already
> configured and laid out, ready to be copied into place where Solr can
> find them.  Is there a way to get Solr to notice new cores without
> restarting it?  Is it likely there ever will be?  I'm one of the
> people who test and maintain the product, so I'm always creating and
> destroying instances.
>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
>


-- 
Sincerely yours
Mikhail Khludnev
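
That CoreAdmin CREATE call picks a new core up without any restart. A minimal
sketch, assuming the pre-built core directory (with its conf/) has already
been copied under the Solr home:

    curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=newcore&instanceDir=newcore"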


Re: native Thread - solr 8.2.0

2019-12-09 Thread Mikhail Khludnev
My experience with "OutOfMemoryError: unable to create new native thread" is
as follows: it occurs on environments where devs refuse to use thread pools in
favor of good old new Thread().
Then it turns rather interesting: if there is plenty of heap, GC doesn't sweep
the Thread instances. Since threads are native in Java, every one of them holds
some RAM for its native stack, and that exhausts the stack space at some point.
So, check with jstack how many threads the JVM holds after this particular OOME
occurs; you can even force a GC to release that native stack space. Then
rewrite the app, or reduce the heap to make GC kick in.

On Tue, Dec 10, 2019 at 9:44 AM Shawn Heisey  wrote:

> On 12/9/2019 2:23 PM, Joe Obernberger wrote:
> > Getting this error on some of the nodes in a solr cloud during heavy
> > indexing:
>
> 
>
> > Caused by: java.lang.OutOfMemoryError: unable to create new native thread
>
> Java was not able to start a new thread.  Most likely this is caused by
> the operating system imposing limits on the number of processes or
> threads that a user is allowed to start.
>
> On Linux, the default limit is usually 1024 processes.  It doesn't take
> much for a Solr install to need more threads than that.
>
> How to increase the limit will depend on what OS you're running on.
> Typically on Linux, this is controlled by /etc/security/limits.conf.  If
> you're not on Linux, then you'll need to research how to increase the
> process limit.
>
> As long as you're fiddling with limits, you'll probably also want to
> increase the open file limit.
>
> Thanks,
> Shawn
>


-- 
Sincerely yours
Mikhail Khludnev
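
On Linux, the limits Shawn mentions are typically raised in
/etc/security/limits.conf. A sketch, assuming Solr runs as the user solr (a
re-login or service restart is needed for it to take effect):

    # /etc/security/limits.conf
    solr  soft  nproc   65000
    solr  hard  nproc   65000
    # the open-file limit suggested above
    solr  soft  nofile  65000
    solr  hard  nofile  65000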


Re: Excluding a block join filter query during faceting

2019-12-04 Thread Mikhail Khludnev
Why do you think it doesn't work?
fq={!parent tag=test
which="my_doc_type:Parent"}child_doc_some_field:("30")

On Wed, Dec 4, 2019 at 12:38 AM Srijan  wrote:

> I was wondering if anyone has encountered this problem.
>
> I have a parent block join query to return parent documents when child
> documents are matched.
> Eg:
> q=
> fq={!parent which="my_doc_type:Parent"}child_doc_some_field:("30")
>
> I now want to facet on certain parent field but want to exclude the above
> filter query condition entirely. If I had a normal filter query,
> fq={!tag=test}parent_doc_field1:("30") then I could use that as my exclude
> tag while faceting.
> facet.field={!ex=test}parent_doc_field2&...
>
> But it turns out I cannot do that with a block join filter query. Is there
> any way I can achieve this? JSON faceting with domain filter capability will
> probably solve my problem but I cannot use JSON faceting at this point.
>
> Thanks a lot,
>
> Srijan Nepal
>


-- 
Sincerely yours
Mikhail Khludnev
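
Putting the answer next to the original faceting request: the tag goes inside
the block join filter's local params and is excluded as usual (the field names
are the ones from the question; <main query> is a placeholder):

    q=<main query>
    fq={!parent tag=test which="my_doc_type:Parent"}child_doc_some_field:("30")
    facet=true
    facet.field={!ex=test}parent_doc_field2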


Re: Is it possible to use the Lucene Query Builder? Is there any API to create boolean queries?

2019-12-02 Thread Mikhail Khludnev
and the Query DSL as well. Although, I didn't get the point of the topic
starter.

On Mon, Dec 2, 2019 at 9:16 PM Alexandre Rafalovitch 
wrote:

> What about XMLQueryParser:
>
> https://lucene.apache.org/solr/guide/8_2/other-parsers.html#xml-query-parser
>
> Regards,
>Alex.
>
> On Wed, 27 Nov 2019 at 22:43,  wrote:
> >
> > I am trying to simulate the following query(Lucene query builder) using
> Solr
> >
> >
> >
> >
> > BooleanQuery.Builder main = new BooleanQuery.Builder();
> >
> > Term t1 = new Term("f1","term");
> > Term t2 = new Term("f1","second");
> > Term t3 = new Term("f1","another");
> >
> > BooleanQuery.Builder q1 = new BooleanQuery.Builder();
> > q1.add(new FuzzyQuery(t1,2), BooleanClause.Occur.SHOULD);
> > q1.add(new FuzzyQuery(t2,2), BooleanClause.Occur.SHOULD);
> > q1.add(new FuzzyQuery(t3,2), BooleanClause.Occur.SHOULD);
> > q1.setMinimumNumberShouldMatch(2);
> >
> > Term t4 = new Term("f1","anothert");
> > Term t5 = new Term("f1","anothert2");
> > Term t6 = new Term("f1","anothert3");
> >
> > BooleanQuery.Builder q2 = new BooleanQuery.Builder();
> > q2.add(new FuzzyQuery(t4,2), BooleanClause.Occur.SHOULD);
> > q2.add(new FuzzyQuery(t5,2), BooleanClause.Occur.SHOULD);
> > q2.add(new FuzzyQuery(t6,2), BooleanClause.Occur.SHOULD);
> > q2.setMinimumNumberShouldMatch(2);
> >
> >
> > main.add(q1.build(),BooleanClause.Occur.SHOULD);
> > main.add(q2.build(),BooleanClause.Occur.SHOULD);
> > main.setMinimumNumberShouldMatch(1);
> >
> > System.out.println(main.build()); // (((f1:term~2 f1:second~2
> > f1:another~2)~2) ((f1:anothert~2 f1:anothert2~2 f1:anothert3~2)~2))~1
>  -->
> > Invalid Solr Query
> >
> >
> >
> >
> >
> > In a few words :  ( q1 OR q2 )
> >
> >
> >
> > Where q1 and q2 are a set of different terms using I'd like to do a fuzzy
> > search but I also need a minimum of terms to match.
> >
> >
> >
> > The best I was able to create was something like this  :
> >
> >
> >
> > SolrQuery query = new SolrQuery();
> > query.set("fl", "term");
> > query.set("q", "term~1 term2~2 term3~2");
> > query.set("mm",2);
> >
> > System.out.println(query);
> >
> >
> >
> > And I was unable to find any example that would allow me to do the type
> of
> > query that I am trying to build with only one solr query.
> >
> >
> >
> > Is it possible to use the Lucene Query builder with Solr? Is there any
> way
> > to create Boolean queries with Solr? Do I need to build the query as a
> > String? If so , how do I set the mm parameter in a String query?
> >
> >
> >
> > Thank you
> >
>


-- 
Sincerely yours
Mikhail Khludnev
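
One way to express ( q1 OR q2 ), where each clause carries its own
minimum-should-match, without assembling a String by hand is to nest two
edismax sub-queries under the {!bool} parser: each sub-query gets its own mm,
and a bool query with only should clauses requires at least one of them to
match. A sketch using the field and terms from the code above (repeated should
keys in local params are assumed to be accepted, as in the bool parser docs):

    q={!bool should=$q1 should=$q2}
    q1={!edismax qf=f1 mm=2 v='term~2 second~2 another~2'}
    q2={!edismax qf=f1 mm=2 v='anothert~2 anothert2~2 anothert3~2'}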


Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-25 Thread Mikhail Khludnev
It's worth increasing the log level for the DIH categories in the Solr Admin
UI. It's usually quite useful.

On Mon, Nov 25, 2019 at 10:19 PM Shashank Bellary  wrote:

> That didn't make any difference. However, I upgraded to version 8.0.15 of the
> mysql-jdbc driver, which fixed the problem of the result set closing, but then
> the data import got really slow and gets stuck with nothing in the Solr
> logs. I'll debug more from the DB perspective; in the meantime any pointers
> would be helpful.
>
> On 11/24/19, 3:51 PM, "Mikhail Khludnev"  wrote:
>
>
>
> Config makes sense to me. The only unusual thing is the trailing semicolons
> in SQL statements. What if you drop them?
>
> On Sun, Nov 24, 2019 at 11:30 PM Shashank Bellary 
> wrote:
>
> > Thanks Mikhail, data config is on the thread above. I’ll share again
> if
> > you can’t find it
> >
> > 
> > From: Mikhail Khludnev 
> > Sent: Sunday, November 24, 2019 2:51:40 PM
> > To: solr-user 
> > Subject: Re: Solr 4 to Solr7 migration DIH behavior change
> >
> >
> >
> > Hello, Shashank.
> > The error seems similar, but I didn't find an old issue with such an
> error.
> > I've found only one abandoned thread in the mailing list. ='17' seems
> > suspicious to me; usually it should be done via a prepared statement.
> Have no
> > thoughts, maybe you can share your data config?
> >
> > On Sun, Nov 24, 2019 at 10:40 PM Shashank Bellary  >
> > wrote:
> >
> > > Any thoughts guys? I tried with mysql driver v8 also, still no luck
> > >
> > > On 11/22/19, 3:00 PM, "Jörn Franke"  wrote:
> > >
> > >
> > >
> > > Did you update the java version to 8? Did you upgrade the MySQL
> > driver
> > > to the latest version?
> > >
> > > > On 22.11.2019 at 20:43, Shashank Bellary <
> sbell...@care.com
> > >:
> > > >
> > > >
> > > >
> > > > Hi Folks
> > > > I migrated from Solr 4 to 7.5 and I see an issue with the
> way DIH
> > is
> > > working. I use `JdbcDataSource` and here the config file is
> attached
> > > > 1) I started seeing OutOfMemory issue since MySQL JDBC
> driver has
> > > that issue of not respecting `batchSize` (though Solr4 didn't show
> this
> > > behavior). So, I added `batchSize=-1` for that
> > > > 2) After adding that I'm running into ResultSet closed
> exception as
> > > shown below while fetching the child entity
> > > >
> > > > getNext() failed for query ' SELECT REVIEW AS REVIEWS FROM
> > > SOLR_SITTER_SERVICE_PROFILE_REVIEWS WHERE SERVICE_PROFILE_ID =
> '17' ;
> > > ':org.apache.solr.handler.dataimport.DataImportHandlerException:
> > > java.sql.SQLException: Operation not allowed after ResultSet closed
> > > > at
> > >
> >
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
> > > > at
> > >
> >
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:464)
> > > > at
> > >
> >
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:377)
> > > > at
> > >
> >
> org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:133)
> > > > at
> > >
> >
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
> > > > at
> > >
> >
> org.apache.solr.handler.dataimport.EntityProcessorWr

Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-24 Thread Mikhail Khludnev
Config makes sense to me. The only unusual thing is the trailing semicolons in
the SQL statements. What if you drop them?

On Sun, Nov 24, 2019 at 11:30 PM Shashank Bellary  wrote:

> Thanks Mikhail, data config is on the thread above. I’ll share again if
> you can’t find it
>
> ____
> From: Mikhail Khludnev 
> Sent: Sunday, November 24, 2019 2:51:40 PM
> To: solr-user 
> Subject: Re: Solr 4 to Solr7 migration DIH behavior change
>
>
>
> Hello, Shashank.
> The error seems similar, but I didn't find an old issue with such an error.
> I've found only one abandoned thread in the mailing list. ='17' seems
> suspicious to me; usually it should be done via a prepared statement. Have no
> thoughts, maybe you can share your data config?
>
> On Sun, Nov 24, 2019 at 10:40 PM Shashank Bellary 
> wrote:
>
> > Any thoughts guys? I tried with mysql driver v8 also, still no luck
> >
> > On 11/22/19, 3:00 PM, "Jörn Franke"  wrote:
> >
> >
> >
> > Did you update the java version to 8? Did you upgrade the MySQL
> driver
> > to the latest version?
> >
> > > Am 22.11.2019 um 20:43 schrieb Shashank Bellary  >:
> > >
> > >
> > >
> > > Hi Folks
> > > I migrated from Solr 4 to 7.5 and I see an issue with the way DIH
> is
> > working. I use `JdbcDataSource` and here the config file is attached
> > > 1) I started seeing OutOfMemory issue since MySQL JDBC driver has
> > that issue of not respecting `batchSize` (though Solr4 didn't show this
> > behavior). So, I added `batchSize=-1` for that
> > > 2) After adding that I'm running into ResultSet closed exception as
> > shown below while fetching the child entity
> > >
> > > getNext() failed for query ' SELECT REVIEW AS REVIEWS FROM
> > SOLR_SITTER_SERVICE_PROFILE_REVIEWS WHERE SERVICE_PROFILE_ID = '17' ;
> > ':org.apache.solr.handler.dataimport.DataImportHandlerException:
> > java.sql.SQLException: Operation not allowed after ResultSet closed
> > > at
> >
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
> > > at
> >
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:464)
> > > at
> >
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:377)
> > > at
> >
> org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:133)
> > > at
> >
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
> > > at
> >
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
> > > at
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
> > > at
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
> > > at
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
> > > at
> >
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:33)
> > > at
> >
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
> > > at
> >
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
> > > at
> >
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
> > > at
> >
> org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
> > > at java.lang.Thread.run(Thread.java:748)
> > > Caused by: java.sql.SQLException: Operation not allowed after
> > ResultSet closed
> > > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
> > > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
> > > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
> > > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
> > > at com.mysql.jdbc.ResultSetImpl.checkClosed(ResultSetImpl.java:794)
> > > a

Re: Solr 4 to Solr7 migration DIH behavior change

2019-11-24 Thread Mikhail Khludnev
Hello, Shashank.
The error seems similar, but I didn't find an old issue with such an error.
I've found only one abandoned thread in the mailing list. ='17' seems
suspicious to me; usually it should be done via a prepared statement. I have no
further thoughts; maybe you can share your data config?

On Sun, Nov 24, 2019 at 10:40 PM Shashank Bellary  wrote:

> Any thoughts guys? I tried with mysql driver v8 also, still no luck
>
> On 11/22/19, 3:00 PM, "Jörn Franke"  wrote:
>
>
>
> Did you update the java version to 8? Did you upgrade the MySQL driver
> to the latest version?
>
> > On 22.11.2019 at 20:43, Shashank Bellary wrote:
> >
> >
> >
> > Hi Folks
> > I migrated from Solr 4 to 7.5 and I see an issue with the way DIH is
> working. I use `JdbcDataSource` and here the config file is attached
> > 1) I started seeing OutOfMemory issue since MySQL JDBC driver has
> that issue of not respecting `batchSize` (though Solr4 didn't show this
> behavior). So, I added `batchSize=-1` for that
> > 2) After adding that I'm running into ResultSet closed exception as
> shown below while fetching the child entity
> >
> > getNext() failed for query ' SELECT REVIEW AS REVIEWS FROM
> SOLR_SITTER_SERVICE_PROFILE_REVIEWS WHERE SERVICE_PROFILE_ID = '17' ;
> ':org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.sql.SQLException: Operation not allowed after ResultSet closed
> > at
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
> > at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:464)
> > at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:377)
> > at
> org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:133)
> > at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
> > at
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
> > at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
> > at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
> > at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
> > at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:33)
> > at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
> > at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
> > at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
> > at
> org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
> > at java.lang.Thread.run(Thread.java:748)
> > Caused by: java.sql.SQLException: Operation not allowed after
> ResultSet closed
> > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
> > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
> > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
> > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
> > at com.mysql.jdbc.ResultSetImpl.checkClosed(ResultSetImpl.java:794)
> > at com.mysql.jdbc.ResultSetImpl.next(ResultSetImpl.java:7145)
> > at
> com.mysql.jdbc.StatementImpl.getMoreResults(StatementImpl.java:2078)
> > at
> com.mysql.jdbc.StatementImpl.getMoreResults(StatementImpl.java:2062)
> > at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:458)
> > ... 13 more
> >
> > Is this a known issue? How do I fix this, any help is greatly
> appreciated.
> >
> > Thanks
> > Shashank
> > 
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
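
One DIH detail worth checking with MySQL streaming (batchSize="-1"): the
driver allows only one open streaming ResultSet per connection, so a nested
child entity that reuses the parent's connection can trip exactly this kind of
"ResultSet closed" error. A hedged sketch of the usual workaround, giving the
child entity its own named dataSource (URLs elided):

    <dataConfig>
      <dataSource name="ds-parent" driver="com.mysql.jdbc.Driver" url="..." batchSize="-1"/>
      <dataSource name="ds-child"  driver="com.mysql.jdbc.Driver" url="..." batchSize="-1"/>
      <document>
        <entity name="profile" dataSource="ds-parent" query="SELECT ...">
          <entity name="reviews" dataSource="ds-child"
                  query="SELECT REVIEW AS REVIEWS FROM SOLR_SITTER_SERVICE_PROFILE_REVIEWS
                         WHERE SERVICE_PROFILE_ID = '${profile.SERVICE_PROFILE_ID}'"/>
        </entity>
      </document>
    </dataConfig>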


Re: Nested SubQuery

2019-11-22 Thread Mikhail Khludnev
Hello,
You may try to add logParamsList=q,row.id and check the logs.


Re: Fetch parent and child document in solr 8.2

2019-11-19 Thread Mikhail Khludnev
There are some changes in nested docs, a kind of named scopes, or so. The
query you provided violates one of the block join principles. There should be
a caveat panel regarding this in the docs.

On Fri, Nov 15, 2019 at 9:46 PM Gajjar, Jigar  wrote:

> Hello,
>
>
>
> I am trying to fetch parent and child document together in one Solr query,
> I was able to do that in solr 7.4 but same query does not work in solr 8.2.
>
> Are there any major changes in the way that we are fetching children?
>
>
>
> My requirement is to fetch parent and children both in one call.
>
>
>
> I am trying
>
>
>
> http://localhost:8983/solr/demo/select?fl=*,[child]&q={!parent
> which="cat_s:sci-fi AND pubyear_i:1992"}
>
>
>
> what are the ways to retrieve parent child as nested documents?
>
>
>
>
>
>
>
> Thanks,
>
> Jigar Gajjar
>
> *OCLC* · Senior Software  Engineer
>
> 6565 Kilgour Place, Dublin, OH, USA, 43017
>
>  *M* +1-408-334-6379
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
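
The principle in question: the which= filter of {!parent} must match every
parent document (typically a doc-type flag), never just the particular parents
being searched for; parent-level conditions such as cat_s and pubyear_i belong
outside the local params. A sketch of the same request restructured that way
(the docType field marking parents is an assumption; adjust to the real
schema):

    http://localhost:8983/solr/demo/select
      q=cat_s:sci-fi AND pubyear_i:1992
      fl=*,[child parentFilter=docType:parent]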


Re: async BACKUP under Solr8.3

2019-11-18 Thread Mikhail Khludnev
Hello, Craig.
There was a significant fix for async BACKUP in 8.1, if I remember correctly.
Which version did you use before? How many nodes, shards, and replicas does
the collection behind `bug` have?
Unfortunately this stacktrace is not really representative; it just says that
some node (OK, it's the overseer) failed while waiting for another one.
Ideally we need logs from the overseer node and the subordinate node during
the backup operation.
Thanks.

On Tue, Nov 19, 2019 at 2:13 AM Oakley, Craig (NIH/NLM/NCBI) [C]
 wrote:

> For Solr 8.3, when I attempt a command of the form
>
>
> host:port/solr/admin/collections?action=BACKUP&name=snapshot1&collection=col1&location=/tmp&async=bug
>
> And then when I run
> /solr/admin/collections?action=REQUESTSTATUS&requestid=bug I get
> "msg":"found [bug] in failed tasks"
>
> The solr.log file has a stack trace like the following
> 2019-11-18 17:31:31.369 ERROR
> (OverseerThreadFactory-9-thread-5-processing-n:host:port_solr) [c:col1   ]
> o.a.s.c.a.c.OverseerCollectionMessageHandler Error from shard:
> http://host:port/solr =>
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at: http://host:port/solr/admin/cores
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:408)
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at: http://host:port/solr/admin/cores
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:408)
> ~[?:?]
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:754)
> ~[?:?]
> at
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290) ~[?:?]
> at
> org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
> ~[?:?]
> at
> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199)
> ~[?:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[?:1.8.0_232]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[?:1.8.0_232]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[?:1.8.0_232]
> at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
> ~[metrics-core-4.0.5.jar:4.0.5]
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
> ~[?:?]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ~[?:1.8.0_232]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ~[?:1.8.0_232]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.util.concurrent.TimeoutException
> at
> org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
> ~[?:?]
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:399)
> ~[?:?]
> ... 12 more
>
> If I remove the async=bug, then it works
>
> In fact, the backup looks successful, but REQUESTSTATUS does not recognize
> it as such
>
> I notice that the 3:30am 11/4/19 Email to solr-user@lucene.apache.org
> mentions in Solr 8.3.0 Release Highlights "Fix for SPLITSHARD (async) with
> failures in underlying sub-operations can result in data loss"
>
> Did a fix to SPLITSHARD break BACKUP?
>
> Has anyone been successful running
> solr/admin/collections?action=BACKUP&async=requestname under Solr 8.3?
>
> Thanks
>


-- 
Sincerely yours
Mikhail Khludnev


Re: sort by score in join with geodist()

2019-11-14 Thread Mikhail Khludnev
Hm...
q={!join param=param.. v=$g}&g={!bool must='{!geofilt}'
should='{!func}geodist()'} ?

On Thu, Nov 14, 2019 at 10:11 PM Vasily Ogar  wrote:

> I tried it this way also, but here is another error.
> { "responseHeader":{ "status":400, "QTime":6, "params":{ "hl":"on", "pt":
> "54.6973867999,25.22481530046", "fl":"score,*,store:[subquery
> fromIndex=stores]", "store.rows":"1", "fq":"{!edismax qf=\"title
> description\" v=\"iphone xr 64gb\" mm=3<90%}", "hl.simple.pre":"", "
> store.q":"{!terms f=site_id v=$row.site_id}", "store.sfield":"coordinates",
> "hl.fl":"title description", "group.field":"site_id", "_":"1573559644298",
> "
> group":"true", "store.fq":"{!geofilt}", "d":"100", "group.limit":"2", "
> store.d":"100", "store.pt":"54.6973867999,25.22481530046",
> "store.fl
> ":"*,score", "sort":"score desc", "sfield":"coordinates", "q":"{!join
> from=site_id to=site_id fromIndex=stores score=max}+{!geofilt}
> {!func}geodist() ", "hl.simple.post":"", "hl.q":"{!edismax qf=$hl.fl
> v=\"iphone xr 64gb\"}", "hl.method":"unified", "debugQuery":"on"}},
> "error":{
> "metadata":[ "error-class","org.apache.solr.common.SolrException",
> "root-error-class","org.apache.solr.search.SyntaxError"],
> "msg":"org.apache.solr.search.SyntaxError:
> Expected ')' at position 8 in 'geodist('", "code":400}}
>
> On Thu, Nov 14, 2019 at 9:05 PM Mikhail Khludnev  wrote:
>
> > Space matters. Check my snippet once again please.
> >
> > On Thu, Nov 14, 2019 at 9:56 PM Vasily Ogar 
> wrote:
> >
> > > I tried today with plus but always got same error.
> > > { "responseHeader":{ "status":400, "QTime":2, "params":{ "hl":"on",
> "pt":
> > > "54.6973867999,25.22481530046", "fl":"score,*,store:[subquery
> > > fromIndex=stores]", "store.rows":"1", "fq":"{!edismax qf=\"title
> > > description\" v=\"iphone xr 64gb\" mm=3<90%}",
> "hl.simple.pre":"", "
> > > store.q":"{!terms f=site_id v=$row.site_id}",
> > "store.sfield":"coordinates",
> > > "hl.fl":"title description", "group.field":"site_id",
> > "_":"1573559644298",
> > > "
> > > group":"true", "store.fq":"{!geofilt}", "d":"100", "group.limit":"2", "
> > > store.d":"100", "store.pt":"54.6973867999,25.22481530046",
> > > "store.fl
> > > ":"*,score", "sort":"score desc", "sfield":"coordinates", "q":"{!join
> > > from=site_id to=site_id fromIndex=stores
> > > score=max}+{!geofilt}{!func}geodist()", "hl.simple.post":"",
> > > "hl.q":"{!edismax
> > > qf=$hl.fl v=\"iphone xr 64gb\"}", "hl.method":"unified",
> > > "debugQuery":"on"}},
> > > "error":{ "metadata":[
> > > "error-class","org.apache.solr.common.SolrException",
> > > "root-error-class","org.apache.solr.parser.ParseException"],
> > > "msg":"org.apache.solr.search.SyntaxError:
> > > Cannot parse '+{!geofilt}{!func}geodist()': Encountered \" \")\" \")
> \"\"
> > > at line 1, column 26.\nWas expecting one of:\n  \n  ...\n
> 
> > > ...\n  ...\n \"+\" ...\n \"-\" ...\n  ...\n \"(\" ...\n
> > > \"*\" ...\n \"^\" ...\n  ...\n  ...\n  ...\n
> > >  ...\n  ...\n \"[\" ...\n \"{\" ...\n 
> > ...\n
> > > \"filter(\" ...\n  ...\n ", "code":400}}
> > >
> > > On Thu, Nov 14, 2019 at 8:45 PM Mikhail Khludnev 
> > w
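
The whitespace in that snippet is what separates the two embedded sub-queries:
+{!geofilt} ends at the space, so {!func}geodist() is parsed as a second,
optional scoring clause; glued together, the parser tries to swallow geodist()
into the first clause and fails with the syntax errors quoted above. The
working shape of the whole request, gathered in one place (values as used
throughout this thread):

    q={!join from=site_id to=site_id fromIndex=stores score=max}+{!geofilt} {!func}geodist()
    sfield=coordinates
    pt=54.6973867999,25.22481530046
    d=100
    sort=score desc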

Re: sort by score in join with geodist()

2019-11-14 Thread Mikhail Khludnev
Space matters. Check my snippet once again please.

On Thu, Nov 14, 2019 at 9:56 PM Vasily Ogar  wrote:

> I tried today with plus but always got same error.
> { "responseHeader":{ "status":400, "QTime":2, "params":{ "hl":"on", "pt":
> "54.6973867999,25.22481530046", "fl":"score,*,store:[subquery
> fromIndex=stores]", "store.rows":"1", "fq":"{!edismax qf=\"title
> description\" v=\"iphone xr 64gb\" mm=3<90%}", "hl.simple.pre":"", "
> store.q":"{!terms f=site_id v=$row.site_id}", "store.sfield":"coordinates",
> "hl.fl":"title description", "group.field":"site_id", "_":"1573559644298",
> "
> group":"true", "store.fq":"{!geofilt}", "d":"100", "group.limit":"2", "
> store.d":"100", "store.pt":"54.6973867999,25.22481530046",
> "store.fl
> ":"*,score", "sort":"score desc", "sfield":"coordinates", "q":"{!join
> from=site_id to=site_id fromIndex=stores
> score=max}+{!geofilt}{!func}geodist()", "hl.simple.post":"",
> "hl.q":"{!edismax
> qf=$hl.fl v=\"iphone xr 64gb\"}", "hl.method":"unified",
> "debugQuery":"on"}},
> "error":{ "metadata":[
> "error-class","org.apache.solr.common.SolrException",
> "root-error-class","org.apache.solr.parser.ParseException"],
> "msg":"org.apache.solr.search.SyntaxError:
> Cannot parse '+{!geofilt}{!func}geodist()': Encountered \" \")\" \") \"\"
> at line 1, column 26.\nWas expecting one of:\n  \n  ...\n 
> ...\n  ...\n \"+\" ...\n \"-\" ...\n  ...\n \"(\" ...\n
> \"*\" ...\n \"^\" ...\n  ...\n  ...\n  ...\n
>  ...\n  ...\n \"[\" ...\n \"{\" ...\n  ...\n
> \"filter(\" ...\n  ...\n ", "code":400}}
>
> On Thu, Nov 14, 2019 at 8:45 PM Mikhail Khludnev  wrote:
>
> > It should be like
> > "q":"{!join from=site_id to=site_id fromIndex=stores
> > score=max}+{!geofilt} {!func}geodist() ",
> > post debugQuery
> >
> > On Thu, Nov 14, 2019 at 4:44 PM Vasily Ogar 
> wrote:
> >
> > > I was glad too early because it can only sort or only filter, but can't
> > do
> > > together :(. It takes the only first argument, in my case it geodist or
> > > geofilt
> > >
> > > On Thu, Nov 14, 2019 at 11:11 AM Vasily Ogar 
> > > wrote:
> > >
> > > > Hello,
> > > > I fixed it. If I need to sort by price:
> > > > "q":"{!join from=site_id to=site_id fromIndex=stores}*:*",
> > > > "fq":"{!edismax qf=\"title description\" v=\"iphone xr 64gb\"
> > mm=3<90%}",
> > > > "fl":"score,*,store:[subquery fromIndex=stores]",
> > > > "store.q":"{!terms f=site_id v=$row.site_id}", "store.rows":"1",
> > > "store.fl
> > > > ":"*,score", "sort":"price_low desc",
> > > > "hl":"on", "hl.simple.pre":"", "hl.simple.post":"",
> > > > "hl.q":"{!edismax qf=$hl.fl v=\"iphone xr 64gb\"}", "hl.fl":"title
> > > > description", "hl.method":"unified", "group.field":"site_id",
> "group":
> > > > "true"
> > > > "group.limit":"2",
> > > >
> > > > And if I need to sort by geodist:
> > > > "q":"{!join from=site_id to=site_id fromIndex=stores
> > > > score=max}{!func}geodist(){!geofilt}",
> > > > "d":"100",
> > > > "sfield":"coordinates",
> > > > "pt":"54.6973867999,25.22481530046"
> > > > "fq":"{!edismax qf=\"title description\" v=\"iphone xr 64gb\"
> > mm=3<90%}",
> > > > "fl":"score,*,store:[subquery fromIndex=stores]",
> > > > "store.q":"{!terms f=site_id v=

Re: sort by score in join with geodist()

2019-11-14 Thread Mikhail Khludnev
It should be like
"q":"{!join from=site_id to=site_id fromIndex=stores
score=max}+{!geofilt} {!func}geodist() ",
post debugQuery

On Thu, Nov 14, 2019 at 4:44 PM Vasily Ogar  wrote:

> I was glad too early because it can only sort or only filter, but can't do
> together :(. It takes the only first argument, in my case it geodist or
> geofilt
>
> On Thu, Nov 14, 2019 at 11:11 AM Vasily Ogar 
> wrote:
>
> > Hello,
> > I fixed it. If I need to sort by price:
> > "q":"{!join from=site_id to=site_id fromIndex=stores}*:*",
> > "fq":"{!edismax qf=\"title description\" v=\"iphone xr 64gb\" mm=3<90%}",
> > "fl":"score,*,store:[subquery fromIndex=stores]",
> > "store.q":"{!terms f=site_id v=$row.site_id}", "store.rows":"1",
> "store.fl
> > ":"*,score", "sort":"price_low desc",
> > "hl":"on", "hl.simple.pre":"", "hl.simple.post":"",
> > "hl.q":"{!edismax qf=$hl.fl v=\"iphone xr 64gb\"}", "hl.fl":"title
> > description", "hl.method":"unified", "group.field":"site_id", "group":
> > "true"
> > "group.limit":"2",
> >
> > And if I need to sort by geodist:
> > "q":"{!join from=site_id to=site_id fromIndex=stores
> > score=max}{!func}geodist(){!geofilt}",
> > "d":"100",
> > "sfield":"coordinates",
> > "pt":"54.6973867999,25.22481530046"
> > "fq":"{!edismax qf=\"title description\" v=\"iphone xr 64gb\" mm=3<90%}",
> > "fl":"score,*,store:[subquery fromIndex=stores]",
> > "store.q":"{!terms f=site_id v=$row.site_id}", "store.rows":"1",
> "store.fl
> > ":"*,score",
> > "store.fq":"{!geofilt}",
> > "store.sfield":"coordinates",
> > "store.d":"100", "store.pt":"54.6973867999,25.22481530046",
> "sort
> > ":"score desc",
> > "hl":"on", "hl.simple.pre":"", "hl.simple.post":"",
> > "hl.q":"{!edismax qf=$hl.fl v=\"iphone xr 64gb\"}", "hl.fl":"title
> > description", "hl.method":"unified", "group.field":"site_id", "group":
> > "true"
> > "group.limit":"2",
> >
> > On Tue, Nov 12, 2019 at 6:54 PM Vasily Ogar 
> wrote:
> >
> >> Thank you for advice, now it working as expected. Maybe you know how to
> >> integrate with dismax?
> >>
> >> On Tue, Nov 12, 2019 at 6:10 PM Mikhail Khludnev 
> wrote:
> >>
> >>> tlrd;
> >>> I noticed func under fq that make no sense. Only q or sort yield
> scores.
> >>>
> >>> On Tue, Nov 12, 2019 at 6:43 PM Vasily Ogar 
> >>> wrote:
> >>>
> >>> > First of all, thank you for your help.
> >>> > Now it doesn't show any errors, but somehow score is based on the
> >>> title and
> >>> > description but not on the geodist.
> >>> > "params":{ "hl":"on", "pt":"54.6973867999,25.22481530046",
> >>> > "fl":"score,*,store:[subquery
> >>> > fromIndex=stores]", "store.rows":"1", "fq":"{!join from=site_id
> >>> to=site_id
> >>> > fromIndex=stores score=max}{!func}geodist()", "store.sort":"geodist()
> >>> asc",
> >>> > "hl.simple.pre":"", "store.q":"{!terms f=site_id
> >>> v=$row.site_id}", "
> >>> > store.sfield":"coordinates", "hl.fl":"title description",
> >>> "group.field":
> >>> > "site_id", "_":"1573559644298", "group":"true",
> >>> "store.fq":"{!geofilt}", "d
> >>> > ":"100", "{!geofilt}":"", "group.limit":"2", "store.d":"100", "
> >>> store.pt":
> >>> > &

Re: sort by score in join with geodist()

2019-11-12 Thread Mikhail Khludnev
of documents with field\n 0.5782502 = tf,
> computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:\n 1.0 =
> phraseFreq=1.0\n 1.2 = k1, term saturation parameter\n 0.75 = b, length
> normalization parameter\n 4.0 = dl, length of field\n 8.384666 = avgdl,
> average length of field\n", "product:
> https://istore.lt/iphone-xr-64gb-coral.html":"\n3.9714882 =
> weight(title:\"iphon xr 64gb\" in 29) [SchemaSimilarity], result of:\n
> 3.9714882 = score(freq=1.0), product of:\n 6.8681135 = idf, sum of:\n
> 1.1837479 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:\n 459
> = n, number of documents containing term\n 1500 = N, total number of
> documents with field\n 3.3156862 = idf, computed as log(1 + (N - n + 0.5) /
> (n + 0.5)) from:\n 54 = n, number of documents containing term\n 1500 = N,
> total number of documents with field\n 2.3686793 = idf, computed as log(1 +
> (N - n + 0.5) / (n + 0.5)) from:\n 140 = n, number of documents containing
> term\n 1500 = N, total number of documents with field\n 0.5782502 = tf,
> computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:\n 1.0 =
> phraseFreq=1.0\n 1.2 = k1, term saturation parameter\n 0.75 = b, length
> normalization parameter\n 4.0 = dl, length of field\n 8.384666 = avgdl,
> average length of field\n"}, "QParser":"LuceneQParser",
> "filter_queries":["{!join
> from=site_id to=site_id fromIndex=stores score=max}{!func}geodist()"], "
> parsed_filter_queries":["OtherCoreJoinQuery(OtherCoreJoinQuery
> [fromIndex=stores, fromCoreOpenTime=3373417389901342 extends
> SameCoreJoinQuery
>
> [fromQuery=ShapeFieldCacheDistanceValueSource(org.apache.lucene.spatial.prefix.PointPrefixTreeFieldCacheProvider@2bfc8ad8
> ,
> Pt(x=25.22481530046,y=54.6973867999)), fromField=site_id,
> toField=site_id, scoreMode=Max]])"], }
>
> On Tue, Nov 12, 2019 at 12:48 PM Mikhail Khludnev  wrote:
>
> > Hello,
> > It seems like I breached the limit on unconscious replies in mailing list
> >   I'd rather start with this:
> > q={!join from=site_id to=site_id fromIndex=stores
> > score=max}+{!geofilt}
> >
> >
> {!func}geodist()&sfield=coordinates&pt=54.6973867999,25.22481530046&d=10
> >
> >
> > On Mon, Nov 11, 2019 at 11:11 PM Mikhail Khludnev 
> wrote:
> >
> > > Is it something like  https://issues.apache.org/jira/browse/SOLR-10673
> ?
> > >
> > > On Mon, Nov 11, 2019 at 3:47 PM Vasily Ogar 
> > wrote:
> > >
> > >> it's show nothing because I got an error
> > >> "metadata":[ "error-class","org.apache.solr.common.SolrException",
> > >> "root-error-class","org.apache.solr.search.SyntaxError"],
> > >> "msg":"org.apache.solr.search.SyntaxError:
> > >> geodist - not enough parameters:[]",
> > >>
> > >> If I set parameters then I got another error
> > >> "metadata":[ "error-class","org.apache.solr.common.SolrException",
> > >> "root-error-class","org.apache.solr.common.SolrException"], "msg":"A
> > >> ValueSource isn't directly available from this field. Instead try a
> > query
> > >> using the distance as the score.",
> > >>
> > >> On Mon, Nov 11, 2019 at 1:36 PM Mikhail Khludnev 
> > wrote:
> > >>
> > >> > Hello, Vasily.
> > >> > Why not? What have you got in debugQuery=true?
> > >> >
> > >> > On Mon, Nov 11, 2019 at 1:19 PM Vasily Ogar 
> > >> wrote:
> > >> >
> > >> > > Hello,
> > >> > > Is it possible to sort by score in join by geodist()? For
> instance,
> > >> > > something like this
> > >> > > q={!join from=site_id to=site_id fromIndex=stores score=max}
> > >> > > +{!func}geodist() +{!geofilt sfield=coordinates
> > >> > > pt=54.6973867999,25.22481530046 d=10}
> > >> > > sort=score desc
> > >> > > Thank you
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > Sincerely yours
> > >> > Mikhail Khludnev
> > >> >
> > >>
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: sort by score in join with geodist()

2019-11-12 Thread Mikhail Khludnev
Hello,
It seems like I breached the limit on unconscious replies in this mailing
list. I'd rather start with this:
q={!join from=site_id to=site_id fromIndex=stores
score=max}+{!geofilt}
{!func}geodist()&sfield=coordinates&pt=54.6973867999,25.22481530046&d=10
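
If it helps, the same request spelled out one parameter per line (my reading
of it; the join score=max carries the geodist() value from the stores side,
so sort=score acts on distance):

  q={!join from=site_id to=site_id fromIndex=stores score=max}+{!geofilt} +{!func}geodist()
  sfield=coordinates
  pt=54.6973867999,25.22481530046
  d=10
  sort=score desc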


On Mon, Nov 11, 2019 at 11:11 PM Mikhail Khludnev  wrote:

> Is it something like  https://issues.apache.org/jira/browse/SOLR-10673 ?
>
> On Mon, Nov 11, 2019 at 3:47 PM Vasily Ogar  wrote:
>
>> it shows nothing because I got an error
>> "metadata":[ "error-class","org.apache.solr.common.SolrException",
>> "root-error-class","org.apache.solr.search.SyntaxError"],
>> "msg":"org.apache.solr.search.SyntaxError:
>> geodist - not enough parameters:[]",
>>
>> If I set parameters then I got another error
>> "metadata":[ "error-class","org.apache.solr.common.SolrException",
>> "root-error-class","org.apache.solr.common.SolrException"], "msg":"A
>> ValueSource isn't directly available from this field. Instead try a query
>> using the distance as the score.",
>>
>> On Mon, Nov 11, 2019 at 1:36 PM Mikhail Khludnev  wrote:
>>
>> > Hello, Vasily.
>> > Why not? What have you got in debugQuery=true?
>> >
>> > On Mon, Nov 11, 2019 at 1:19 PM Vasily Ogar 
>> wrote:
>> >
>> > > Hello,
>> > > Is it possible to sort by score in join by geodist()? For instance,
>> > > something like this
>> > > q={!join from=site_id to=site_id fromIndex=stores score=max}
>> > > +{!func}geodist() +{!geofilt sfield=coordinates
>> > > pt=54.6973867999,25.22481530046 d=10}
>> > > sort=score desc
>> > > Thank you
>> > >
>> >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>> >
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


Re: sort by score in join with geodist()

2019-11-11 Thread Mikhail Khludnev
Is it something like  https://issues.apache.org/jira/browse/SOLR-10673 ?

On Mon, Nov 11, 2019 at 3:47 PM Vasily Ogar  wrote:

> it shows nothing because I got an error
> "metadata":[ "error-class","org.apache.solr.common.SolrException",
> "root-error-class","org.apache.solr.search.SyntaxError"],
> "msg":"org.apache.solr.search.SyntaxError:
> geodist - not enough parameters:[]",
>
> If I set parameters then I got another error
> "metadata":[ "error-class","org.apache.solr.common.SolrException",
> "root-error-class","org.apache.solr.common.SolrException"], "msg":"A
> ValueSource isn't directly available from this field. Instead try a query
> using the distance as the score.",
>
> On Mon, Nov 11, 2019 at 1:36 PM Mikhail Khludnev  wrote:
>
> > Hello, Vasily.
> > Why not? What have you got in debugQuery=true?
> >
> > On Mon, Nov 11, 2019 at 1:19 PM Vasily Ogar 
> wrote:
> >
> > > Hello,
> > > Is it possible to sort by score in join by geodist()? For instance,
> > > something like this
> > > q={!join from=site_id to=site_id fromIndex=stores score=max}
> > > +{!func}geodist() +{!geofilt sfield=coordinates
> > > pt=54.6973867999,25.22481530046 d=10}
> > > sort=score desc
> > > Thank you
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: sort by score in join with geodist()

2019-11-11 Thread Mikhail Khludnev
Hello, Vasily.
Why not? What have you got in debugQuery=true?

On Mon, Nov 11, 2019 at 1:19 PM Vasily Ogar  wrote:

> Hello,
> Is it possible to sort by score in join by geodist()? For instance,
> something like this
> q={!join from=site_id to=site_id fromIndex=stores score=max}
> +{!func}geodist() +{!geofilt sfield=coordinates
> pt=54.6973867999,25.22481530046 d=10}
> sort=score desc
> Thank you
>


-- 
Sincerely yours
Mikhail Khludnev


Re: subquery highlight

2019-11-10 Thread Mikhail Khludnev
Oh, gosh. Sure. Subquery yields doc results only; neither facets nor
highlighting are attached to the response.

On Mon, Nov 11, 2019 at 10:07 AM Vasily Ogar  wrote:

> My subquery is products. I tried product.hl=on&product.hl.fl=products.title
> products.description, and like this product.hl=on&product.hl.fl=title
> description, and like this hl=on&hl.fl=title description, and
> hl.products=on&hl.fl=title description.
> I don't know what else
>
>
> On Mon, Nov 11, 2019 at 8:25 AM Mikhail Khludnev  wrote:
>
> > Hello,
> > Have you tried to prefix hl.* params with the particular subquery name?
> >
> > On Sun, Nov 10, 2019 at 11:46 PM Vasily Ogar 
> > wrote:
> >
> > > Hello,
> > > I am using Solr 8.2 and can't find out how to use highlight in the
> > > subquery. Is it possible at all?
> > > Thank you
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: subquery highlight

2019-11-10 Thread Mikhail Khludnev
Hello,
Have you tried to prefix hl.* params with the particular subquery name?

On Sun, Nov 10, 2019 at 11:46 PM Vasily Ogar  wrote:

> Hello,
> I am using Solr 8.2 and can't find out how to use highlight in the
> subquery. Is it possible at all?
> Thank you
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Error with Solr Suggester using lookupIml = FreeTextLookupFactory

2019-11-06 Thread Mikhail Khludnev
Hello,

Have you built the suggester before requesting?
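
For instance, a one-off build request against your handler (a sketch; it
reuses the handler and dictionary names from your config below):

  http://localhost:8983/solr/catalog/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.build=true

FreeTextLookupFactory keeps its model in memory, so it has to be built (or
buildOnStartup set to true) before lookups can be served; until then it
throws "Lookup not supported at this time".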

On Wed, Nov 6, 2019 at 12:50 PM Tyrone Tse  wrote:

> Solr version 8.1.1
>
> My schema
>
> <field name="suggest" type="text_en_splitting" stored="true" multiValued="false" indexed="true"/>
>
> solconfig.xml
>
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="name">mySuggester</str>
>     <str name="lookupImpl">FreeTextLookupFactory</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">suggest</str>
>     <int name="ngrams">3</int>
>     <str name="suggestFreeTextAnalyzerFieldType">text_en_splitting</str>
>     <str name="buildOnStartup">false</str>
>   </lst>
> </searchComponent>
>
> <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
>   <lst name="defaults">
>     <str name="suggest">true</str>
>     <str name="suggest.count">10</str>
>   </lst>
>   <arr name="components">
>     <str>suggest</str>
>   </arr>
> </requestHandler>
>
> The suggest query
>
> http://localhost:8983/solr/catalog/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=gin
>
> works on Red Hat Enterprise Linux 7.6
>
> it returns
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":0},
>   "suggest":{"mySuggester":{
>   "gin":{
> "numFound":10,
> "suggestions":[{
> "term":"gin",
> "weight":13613207305387128,
> "payload":""},
>   {
> "term":"ginjo",
> "weight":3986422076966947,
> "payload":""},
> ...
>
> But when I run it on my Mac with OS High Sierra,
> it generates the error
>
> "Lookup not supported at this time"
>
> "java.lang.IllegalStateException: Lookup not supported at this time\n\tat
>
> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:428)\n\tat
>
> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:399)\n\tat
>
> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:388)\n\tat
>
> org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:243)\n\tat
>
> org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:264)\n\tat
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)\n\tat
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)\n\tat
>


-- 
Sincerely yours
Mikhail Khludnev


Re: ConcurrentModificationException in SolrInputDocument writeMap

2019-11-06 Thread Mikhail Khludnev
Hello, Tim.
Please confirm my understanding: does the exception happen in a standalone
Java ingesting app?
If so, does it reuse either SolrInputDocument instances or fields/values
collections between update calls?
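
For reference, a minimal sketch of the reuse pattern I mean (the method,
client and collection names are placeholders):

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.common.SolrInputDocument;

  // A concurrent client (e.g. ConcurrentUpdateHttp2SolrClient) serializes
  // queued documents on a background thread, so mutating a reused
  // SolrInputDocument that may still be queued can race with writeMap()
  // and throw ConcurrentModificationException.
  void reuseUnsafely(SolrClient client) throws Exception {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "a");
    client.add("coll", doc);    // queued, sent asynchronously
    doc.setField("n_s", "s2");  // unsafe: 'doc' may still be in flight
    client.add("coll", doc);
  }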

On Wed, Nov 6, 2019 at 8:00 AM Tim Swetland  wrote:

> Nevermind my comment on not having this problem in 8.1. We do have it there
> as well, I just didn't look far enough back in our logs on my initial
> search. Would still appreciate whatever thoughts anyone might have on the
> exception.
>
> On Wed, Nov 6, 2019 at 10:17 AM Tim Swetland  wrote:
>
> > I'm currently running into a ConcurrentModificationException ingesting
> > data as we attempt to upgrade from Solr 8.1 to 8.2. It's not every
> > document, but it definitely appears regularly in our logs. We didn't run
> > into this problem in 8.1, so I'm not sure what might have changed. I feel
> > like this is probably a bug, but if there's a workaround or if there's an
> > idea of something I might be doing wrong, please let me know.
> >
> > Stack trace:
> > o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling
> > SolrCmdDistributor$Req: cmd=add{_version=,id=};
> node=StdNode:
> > https:///solr/coll_shard1_replica_n2/ to https://
> /solr/coll_shard1_replica_n2/
> > => java.util.ConcurrentModificationException
> > at java.util.LinkedHashMap.forEach(LinkedHashMap.java:686)
> > java.util.ConcurrentModificationException: null
> >   at java.util.LinkedHashMap.forEach(LinkedHashMap.java:686)
> >   at
> >
> org.apache.solr.common.SolrInputDocument.writeMap(SolrInputDocument.java:51)
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:658)
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:383)
> >   at
> > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeMapEntry(JavaBinCodec.java:813)
> >
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:411)
> >
> >   at
> > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeIterator(JavaBinCodec.java:750)
> >
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:395)
> >
> >   at
> > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:248)
> >
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:355)
> >
> >   at
> > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
> >   at
> > org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:167)
> >   at
> >
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:102)
> >   at
> >
> org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83)
> >   at
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:338)
> >
> >   at
> >
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:231)
> >
> >   at
> >
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:176)
> >
> >   at
> >
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
> >   at
> >
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil
> > .java:209)
> >   at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >   at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >
> >   at java.lang.Thread.run(Thread.java:748)
> >
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Ref Guide - Precision & Recall of Analyzers

2019-11-06 Thread Mikhail Khludnev
Hello, Audrey.

Can you create a regexp capturing all-caps for
https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#pattern-replace-filter ?
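
A rough sketch of what that could look like at query time (the pattern and
the trailing length filter are my assumptions, not from the guide):

  <filter class="solr.PatternReplaceFilterFactory" pattern="^[A-Z]{2,}$" replacement="" replace="all"/>
  <filter class="solr.LengthFilterFactory" min="1" max="255"/>

The first filter empties tokens that are entirely upper-case (so it has to
run before any lowercasing), and the length filter then drops the emptied
tokens.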

On Wed, Nov 6, 2019 at 6:36 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com
 wrote:

> I would also love to know what filter to use to ignore capitalized
> acronyms... which one can do this OOTB?
>
> --
> Audrey Lorberfeld
> Data Scientist, w3 Search
> IBM
> audrey.lorberf...@ibm.com
>
>
> On 11/6/19, 3:54 AM, "Paras Lehana"  wrote:
>
> Hi Community,
>
> In Ref Guide 8.3's *Understanding Analyzers, Tokenizers, and Filters*
> <https://lucene.apache.org/solr/guide/8_3/understanding-analyzers-tokenizers-and-filters.html>
> section, the text talks about precision and recall depending on how you use
> analyzers during query and index time:
>
> For indexing, you often want to simplify, or normalize, words. For example,
> > setting all letters to lowercase, eliminating punctuation and accents,
> > mapping words to their stems, and so on. Doing so can increase recall because,
> > for example, "ram", "Ram" and "RAM" would all match a query for "ram". To increase
> > query-time precision, a filter could be employed to narrow the matches
> > by, for example, ignoring all-cap acronyms if you’re interested in male
> > sheep, but not Random Access Memory.
>
>
> In the first case (about recall), is it assumed that "ram" should match all
> three? [Q1] Because, to increase recall, we have to decrease false
> negatives (documents that are relevant but not retrieved). In the other case
> (if the three are not intended to match the query), precision is actually
> decreased here (false positives are increased).
>
> This makes sense for the second case, where precision should increase as we
> are decreasing false positives (documents wrongly marked relevant).
>
> However, the text talks about the method of "employing a filter that
> ignores all-cap acronyms". How are we supposed to do that at query time?
> [Q2] Weren't we supposed to remove the filter (LCF) during index time?
>
>
> --
> --
> Regards,
>
> Paras Lehana [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
>     8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>
> --
>
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: DIH across two SQL DBs

2019-10-31 Thread Mikhail Khludnev
Hello, Jan.

Have you considered join="zipper" ?
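
Roughly (a sketch only; entity and column names follow your description, and
both selects must be ordered by the join key for zipper to work):

  <entity name="main" dataSource="mysql"
          query="SELECT * FROM maintable ORDER BY id">
    <entity name="ids" dataSource="pgsql" join="zipper"
            query="SELECT myid FROM other_table ORDER BY myid"
            where="myid=main.id"/>
  </entity>

It merges the two ordered streams on the fly, so the ID list is never
materialized in memory or in the SQL itself.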

On Thu, Oct 31, 2019 at 12:52 AM Jan Høydahl  wrote:

> I need a SELECT which filters IDs based on an 'id' list coming from
> another database, i.e. SELECT * FROM maindb.maintable WHERE id IN (SELECT
> myid FROM otherdb.other_table).
>
> The docs are fetched from a MySql DB while the list of IDs to include in
> that first SELECT WHERE statement is fetched from a view in a PgSql DB, so
> you cannot simply include the table name in the WHERE clause. I have added
> two dataSources, and I think I'll need an <entity> which caches the ID list
> from ‘otherdb’ in memory and then somehow references that cached list in
> place of the inner select?
>
> However, since the IDs are UUID strings and there are a few
> thousand of them, I guess the SELECT becomes too large if you just send a
> huge OR clause to MySql. I have been thinking about a 2-stage solution,
> first create a temp table in MySql and INSERT all the IDs there, then
> include the temp table in the WHERE as usual, and delete the tmp table
> afterwards. Does DIH have a built-in and efficient feature for such an
> operation?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Reg: Solr Segment level cache

2019-10-30 Thread Mikhail Khludnev
Hello,
Which particular cache are you talking about?

On Wed, Oct 30, 2019 at 12:19 AM lawrence antony  wrote:

> Dear Sir
>
> Does Solr support a segment-level cache, so that if only a single segment
> changes then only a small portion of the cached data needs to be refreshed?
>
> --
> with thanks and regards,
> lawrence antony.
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr-Cloud, join and collection collocation

2019-10-16 Thread Mikhail Khludnev
Note: adding score=none as a local param switches to another join algorithm,
driven by the from-side of the join.
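
I.e. something like (field names are placeholders):

  q={!join fromIndex=joined from=join_key to=join_key score=none}joined_filter:value

With score=none the parser picks the non-scoring join implementation, which
can behave quite differently performance-wise.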

On Wed, Oct 16, 2019 at 11:37 AM Nicolas Paris 
wrote:

> Sadly, the join performance is poor.
> The joined collection is 12M documents, and the performance is 6k ms
> versus 60 ms when I compare to the denormalized field.
>
> Apparently, the performance does not change when the filter on the
> joined collection is changed. It is still 6k ms when the subset is 12M
> or 1 document in size. So join performance looks correlated to the
> size of the joined collection and not to the kind of filter applied to it.
>
> I will explore the streaming expressions
>
> On Wed, Oct 16, 2019 at 08:00:43AM +0200, Nicolas Paris wrote:
> > > You can certainly replicate the joined collection to every shard. It
> > > must fit in one shard and a replica of that shard must be co-located
> > > with every replica of the “to” collection.
> >
> > Yes, I found this in the documentation, with a clear example just after
> > this mail. I will test it today. I also read your blog about join
> > performance[1] and I suspect the performance impact of joins will be
> > huge because the joined collection is about 10M documents (only two
> > fields, unique id and an array of longs and a filter applied to the
> > array, join key is 10M unique IDs).
> >
> > > Have you looked at streaming and “streaming expressions"? It does not
> > > have the same problem, although it does have its own limitations.
> >
> > I never tested them, and I am not very comfortable yet with how to test
> > them. Is it possible to mix query parsers and streaming expressions in
> > the client call via HTTP parameters - or are streaming expressions applied
> > programmatically only?
> >
> > [1] https://lucidworks.com/post/solr-and-joins/
> >
> > On Tue, Oct 15, 2019 at 07:12:25PM -0400, Erick Erickson wrote:
> > > You can certainly replicate the joined collection to every shard. It
> must fit in one shard and a replica of that shard must be co-located with
> every replica of the “to” collection.
> > >
> > > Have you looked at streaming and “streaming expressions"? It does not
> have the same problem, although it does have its own limitations.
> > >
> > > Best,
> > > Erick
> > >
> > > > On Oct 15, 2019, at 6:58 PM, Nicolas Paris 
> wrote:
> > > >
> > > > Hi
> > > >
> > > > I have several large collections that cannot fit in a standalone solr
> > > > instance. They are split over multiple shards in solr-cloud mode.
> > > >
> > > > Those collections are supposed to be joined to another collection to
> > > > retrieve subsets. Because I am using distributed collections, I am not
> > > > able to use the solr join feature.
> > > >
> > > > For this reason, I denormalize the information by adding the joined
> > > > collection within every collection. Naturally, when I want to update
> > > > the joined collection, I have to update every one of the distributed
> > > > collections.
> > > >
> > > > In standalone mode, I would only have to update the joined collection.
> > > >
> > > > I wonder if there is a way to overcome this limitation. For example, by
> > > > replicating the joined collection to every shard - or another method I am
> > > > not aware of.
> > > >
> > > > Any thought ?
> > > > --
> > > > nicolas
> > >
> >
> > --
> > nicolas
> >
>
> --
> nicolas
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Atomic Updates with PreAnalyzedField

2019-10-16 Thread Mikhail Khludnev
Hello, Oleksandr.
It deserves a JIRA issue; please raise one.

On Tue, Oct 15, 2019 at 8:17 PM Oleksandr Drapushko 
wrote:

> Hello Community,
>
> I've discovered a data loss bug and couldn't find any mention of it. Please
> confirm this bug hasn't been reported yet.
>
>
> Description:
>
> If you try to update non pre-analyzed fields in a document using atomic
> updates, data in pre-analyzed fields (if there is any) will be lost. The
> bug was discovered in Solr 8.2 and 7.7.2.
>
>
> Steps to reproduce:
>
> 1. Index this document into techproducts
> {
>   "id": "a",
>   "n_s": "s1",
>   "pre":
>
> "{\"v\":\"1\",\"str\":\"Alaska\",\"tokens\":[{\"t\":\"alaska\",\"s\":0,\"e\":6,\"i\":1}]}"
> }
>
> 2. Query the document
> {
>   "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
>   {
> "id":"a",
> "n_s":"s1",
> "pre":"Alaska",
> "_version_":1647475215142223872}]
>   }}
>
> 3. Update using atomic syntax
> {
>   "add": {
> "doc": {
>   "id": "a",
>   "n_s": {"set": "s2"}
> }
>   }
> }
>
> 4. Observe the warning in solr log
> UI:
> WARN  x:techproducts_shard2_replica_n6  PreAnalyzedField  Error parsing
> pre-analyzed field 'pre'
>
> solr.log:
> WARN  (qtp1384454980-23) [c:techproducts s:shard2 r:core_node8
> x:techproducts_shard2_replica_n6] o.a.s.s.PreAnalyzedField Error parsing
> pre-analyzed field 'pre' => java.io.IOException: Invalid JSON type
> java.lang.String, expected Map
> at
>
> org.apache.solr.schema.JsonPreAnalyzedParser.parse(JsonPreAnalyzedParser.java:86)
>
> 5. Query the document again
> {
>   "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
>   {
> "id":"a",
> "n_s":"s2",
>     "_version_":1647475461695995904}]
>   }}
>
> Result: There is no 'pre' field in the document anymore.
>
>
> My thoughts on it:
>
> 1. Data loss can be prevented if the warning is replaced with an error
> (re-throwing the exception). Atomic updates for such documents still won't
> work, but updates will be explicitly rejected.
>
> 2. Solr tries to read the document from the index, merge it with the input
> document and re-index it, but when it reads indexed pre-analyzed fields
> the format is different, so Solr cannot parse and re-index those fields
> properly.
>
>
> Thank you,
> Oleksandr
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Wild-card query behavior

2019-10-09 Thread Mikhail Khludnev
Well, it reminds me of the usual awkward parsing issues. Try experimenting with
fq={!join to=... from=... v='field:12*'} or fq={!join to=... from=...
v=$qq}&qq=field:12*
No more questions to ask.
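
For example (field names are placeholders):

  fq={!join to=to_field from=from_field v=$qq}
  qq=solrField:(12*)

Referencing the wildcard query through $qq keeps the parentheses out of the
inline local-params string, which is where the parsing trips up.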

On Wed, Oct 9, 2019 at 4:39 PM Paresh  wrote:

> E.g. In query, join with wild-card query using parenthesis I get error -
>
> "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.parser.ParseException"],
> "msg":"org.apache.solr.search.SyntaxError: Cannot parse
> 'solrField:(12*': Encountered \"\" at line 1, column 57.\r\nWas
> expecting one of:\r\n ...\r\n ...\r\n ...\r\n
> \"+\" ...\r\n\"-\" ...\r\n ...\r\n\"(\" ...\r\n
> \")\" ...\r\n\"*\" ...\r\n\"^\" ...\r\n ...\r\n
>  ...\r\n ...\r\n ...\r\n
> 
> ...\r\n ...\r\n\"[\" ...\r\n\"{\" ...\r\n
>  ...\r\n\"filter(\" ...\r\n ...\r\n",
> "code":400}}
>
> When using the same query with parenthesis in filter query (fq), I get the
> expected results.
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How to combine [child] and [subquery?]

2019-10-09 Thread Mikhail Khludnev
I might not fully understand how you would like to combine them. A
possible reason is that [subquery] expects a regular Solr response to act on,
but [child] might yield something hairy.

On Wed, Oct 9, 2019 at 2:40 PM Bram Biesbrouck <
bram.biesbro...@reinvention.be> wrote:

> Hi Mikhail,
>
> You're right, I should file an issue for the doc thing, I'll look into it.
>
> Thanks for pointing me towards parsing the _nest_path_ field. It's exactly
> what ChildDocTransformer does, indeed.
>
> Would you by any chance know why [child] and [subquery] can't be combined?
> They don't look too related to me and I can't seem to find any logical
> reason why they couldn't coexist in the same query.
>
> b.
>
>
> On Wed, Oct 9, 2019 at 1:08 PM Mikhail Khludnev  wrote:
>
> > Hello, Bram.
> >
> > I guess [child] was recently extended. Docs might be outdated; don't
> > hesitate to contribute a doc improvement.
> > [subquery] is a neat thing: it just runs queries without relying on a
> > particular use case. If my understanding is right, one may request something
> > like the _path_ field in [subquery], which may allow reconstructing the
> > hierarchy.
> >
> > On Wed, Oct 9, 2019 at 1:36 PM Bram Biesbrouck <
> > bram.biesbro...@reinvention.be> wrote:
> >
> > > Hi all,
> > >
> > > I'm diving deep into the ChildDocTransformer and its
> > > related SubQueryAugmenter.
> > >
> > > First of all, I think there's a bug in the Solr docs about [child]. It
> > > states:
> > > "This transformer returns all descendant documents of each parent
> > document
> > > matching your query in a flat list nested inside the matching parent
> > > document."
> > > This is not exact: the descendant documents are "wired into" the
> parent,
> > > creating a hierarchical structure (which is nice). Or am I
> > misinterpreting
> > > the docs?
> > >
> > > Secondly, the [subquery] transformer is super powerful and awesome, but
> > it
> > > doesn't like to be combined with [child]? I'm getting a "[subquery]
> name
> > > children is duplicated" error. Is there a way to work around this? Or
> > maybe
> > > better: is there a way to make the [subquery] transformer behave like
> (a
> > > more flexible version of) [child]? Because now, the path information
> (how
> > > the children relate to their parent fields) is lost when using
> > [subquery].
> > >
> > > Hope to hear more!
> > >
> > > b.
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How to combine [child] and [subquery?]

2019-10-09 Thread Mikhail Khludnev
Hello, Bram.

I guess [child] was recently extended. Docs might be outdated; don't
hesitate to contribute a doc improvement.
[subquery] is a neat thing: it just runs queries without relying on a
particular use case. If my understanding is right, one may request something
like the _path_ field in [subquery], which may allow reconstructing the
hierarchy.
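
E.g. returning the path with every child (a sketch; it assumes the
_nest_parent_/_nest_path_ fields of Solr 8 nested documents are in the
schema):

  fl=id,children:[subquery]
  children.q={!terms f=_nest_parent_ v=$row.id}
  children.fl=*,_nest_path_

With _nest_path_ present in each child row, the client can rebuild the tree
that [child] would have assembled.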

On Wed, Oct 9, 2019 at 1:36 PM Bram Biesbrouck <
bram.biesbro...@reinvention.be> wrote:

> Hi all,
>
> I'm diving deep into the ChildDocTransformer and its
> related SubQueryAugmenter.
>
> First of all, I think there's a bug in the Solr docs about [child]. It
> states:
> "This transformer returns all descendant documents of each parent document
> matching your query in a flat list nested inside the matching parent
> document."
> This is not exact: the descendant documents are "wired into" the parent,
> creating a hierarchical structure (which is nice). Or am I misinterpreting
> the docs?
>
> Secondly, the [subquery] transformer is super powerful and awesome, but it
> doesn't like to be combined with [child]? I'm getting a "[subquery] name
> children is duplicated" error. Is there a way to work around this? Or maybe
> better: is there a way to make the [subquery] transformer behave like (a
> more flexible version of) [child]? Because now, the path information (how
> the children relate to their parent fields) is lost when using [subquery].
>
> Hope to hear more!
>
> b.
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Wild-card query behavior

2019-10-09 Thread Mikhail Khludnev
Hello, Paresh.
Please examine the debugQuery output; otherwise 'doesn't work' is vague.

On Wed, Oct 9, 2019 at 8:31 AM Paresh  wrote:

> Hi All,
>
> I am trying wild-card queries in the query and in the filter query, with and
> without !join, and finding it difficult to understand the Solr behavior.
>
> (-) wild-card like 12* in query: field:12* works well
> (-) wild-card like 12* in query with {!join to=... from=...}field:12* -->
> works well
> (-) wild-card like (12*) in query with {!join to=... from=...}field:(12*)
> --> doesn't work
> (-) wild-card like (12*) in filter query with fq={!join to=...
> from=...}field:12* --> doesn't work
> (-) wild-card like (12*) in filter query with fq={!join to=...
> from=...}field:"12*" --> doesn't work
> (-) wild-card like (12*) in filter query with fq={!join to=...
> from=...}field:(12*) --> works well
>
> Why does the wild-card query not work with {!join}?
>
> Regards,
> Paresh
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How to block expensive solr queries

2019-10-08 Thread Mikhail Khludnev
It's worth raising an issue for supporting timeAllowed for stats. Until
it's done, something like a Jetty filter is the only option.
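
Something along these lines (a sketch of a servlet filter; the class name,
status code and matching rule are all placeholders):

  import java.io.IOException;
  import javax.servlet.*;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  // Rejects stats.calcdistinct requests before they reach Solr's handlers.
  public class BlockExpensiveQueriesFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
        throws IOException, ServletException {
      String qs = ((HttpServletRequest) req).getQueryString();
      if (qs != null && qs.contains("stats.calcdistinct=true")) {
        ((HttpServletResponse) resp).sendError(403, "stats.calcdistinct is disabled");
        return;
      }
      chain.doFilter(req, resp);
    }
    @Override public void init(FilterConfig cfg) {}
    @Override public void destroy() {}
  }

It would be registered in Solr's web.xml ahead of SolrDispatchFilter.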

On Tue, Oct 8, 2019 at 12:34 AM Wei  wrote:

> Hi Mikhail,
>
> Yes, I have the timeAllowed parameter configured; still, in this case it
> doesn't seem to prevent the stats request from blocking other normal
> queries. Is it possible to drop the request before Solr executes it? Maybe
> at the Jetty request filter?
>
> Thanks,
> Wei
>
> On Mon, Oct 7, 2019 at 1:39 PM Mikhail Khludnev  wrote:
>
> > Hello, Wei.
> >
> > Have you tried to abandon heavy queries with
> >
> >
> https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter
> >  ?
> > It may or may not be able to stop stats.
> >
> >
> https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223
> > can clarify it.
> >
> > On Mon, Oct 7, 2019 at 8:19 PM Wei  wrote:
> >
> > > Hi,
> > >
> > > Recently we encountered a problem where Solr cloud query latency suddenly
> > > increased; many simple queries that have small recall get timed out. After
> > > digging a bit I found that the root cause is some stats queries happening
> > > at the same time, such as
> > >
> > >
> > >
> >
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
> > >
> > >
> > >
> > > I see unique_ids is a high cardinality field so this query is quite
> > > expensive. But why does a small volume of such queries block other queries
> > > and make simple queries time out?  I checked the Solr thread pool and see
> > > there are plenty of idle threads available.  We are using Solr 7.6.2 with a
> > > 10 shard cloud setup.
> > >
> > > Is there a way to block certain solr queries based on url pattern? i.e.
> > > ignore the stats.calcdistinct request in this case.
> > >
> > >
> > > Thanks,
> > >
> > > Wei
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How to block expensive solr queries

2019-10-07 Thread Mikhail Khludnev
Hello, Wei.

Have you tried to abandon heavy queries with
https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter ?
It may or may not be able to stop stats.
https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223
can clarify it.
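
E.g. (the cutoff value is illustrative):

  /solr/mycollection/select?q=*:*&stats=true&stats.field=unique_ids&stats.calcdistinct=true&timeAllowed=2000

timeAllowed is in milliseconds and only aborts the stages that check it,
which is why it may or may not catch the stats computation.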

On Mon, Oct 7, 2019 at 8:19 PM Wei  wrote:

> Hi,
>
> Recently we encountered a problem where Solr cloud query latency suddenly
> increased; many simple queries that have small recall get timed out. After
> digging a bit I found that the root cause is some stats queries happening at
> the same time, such as
>
>
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
>
>
>
> I see unique_ids is a high cardinality field so this query is quite
> expensive. But why does a small volume of such queries block other queries and
> make simple queries time out?  I checked the Solr thread pool and see there
> are plenty of idle threads available.  We are using Solr 7.6.2 with a 10
> shard cloud setup.
>
> Is there a way to block certain solr queries based on url pattern? i.e.
> ignore the stats.calcdistinct request in this case.
>
>
> Thanks,
>
> Wei
>


-- 
Sincerely yours
Mikhail Khludnev


Re: json.facet throws ClassCastException

2019-10-07 Thread Mikhail Khludnev
Note: it seems like it's already addressed:
https://issues.apache.org/jira/browse/SOLR-12330
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=commitdiff;h=bf69a40#patch2



On Sat, Oct 5, 2019 at 10:43 AM Andrea Gazzarini 
wrote:

> Hi, the problem should be caused by missing surrounding curly brackets.
> That is, your query is
>
> json.facet=prod:{type:terms,field:product,mincount:1,limit:8}
>
> instead it should be
>
> json.facet={prod:{type:terms,field:product,mincount:1,limit:8}}
>
> The missing brackets cause the wrong interpretation of the "json/facet"
> parameter (String instead of Map).
>
> Cheers,
> Andrea
>
> On 04/10/2019 22:55, Mikhail Khludnev wrote:
> Gosh, obviously. See the clue:
> >
> https://github.com/apache/lucene-solr/blob/7d3dcd220f92f25a997cf1559a91b6d9e1b57c6d/solr/core/src/java/org/apache/solr/search/facet/FacetModule.java#L78
> >
> > On Fri, Oct 4, 2019 at 10:47 PM Webster Homer <
> > webster.ho...@milliporesigma.com> wrote:
> >
> >> Sometimes it comes back in the reply
> >> "java.lang.ClassCastException: java.lang.String cannot be cast to
> >> java.util.Map\n\tat
> >>
> org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:78)\n\tat
> >>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:269)\n\tat
> >>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)\n\tat
> >> org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)\n\tat
> >>
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)\n\tat
> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)\n\tat
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)\n\tat
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)\n\tat
> >>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)\n\tat
> >>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat
> >>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> >>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> >>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
> >>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat
> >>
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat
> >>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> >>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat
> >>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> >>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
> >>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
> >>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
> >>
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
> >>
> org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:169)\n\tat
> >>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
> >> org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat
> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat
> >>
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat
> >> org.eclipse.jetty.io
> .AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)\n\tat
> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\n\tat
> >> org.eclipse.jetty.io
> .SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
> >>
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat
> >>
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat
> >>
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat
> >>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat
> >>
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat
> >> java.lang.Thread.run(Thread.java:748)\n",
> >>
> &g

Re: backup strategy

2019-10-04 Thread Mikhail Khludnev
Hello, Koen.
What about switching the "query" alias to the restored collection, and then
nuking the old one?
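
Roughly, with the Collections API (names are illustrative):

  /admin/collections?action=RESTORE&name=mybackup&location=/shared/backups&collection=coll_v2
  /admin/collections?action=CREATEALIAS&name=query&collections=coll_v2
  /admin/collections?action=DELETE&name=coll_v1

CREATEALIAS re-points an existing alias, so clients that query the alias
never need a restart while the underlying collection is swapped.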

On Fri, Oct 4, 2019 at 10:52 PM Koen De Groote 
wrote:

> Greetings.
>
> Solr 7.6, cloud.
>
> From what I've researched, backup and restore is pretty straightforward.
> BACKUP and RESTORE are collection commands and the backup is to be put on a
> shared filesystem.
>
> So far so good.
>
> I'm a bit concerned about the RESTORE action. A RESTORE command will create
> a new collection, meaning either I need to pick a new name or delete the
> old one first.
>
> And if I pick a new name, that would still mean restarting all clients that
> have to connect to it.
>
> The documentation speaks of using ALIAS, but I don't see how that works.
>
> I can only create an alias for an existing collection, so I'd first have to
> restore the backup to a different name, verify it is correct, delete the
> old collection and then give it an alias that is the name of the old
> collection?
>
> Or how is this supposed to work?
>
> Because honestly, deleting the existing collection first is rather scary
> and sounds like downtime for a restore is unavoidable.
>
> So, how to properly restore?
>
> Kind regards,
> Koen De Groote
>


-- 
Sincerely yours
Mikhail Khludnev


Re: json.facet throws ClassCastException

2019-10-04 Thread Mikhail Khludnev
Gosh, obviously. See the clue:
https://github.com/apache/lucene-solr/blob/7d3dcd220f92f25a997cf1559a91b6d9e1b57c6d/solr/core/src/java/org/apache/solr/search/facet/FacetModule.java#L78

On Fri, Oct 4, 2019 at 10:47 PM Webster Homer <
webster.ho...@milliporesigma.com> wrote:

> Sometimes it comes back in the reply
> "java.lang.ClassCastException: java.lang.String cannot be cast to
> java.util.Map\n\tat
> org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:78)\n\tat
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:269)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)\n\tat
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
> org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:169)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)\n\tat
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\n\tat
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat
> java.lang.Thread.run(Thread.java:748)\n",
>
> -Original Message-
> From: Mikhail Khludnev 
> Sent: Friday, October 04, 2019 2:28 PM
> To: solr-user 
> Subject: Re: json.facet throws ClassCastException
>
> Hello, Webster.
>
> Have you managed to capture the stacktrace?
>
> On Fri, Oct 4, 2019 at 8:24 PM Webster Homer <
> webster.ho...@milliporesigma.com> wrote:
>
> > I'm trying to understand what is wrong with my query or collection.
> >
> > I have a functioning solr schema and collection. I'm running Solr 7.2
> >
> > When I run with a facet.field it works, but if I change it to use a
> > json.facet it throws a class cast exception.
> >
> > json.facet=prod:{type:terms,field:product,mincount:1,limit:8}
> >
> > java.lang.String cannot be cast to java.util.Map
> >
> > The product field is defined as
> > 
> >
> > And lowercase is defined as:
> >  > positionIncrementGap="100">
> >   
> > 
> > 
> >   
> > 
> >
> > I don't have enough information to understand what it's complaining about.
> >
> > Thanks

Re: json.facet throws ClassCastException

2019-10-04 Thread Mikhail Khludnev
Hello, Webster.

Have you managed to capture the stacktrace?

On Fri, Oct 4, 2019 at 8:24 PM Webster Homer <
webster.ho...@milliporesigma.com> wrote:

> I'm trying to understand what is wrong with my query or collection.
>
> I have a functioning solr schema and collection. I'm running Solr 7.2
>
> When I run with a facet.field it works, but if I change it to use a
> json.facet it throws a class cast exception.
>
> json.facet=prod:{type:terms,field:product,mincount:1,limit:8}
>
> java.lang.String cannot be cast to java.util.Map
>
> The product field is defined as
> <field name="product" type="lowercase" indexed="true" stored="true"/>
>
> And lowercase is defined as:
> <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I don't have enough information to understand what it's complaining about.
>
> Thanks
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Read timed out exception during Solr Full-import

2019-10-02 Thread Mikhail Khludnev
Hello,

May it be related to https://issues.apache.org/jira/browse/SOLR-13735 ?


On Wed, Oct 2, 2019 at 11:03 PM amruth  wrote:

> I am running SolrCloud in a production environment. When I trigger a
> full-import, it runs for 30 minutes or so and then gets stuck. No matter how
> many times I run it, I get the same error. When I look at the logs I see:
>
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at
> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at
> org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
> at
> org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
> at
> org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
> at
>
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
> at
>
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
> at
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
> at
>
> org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
> at
>
> org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
> at
>
> org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
> at
>
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
> at
>
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
> at
>
> org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
> at
>
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
> at
>
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
> at
>
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> at
>
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at
>
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at
>
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
> at
>
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
> at
>
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at
>
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
>
> Please help me with the issue. Please let me know if you need additional
> details.
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Question regarding subqueries

2019-10-02 Thread Mikhail Khludnev
Hello, Bram.

Something like that is possible in principle, but it will take enormous
effort to tackle the exact syntax.
Why not something like children.fq=-parent:true ?
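
I.e. keep the prefix subquery and filter the parent row out of it (a sketch;
it assumes a boolean 'parent' flag field on the parent documents):

  fl=id,uri,children:[subquery]
  children.q={!prefix f=id v=$row.id}
  children.fq=-parent:true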

On Wed, Oct 2, 2019 at 8:52 PM Bram Biesbrouck <
bram.biesbro...@reinvention.be> wrote:

> Hi all,
>
> I'm struggling with a little period-sign difficulty and instead of pulling
> out my hair, I wonder if any of you could help me out...
>
> Here's the query:
> q=uri:"/en/blah"=id,uri,children:[subquery]={!prefix f=id v=$
> row.id}=*
>
> It just searches for a document with the field "uri" set to "/en/blah".
> For every hit (just one), it tries to manually fetch the subdocuments using
> the id field of the hit since its children have id's like
> <parent-id>.<child-id>.
> Note that I know this should be done with nested documents and the
> ChildDocTransformer... this is just an exercise to train my brain...
>
> The query above works fine. However, it also returns the parent document,
> because the prefix search includes it as well, of course. However, if I'm
> changing the subquery to something along the lines of this:
>
> {!prefix f=id v=concat($row.id,".")}
> or
> {!prefix f=id v="$row.id\.")}
> or
> {!query defType=lucene v=concat("id:",$row.id,".")}
>
> I get no results back.
>
> I feel like I'm missing only a simple thing here, but can't seem to
> pinpoint it.
>
> Any help?
>
> b.
> We do video technology - visit our new website: http://www.reinvention.be
> Bram Biesbrouck
> bram.biesbro...@reinvention.be
> +32 486 118280
>


-- 
Sincerely yours
Mikhail Khludnev

