Re: Solr 8.1.1 installation in Azure App service

2020-11-04 Thread Shawn Heisey

On 11/3/2020 11:49 PM, Narayanan, Bhagyasree wrote:

Steps we followed for creating Solr App service:

 1. Created a blank sitecore 9.3 solution from Azure market place and
created a Web app for Solr.
 2. Unzipped the Solr 8.1.1 package and copied all the contents to
wwwroot folder of the Web app created for Solr using WinSCP/FTP.
 3. Created a new Solr core by creating a new folder {index folder} and
copied 'conf' from the "/site/wwwroot/server/solr/configsets/_default".
 4. Created a core.properties file with numShards=2 & name={index folder}


Can you give us the precise locations of all core.properties files that 
you have and ALL of the contents of those files?  There should not be 
any sensitive information in them -- no passwords or anything like that.
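
If it helps, something like this lists them all in one shot (assuming a
Unix-like shell; on the Windows-based App Service Kudu console,
"dir /s /b core.properties" from the solr home does the same):

    find /site/wwwroot/server/solr -name core.properties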


It would also be helpful to see the entire solr.log file, taken shortly 
after Solr starts.  The error will have much more detail than you shared 
in your previous message.


This mailing list eats attachments.  So for the logfile, you'll need to 
post the file to a filesharing service and give us a URL.  Dropbox is an 
example of this.  For the core.properties files, which are not very 
long, it will probably be best if you paste the entire contents into 
your email reply.  If you attach files to your email, we won't be able 
to see them.


Thanks,
Shawn


Re: Solr migration related issues.

2020-11-04 Thread Shawn Heisey

On 11/4/2020 9:32 PM, Modassar Ather wrote:

Another thing: how can I control the core naming? I want the core name to
be mycore instead of mycore_shard1_replica_n1/mycore_shard2_replica_n2.
I tried setting it using property.name=mycore but it did not work.
What can I do to achieve this? I am not able to find any config option.


Why would you need to do this, or even want to?  It sounds to me like an XY 
problem.


http://xyproblem.info/


I understand the core.properties file is required for core discovery, but
when this file is present under a subdirectory of SOLR_HOME, I see that it
does not get loaded and is not available in the Solr dashboard.


You should not be trying to manipulate core.properties files yourself. 
This is especially discouraged when Solr is running in cloud mode.


When you're in cloud mode, the collection information in zookeeper will 
always be consulted during core discovery.  If the found core is NOT 
described in zookeeper, it will not be loaded.  And in any recent Solr 
version when running in cloud mode, a core that is not referenced in ZK 
will be entirely deleted.
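
A quick way to see what zookeeper actually knows about (host and port
assumed) is the CLUSTERSTATUS action:

    curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS"

Any core on disk that doesn't correspond to a replica in that output is a
candidate for deletion at startup.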


Thanks,
Shawn


Re: Solr migration related issues.

2020-11-04 Thread Modassar Ather
Hi Erick,

I have put solr configs in Zookeeper. I have created a collection using the
following API.
admin/collections?action=CREATE&name=mycore&numShards=2&replicationFactor=1&collection.configName=mycore&property.name=mycore

The collection got created and I can see mycore_shard1_replica_n1 and
mycore_shard2_replica_n2 under cores on the dashboard. The
core.properties got created with the following values.
numShards=2
collection.configName=mycore
name=mycore
replicaType=NRT
shard=shard1
collection=mycore
coreNodeName=core_node3

Another thing: how can I control the core naming? I want the core name to
be mycore instead of mycore_shard1_replica_n1/mycore_shard2_replica_n2.
I tried setting it using property.name=mycore but it did not work.
What can I do to achieve this? I am not able to find any config option.

I understand the core.properties file is required for core discovery, but
when this file is present under a subdirectory of SOLR_HOME, I see that it
does not get loaded and is not available in the Solr dashboard.
Previous core.properties values :

numShards=2

name=mycore

collection=mycore

configSet=mycore
Can you please help me with this?

Best,
Modassar


On Wed, Nov 4, 2020 at 7:37 PM Erick Erickson 
wrote:

> inline
>
> > On Nov 4, 2020, at 2:17 AM, Modassar Ather 
> wrote:
> >
> > Thanks Erick and Ilan.
> >
> > I am using APIs to create core and collection and have removed all the
> > entries from core.properties. Currently I am facing init failure and
> > debugging it.
> > Will write back if I am facing any issues.
> >
>
> If that means you still _have_ a core.properties file and it’s empty, that
> won’t
> work.
>
> When Solr starts, it goes through “core discovery”. Starting at SOLR_HOME
> it
> recursively descends the directories and whenever it finds a
> “core.properties”
> file says “aha! There’s a replica here. I'll go tell Zookeeper who I am
> and that
> I'm open for business”. It uses the values in core.properties to know what
> collection and shard it belongs to and which replica of that shard it is.
>
> Incidentally, core discovery stops descending and moves to the next sibling
> directory when it hits the first core.properties file so you can’t have a
> replica
> underneath another replica in your directory tree.
>
> You’ll save yourself a lot of grief if you start with an empty SOLR_HOME
> (except
> for solr.xml if you haven’t put it in Zookeeper. BTW, I’d recommend you do
> put solr.xml in Zookeeper!).
>
> Best,
> Erick
>
>
> > Best,
> > Modassar
> >
> > On Wed, Nov 4, 2020 at 3:20 AM Erick Erickson 
> > wrote:
> >
> >> Do note, though, that the default value for legacyCloud changed from
> >> true to false so even though you can get it to work by setting
> >> this cluster prop I wouldn’t…
> >>
> >> The change in the default value is why it’s failing for you.
> >>
> >>
> >>> On Nov 3, 2020, at 11:20 AM, Ilan Ginzburg  wrote:
> >>>
> >>> I second Erick's recommendation, but just for the record legacyCloud
> was
> >>> removed in (upcoming) Solr 9 and is still available in Solr 8.x. Most
> >>> likely this explains Modassar why you found it in the documentation.
> >>>
> >>> Ilan
> >>>
> >>>
> >>> On Tue, Nov 3, 2020 at 5:11 PM Erick Erickson  >
> >>> wrote:
> >>>
>  You absolutely need core.properties files. It’s just that they
>  should be considered an “implementation detail” that you
>  should rarely, if ever need to be aware of.
> 
>  Scripting manual creation of core.properties files in order
>  to define your collections has never been officially supported, it
>  just happened to work.
> 
>  Best,
>  Erick
> 
> > On Nov 3, 2020, at 11:06 AM, Modassar Ather 
>  wrote:
> >
> > Thanks Erick for your response.
> >
> > I will certainly use the APIs and not rely on the core.properties. I
> >> was
> > going through the documentation on core.properties and found it to be
>  still
> > there.
> > I have all the solr install scripts based on older Solr versions and
>  wanted
> > to re-use the same as the core.properties way is still available.
> >
> > So does this mean that we do not need core.properties anymore?
> > How can we ensure that the core name is configurable and not
> >> dynamically
> > set?
> >
> > I will try to use the APIs to create the collection as well as the
> >> cores.
> >
> > Best,
> > Modassar
> >
> > On Tue, Nov 3, 2020 at 5:55 PM Erick Erickson <
> erickerick...@gmail.com
> >>>
> > wrote:
> >
> >> You’re relying on legacyMode, which is no longer supported. In
> >> older versions of Solr, if a core.properties file was found on disk
> >> Solr
> >> attempted to create the replica (and collection) on the fly. This is
> >> no
> >> longer true.
> >>
> >>
> >> Why are you doing this manually instead of using the collections
> >> API?
> >> You can precisely place each replica with that API in a way that’ll
> >> be continued to be 

Unable to finish sending updates - Solr 8.5.0

2020-11-04 Thread Scott Q.
I am seeing the same error as in this thread:


http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202004.mbox/
[1]


with Solr 8.5.0


2020-11-04 16:58:00.998 WARN  (qtp335107734-3042730) [c:dovecot
s:shard1 r:core_node44 x:dovecot_shard1_replica_n43]
o.a.s.u.SolrCmdDistributor Unable to finish sending updates =>
java.io.IOException: Task queue processing has stalled for 20138 ms with 0
remaining elements to process.
        at
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.blockUntilFinished(ConcurrentUpdateHttp2SolrClient.java:501)




Does anyone know what's causing this?
This is a 4 node Cloud setup with 4 zookeeper instances


Links:
--
[1]
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202004.mbox/%3csn6pr02mb42860fedb0f0a5958f35568fcd...@sn6pr02mb4286.namprd02.prod.outlook.com%3E


Re: docValues usage

2020-11-04 Thread Wei
And in the case of both stored=true and docValues=true, will Solr 8.x
choose the optimal approach by itself?

On Wed, Nov 4, 2020 at 9:15 AM Wei  wrote:

> Thanks Erick. As indexing is not necessary, and docValues is more
> efficient than stored fields for function queries, we shall go with the
> following:
>
>   3) indexed=false,  stored=false,  docValues=true.
>
> Is my understanding correct?
>
> Best,
> Wei
>
> On Wed, Nov 4, 2020 at 5:24 AM Erick Erickson 
> wrote:
>
>> You don’t need to index the field for function queries, see:
>> https://lucene.apache.org/solr/guide/8_6/docvalues.html.
>>
>> Function queries, as opposed to sorting, faceting and grouping, are
>> evaluated at search time, where the search process is already parked on
>> the document anyway, so answering the question “for doc X, what is the
>> value of field Y” to compute the score is cheap. DocValues are still more
>> efficient I think, although I haven’t measured explicitly...
>>
>> For sorting, faceting and grouping, it’s a much different story. Take
>> sorting. You have to ask
>> “for field Y, what’s the value in docX and docZ?”. Say you’re parked on
>> docX. Doc Z is long gone
>> and getting the value for field Y is much more expensive.
>>
>> Also, docValues will not increase memory requirements _unless used_.
>> Otherwise they’ll
>> just sit there on disk. They will certainly increase disk space whether
>> used or not.
>>
>> And _not_ using docValues when you facet, group or sort will also
>> _certainly_ increase
>> your heap requirements since the docValues structure must be built on the
>> heap rather
>> than be in MMapDirectory space.
>>
>> Best,
>> Erick
>>
>>
>> > On Nov 4, 2020, at 5:32 AM, uyilmaz 
>> wrote:
>> >
>> > Hi,
>> >
>> > I'm by no means an expert on this so if anyone sees a mistake please
>> correct me.
>> >
>> > I think you need to index this field, since boost functions are added
>> to the query as optional clauses (
>> https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
>> It's like boosting a regular field by putting ^2 next to it in a query.
>> Storing or enabling docValues will unnecessarily consume space/memory.
>> >
>> > On Tue, 3 Nov 2020 16:10:50 -0800
>> > Wei  wrote:
>> >
>> >> Hi,
>> >>
>> >> I have a couple of primitive single value numeric type fields,  their
>> >> values are used in boosting functions, but not used in sort/facet. or
>> in
>> >> returned response.   Should I use docValues for them in the schema?  I
>> can
>> >> think of the following options:
>> >>
>> >> 1)   indexed=true,  stored=true, docValues=false
>> >> 2)   indexed=true, stored=false, docValues=true
>> >> 3)   indexed=false,  stored=false,  docValues=true
>> >>
>> >> What would be the performance implications for these options?
>> >>
>> >> Best,
>> >> Wei
>> >
>> >
>> > --
>> > uyilmaz 
>>
>>


Re: docValues usage

2020-11-04 Thread Wei
Thanks Erick. As indexing is not necessary, and docValues is more efficient
than stored fields for function queries, we shall go with the
following:

  3) indexed=false,  stored=false,  docValues=true.

Is my understanding correct?

Best,
Wei

On Wed, Nov 4, 2020 at 5:24 AM Erick Erickson 
wrote:

> You don’t need to index the field for function queries, see:
> https://lucene.apache.org/solr/guide/8_6/docvalues.html.
>
> Function queries, as opposed to sorting, faceting and grouping, are
> evaluated at search time, where the search process is already parked on
> the document anyway, so answering the question “for doc X, what is the
> value of field Y” to compute the score is cheap. DocValues are still more
> efficient I think, although I haven’t measured explicitly...
>
> For sorting, faceting and grouping, it’s a much different story. Take
> sorting. You have to ask
> “for field Y, what’s the value in docX and docZ?”. Say you’re parked on
> docX. Doc Z is long gone
> and getting the value for field Y is much more expensive.
>
> Also, docValues will not increase memory requirements _unless used_.
> Otherwise they’ll
> just sit there on disk. They will certainly increase disk space whether
> used or not.
>
> And _not_ using docValues when you facet, group or sort will also
> _certainly_ increase
> your heap requirements since the docValues structure must be built on the
> heap rather
> than be in MMapDirectory space.
>
> Best,
> Erick
>
>
> > On Nov 4, 2020, at 5:32 AM, uyilmaz  wrote:
> >
> > Hi,
> >
> > I'm by no means an expert on this so if anyone sees a mistake please
> correct me.
> >
> > I think you need to index this field, since boost functions are added to
> the query as optional clauses (
> https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
> It's like boosting a regular field by putting ^2 next to it in a query.
> Storing or enabling docValues will unnecessarily consume space/memory.
> >
> > On Tue, 3 Nov 2020 16:10:50 -0800
> > Wei  wrote:
> >
> >> Hi,
> >>
> >> I have a couple of primitive single value numeric type fields,  their
> >> values are used in boosting functions, but not used in sort/facet. or in
> >> returned response.   Should I use docValues for them in the schema?  I
> can
> >> think of the following options:
> >>
> >> 1)   indexed=true,  stored=true, docValues=false
> >> 2)   indexed=true, stored=false, docValues=true
> >> 3)   indexed=false,  stored=false,  docValues=true
> >>
> >> What would be the performance implications for these options?
> >>
> >> Best,
> >> Wei
> >
> >
> > --
> > uyilmaz 
>
>


RE: How to raise open file limits

2020-11-04 Thread DAVID MARTIN NIETO
And also this:

open files  (-n) 1024

At least.
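
A quick sketch for checking the effective limits for the account that runs
Solr (note that changes under /etc/security/limits.d only apply to new
login sessions, so log out and back in first):

    ulimit -n    # open files, soft limit
    ulimit -Hn   # open files, hard limit
    ulimit -u    # max user processes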

David Martín Nieto
Analista Funcional
Calle Cabeza Mesada 5
28031, Madrid
T: +34 667 414 432
T: +34 91 779 56 98| Ext. 3198
E-mail: dmart...@viewnext.com | Web: www.viewnext.com





From: DAVID MARTIN NIETO 
Sent: Wednesday, November 4, 2020 16:20
To: solr-user@lucene.apache.org 
Subject: RE: How to raise open file limits

Hi,

You need to change the ulimit parameters (shown by ulimit -a) in your OS
configuration. I believe the problem you have is with:
max user processes  (-u) 4096

Kind regards.

David Martín Nieto
Analista Funcional
Calle Cabeza Mesada 5
28031, Madrid
T: +34 667 414 432
T: +34 91 779 56 98| Ext. 3198
E-mail: dmart...@viewnext.com | Web: www.viewnext.com





From: James Rome 
Sent: Wednesday, November 4, 2020 16:03
To: solr-user@lucene.apache.org 
Subject: How to raise open file limits

I am new to solr. I have solr installed in my home directory
(/home/jar/solr).

But when I start the tutorial, I get an open files limit error.

  $ ./bin/solr start -e cloud
*** [WARN] *** Your open file limit is currently 1024.
  It should be set to 65000 to avoid operational disruption.
  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
false in your profile or solr.in.sh

I made a limits file as follows:

/etc/security/limits.d # cat solr.conf
solr soft    nofile  65000
solr hard    nofile  65000
solr soft    nproc   65000
solr hard    nproc   65000
jar  soft    nofile  65000
jar  hard    nofile  65000
jar  soft    nproc   65000
jar  hard    nproc   65000

But this does not seem to solve the issue.

Also, my ultimate goal is to only index one directory and to serve it to
my Drupal site. Is there a way to run solr as a service so that it
restarts on boot? Can you please point me to how to do this?

--
James A. Rome
https://jamesrome.net



RE: How to raise open file limits

2020-11-04 Thread DAVID MARTIN NIETO
Hi,

You need to change the ulimit parameters (shown by ulimit -a) in your OS
configuration. I believe the problem you have is with:
max user processes  (-u) 4096

Kind regards.

David Martín Nieto
Analista Funcional
Calle Cabeza Mesada 5
28031, Madrid
T: +34 667 414 432
T: +34 91 779 56 98| Ext. 3198
E-mail: dmart...@viewnext.com | Web: www.viewnext.com





From: James Rome 
Sent: Wednesday, November 4, 2020 16:03
To: solr-user@lucene.apache.org 
Subject: How to raise open file limits

I am new to solr. I have solr installed in my home directory
(/home/jar/solr).

But when I start the tutorial, I get an open files limit error.

  $ ./bin/solr start -e cloud
*** [WARN] *** Your open file limit is currently 1024.
  It should be set to 65000 to avoid operational disruption.
  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
false in your profile or solr.in.sh

I made a limits file as follows:

/etc/security/limits.d # cat solr.conf
solr soft    nofile  65000
solr hard    nofile  65000
solr soft    nproc   65000
solr hard    nproc   65000
jar  soft    nofile  65000
jar  hard    nofile  65000
jar  soft    nproc   65000
jar  hard    nproc   65000

But this does not seem to solve the issue.

Also, my ultimate goal is to only index one directory and to serve it to
my Drupal site. Is there a way to run solr as a service so that it
restarts on boot? Can you please point me to how to do this?

--
James A. Rome
https://jamesrome.net



How to raise open file limits

2020-11-04 Thread James Rome
I am new to solr. I have solr installed in my home directory 
(/home/jar/solr).


But when I start the tutorial, I get an open files limit error.

 $ ./bin/solr start -e cloud
*** [WARN] *** Your open file limit is currently 1024.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to 
false in your profile or solr.in.sh


I made a limits file as follows:

/etc/security/limits.d # cat solr.conf
solr soft    nofile  65000
solr hard    nofile  65000
solr soft    nproc   65000
solr hard    nproc   65000
jar soft    nofile  65000
jar hard    nofile  65000
jar soft    nproc   65000
jar hard    nproc   65000

But this does not seem to solve the issue.

Also, my ultimate goal is to only index one directory and to serve it to 
my Drupal site. Is there a way to run solr as a service so that it 
restarts on boot? Can you please point me to how to do this?
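
For reference, the Solr distribution ships an install script that sets this
up; a sketch, assuming version 8.7.0 on a Linux host:

    # extract just the installer from the archive, then run it as root;
    # it installs Solr as a service that starts on boot
    tar xzf solr-8.7.0.tgz solr-8.7.0/bin/install_solr_service.sh --strip-components=2
    sudo bash ./install_solr_service.sh solr-8.7.0.tgz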


--
James A. Rome
https://jamesrome.net



[ANNOUNCE] Apache Solr 8.7.0 released

2020-11-04 Thread Atri Sharma
3/11/2020, Apache Solr™ 8.7 available

The Lucene PMC is pleased to announce the release of Apache Solr 8.7

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search and analytics, rich
document parsing, geospatial search, extensive REST APIs as well as
parallel SQL. Solr is enterprise grade, secure and highly scalable,
providing fault tolerant distributed search and indexing, and powers
the search and navigation features of many of the world's largest
internet sites.


The release is available for immediate download at:


https://lucene.apache.org/solr/downloads.html


Please read CHANGES.txt for a detailed list of changes:


https://lucene.apache.org/solr/8_7_0/changes/Changes.html


Solr 8.7.0 Release Highlights


SOLR-14588 -- Circuit Breakers Infrastructure and Real JVM Based Circuit Breaker


SOLR-14615 -- CPU Based Circuit Breaker


SOLR-14537 -- Improve performance of ExportWriter


SOLR-14651 -- The MetricsHistoryHandler Can Be Disabled


A summary of important changes is published in the Solr Reference
Guide at https://lucene.apache.org/solr/guide/8_7/solr-upgrade-notes.html.
For the most exhaustive list, see the full release notes at
https://lucene.apache.org/solr/8_7_0/changes/Changes.html or by
viewing the CHANGES.txt file accompanying the distribution.  Solr's
release notes usually don't include Lucene layer changes.  Lucene's
release notes are at
https://lucene.apache.org/core/8_7_0/changes/Changes.html


Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases. It is possible that the mirror you are using may not
have replicated the release yet. If that is the case, please try another
mirror. This also applies to Maven access.



-- 
Regards,

Atri
Apache Concerted


Re: Solr migration related issues.

2020-11-04 Thread Erick Erickson
inline

> On Nov 4, 2020, at 2:17 AM, Modassar Ather  wrote:
> 
> Thanks Erick and Ilan.
> 
> I am using APIs to create core and collection and have removed all the
> entries from core.properties. Currently I am facing init failure and
> debugging it.
> Will write back if I am facing any issues.
> 

If that means you still _have_ a core.properties file and it’s empty, that won’t
work.

When Solr starts, it goes through “core discovery”. Starting at SOLR_HOME it
recursively descends the directories and whenever it finds a “core.properties”
file says “aha! There’s a replica here. I'll go tell Zookeeper who I am and 
that 
I'm open for business”. It uses the values in core.properties to know what 
collection and shard it belongs to and which replica of that shard it is.

Incidentally, core discovery stops descending and moves to the next sibling
directory when it hits the first core.properties file so you can’t have a 
replica
underneath another replica in your directory tree.

You’ll save yourself a lot of grief if you start with an empty SOLR_HOME (except
for solr.xml if you haven’t put it in Zookeeper. BTW, I’d recommend you do
put solr.xml in Zookeeper!).
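
A sketch of pushing it up with the bundled CLI (ZK connection string
assumed):

    bin/solr zk cp file:/path/to/solr.xml zk:/solr.xml -z zk1:2181,zk2:2181,zk3:2181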

Best,
Erick


> Best,
> Modassar
> 
> On Wed, Nov 4, 2020 at 3:20 AM Erick Erickson 
> wrote:
> 
>> Do note, though, that the default value for legacyCloud changed from
>> true to false so even though you can get it to work by setting
>> this cluster prop I wouldn’t…
>> 
>> The change in the default value is why it’s failing for you.
>> 
>> 
>>> On Nov 3, 2020, at 11:20 AM, Ilan Ginzburg  wrote:
>>> 
>>> I second Erick's recommendation, but just for the record legacyCloud was
>>> removed in (upcoming) Solr 9 and is still available in Solr 8.x. Most
>>> likely this explains Modassar why you found it in the documentation.
>>> 
>>> Ilan
>>> 
>>> 
>>> On Tue, Nov 3, 2020 at 5:11 PM Erick Erickson 
>>> wrote:
>>> 
 You absolutely need core.properties files. It’s just that they
 should be considered an “implementation detail” that you
 should rarely, if ever need to be aware of.
 
 Scripting manual creation of core.properties files in order
 to define your collections has never been officially supported, it
 just happened to work.
 
 Best,
 Erick
 
> On Nov 3, 2020, at 11:06 AM, Modassar Ather 
 wrote:
> 
> Thanks Erick for your response.
> 
> I will certainly use the APIs and not rely on the core.properties. I
>> was
> going through the documentation on core.properties and found it to be
 still
> there.
> I have all the solr install scripts based on older Solr versions and
 wanted
> to re-use the same as the core.properties way is still available.
> 
> So does this mean that we do not need core.properties anymore?
> How can we ensure that the core name is configurable and not
>> dynamically
> set?
> 
> I will try to use the APIs to create the collection as well as the
>> cores.
> 
> Best,
> Modassar
> 
> On Tue, Nov 3, 2020 at 5:55 PM Erick Erickson >> 
> wrote:
> 
>> You’re relying on legacyMode, which is no longer supported. In
>> older versions of Solr, if a core.properties file was found on disk
>> Solr
>> attempted to create the replica (and collection) on the fly. This is
>> no
>> longer true.
>> 
>> 
>> Why are you doing this manually instead of using the collections
>> API?
>> You can precisely place each replica with that API in a way that’ll
>> continue to be supported going forward.
>> 
>> This really sounds like an XY problem, what is the use-case you’re
>> trying to solve?
>> 
>> Best,
>> Erick
>> 
>>> On Nov 3, 2020, at 6:39 AM, Modassar Ather 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> I am migrating from Solr 6.5.1 to Solr 8.6.3. As a part of the entire
>>> upgrade I have the first task to install and configure the solr with
 the
>>> core and collection. The solr is installed in SolrCloud mode.
>>> 
>>> In Solr 6.5.1 I was using the following key values in core.properties
>> file.
>>> The configuration files were uploaded to zookeeper using the upconfig
>>> command.
>>> The core and collection was automatically created with the setting in
>>> core.properties files and the configSet uploaded in zookeeper and it
 used
>>> to display on the Solr 6.5.1 dashboard.
>>> 
>>> numShards=12
>>> 
>>> name=mycore
>>> 
>>> collection=mycore
>>> 
>>> configSet=mycore
>>> 
>>> 
>>> With the latest Solr 8.6.3 the same approach is not working. As per
>> my
>>> understanding the core is identified using the location of
>> core.properties
>>> which is under */mycore/core.properties.*
>>> 
>>> Can you please help me with the following?
>>> 
>>> 
>>> - Is there any property I am missing to load the core and 

Re: Commits (with openSearcher = true) are too slow in solr 8

2020-11-04 Thread Erick Erickson
I completely agree with Shawn. I’d emphasize that your heap is that large
probably to accommodate badly mis-configured caches.

Why it’s different in 5.4 I don’t quite know, but 10-12
minutes is unacceptable anyway.

My guess is that you made your heaps that large as a consequence of
having low hit rates. If you were using bare NOW in fq clauses,
perhaps you were getting very low hit rates as a result and expanded
the cache size, see:

https://dzone.com/articles/solr-date-math-now-and-filter

At any rate, I _strongly_ recommend that you drop your filterCache
to the default size of 512, and drop your autowarmCount to something
very small, say 16. Ditto for queryResultCache. The documentCache
to maybe 10,000 (autowarm is a no-op for documentCache). Then
drop your heap to something closer to 16G. Then test, tune, test. Do
NOT assume bigger caches are the answer until you have evidence.
Keep reducing your heap size until you start to see GC problems (on 
a test system obviously) to get your lower limit. Then add some
back for your production to give you some breathing room.
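
A minimal sketch of what that might look like in solrconfig.xml (the cache
class shown is the default in recent 8.x; the sizes are the starting points
suggested above, not measured values):

    <filterCache class="solr.CaffeineCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="16"/>
    <queryResultCache class="solr.CaffeineCache"
                      size="512"
                      initialSize="512"
                      autowarmCount="16"/>
    <documentCache class="solr.CaffeineCache"
                   size="10000"
                   initialSize="10000"/>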

Finally, see Uwe’s blog:

https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

to get a sense of why the size on disk is not necessarily a good
indicator of the heap requirements.

Best,
Erick

> On Nov 4, 2020, at 2:40 AM, Shawn Heisey  wrote:
> 
> On 11/3/2020 11:46 PM, raj.yadav wrote:
>> We have two parallel systems: one is solr 8.5.2 and the other is solr 5.4.
>> In solr_5.4, commit time with openSearcher=true is 10 to 12 minutes, while in
>> solr_8 it's around 25 minutes.
> 
> Commits on a properly configured and sized system should take a few seconds, 
> not minutes.  10 to 12 minutes for a commit is an enormous red flag.
> 
>> This is our current caching policy of solr_8
>> <filterCache
>>   size="32768"
>>   initialSize="6000"
>>   autowarmCount="6000"/>
> 
> This is probably the culprit.  Do you know how many entries the filterCache 
> actually ends up with?  What you've said with this config is "every time I 
> open a new searcher, I'm going to execute up to 6000 queries against the new 
> index."  If each query takes one second, running 6000 of them is going to 
> take 100 minutes.  I have seen these queries take a lot longer than one 
> second.
> 
> Also, each entry in the filterCache can be enormous, depending on the number 
> of docs in the index.  Let's say that you have five million documents in your 
> core.  With five million documents, each entry in the filterCache is going to 
> be 625000 bytes.  That means you need 20GB of heap memory for a full 
> filterCache of 32768 entries -- 20GB of memory above and beyond everything 
> else that Solr requires.  Your message doesn't say how many documents you 
> have, it only says the index is 11GB.  From that, it is not possible for me 
> to figure out how many documents you have.
> 
>> While debugging this we came across this page.
>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Slowcommits
> 
> I wrote that wiki page.
> 
>> Here one of the reasons for slow commit is mentioned as:
>> */`Heap size issues. Problems from the heap being too big will tend to be
>> infrequent, while problems from the heap being too small will tend to happen
>> consistently.`/*
>> Can anyone please help me understand the above point?
> 
> If your heap is a lot bigger than it needs to be, then what you'll see is 
> slow garbage collections, but it won't happen very often.  If the heap is too 
> small, then there will be garbage collections that happen REALLY often, 
> leaving few system resources for actually running the program.  This applies 
> to ANY Java program, not just Solr.
> 
>> System config:
>> disk size: 250 GB
>> cpu: (8 vcpus, 64 GiB memory)
>> Index size: 11 GB
>> JVM heap size: 30 GB
> 
> That heap seems to be a lot larger than it needs to be.  I have run systems 
> with over 100GB of index, with tens of millions of documents, on an 8GB heap. 
>  My filterCache on each core had a max size of 64, with an autowarmCount of 
> four ... and commits STILL would take 10 to 15 seconds, which I consider to 
> be very slow.  Most of that time was spent executing those four queries in 
> order to autowarm the filterCache.
> 
> What I would recommend you start with is reducing the size of the 
> filterCache.  Try a size of 128 and an autowarmCount of 8, see what you get 
> for a hit rate on the cache.  Adjust from there as necessary.  And I would 
> reduce the heap size for Solr as well -- your heap requirements should drop 
> dramatically with a reduced filterCache.
> 
> Thanks,
> Shawn



Re: docValues usage

2020-11-04 Thread Erick Erickson
You don’t need to index the field for function queries, see: 
https://lucene.apache.org/solr/guide/8_6/docvalues.html.

Function queries, as opposed to sorting, faceting and grouping, are evaluated
at search time, where the search process is already parked on the document
anyway, so answering the question “for doc X, what is the value of field Y”
to compute the score is cheap. DocValues are still more efficient I think,
although I haven’t measured explicitly...

For sorting, faceting and grouping, it’s a much different story. Take sorting.
You have to ask “for field Y, what’s the value in docX and docZ?”. Say you’re
parked on docX. Doc Z is long gone and getting the value for field Y is much
more expensive.

Also, docValues will not increase memory requirements _unless used_. Otherwise 
they’ll
just sit there on disk. They will certainly increase disk space whether used or 
not.

And _not_ using docValues when you facet, group or sort will also _certainly_ 
increase
your heap requirements since the docValues structure must be built on the heap 
rather
than be in MMapDirectory space.
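
For option 3 in the original question, a sketch of the schema entry (field
name and type are illustrative):

    <field name="popularity" type="pfloat" indexed="false" stored="false" docValues="true"/>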

Best,
Erick


> On Nov 4, 2020, at 5:32 AM, uyilmaz  wrote:
> 
> Hi,
> 
> I'm by no means an expert on this so if anyone sees a mistake please correct me.
> 
> I think you need to index this field, since boost functions are added to the 
> query as optional clauses 
> (https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
> It's like boosting a regular field by putting ^2 next to it in a query. 
> Storing or enabling docValues will unnecessarily consume space/memory.
> 
> On Tue, 3 Nov 2020 16:10:50 -0800
> Wei  wrote:
> 
>> Hi,
>> 
>> I have a couple of primitive single value numeric type fields,  their
>> values are used in boosting functions, but not used in sort/facet. or in
>> returned response.   Should I use docValues for them in the schema?  I can
>> think of the following options:
>> 
>> 1)   indexed=true,  stored=true, docValues=false
>> 2)   indexed=true, stored=false, docValues=true
>> 3)   indexed=false,  stored=false,  docValues=true
>> 
>> What would be the performance implications for these options?
>> 
>> Best,
>> Wei
> 
> 
> -- 
> uyilmaz 



Re: when to use stored over docValues and useDocValuesAsStored

2020-11-04 Thread Erick Erickson


> On Nov 4, 2020, at 6:43 AM, uyilmaz  wrote:
> 
> Hi,
> 
> I heavily use streaming expressions and facets, or export large amounts of 
> data from Solr to Spark to make analyses.
> 
> Please correct me if I know wrong:
> 
> + requesting a non-docValues field in a response causes the whole document
> to be decompressed and read from disk

non-docValues fields don’t work at all for many stream sources, IIRC only the 
Topic Stream will work with stored values. The read/decompress/extract cycle 
would be unacceptable performance-wise for large data sets otherwise.

> + streaming expressions and export handler requires every field read to have 
> docValues

Pretty much.

> 
> - docValues increases index size, therefore memory requirement, stored only 
> uses disk space

Yes. 

> - stored preserves order of multivalued fields

Yes.

> 
> It seems stored is only useful when I have a multivalued field that I care 
> about the index-time order of things, and since I will be using the export 
> handler, it will use docValues anyways and lose the order.

Yes.

> 
> So is there any case that I need stored=true?

Not for export outside of the Topic Stream as above. stored=true is there for 
things like showing the user the original input and highlighting.
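
For example (collection and field names assumed), an /export request only
needs docValues on the fields involved, and requires an explicit sort and fl:

    curl "http://localhost:8983/solr/mycollection/export?q=*:*&sort=id+asc&fl=id,price"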

> 
> Best,
> ufuk
> 
> -- 
> uyilmaz 



when to use stored over docValues and useDocValuesAsStored

2020-11-04 Thread uyilmaz
Hi,

I heavily use streaming expressions and facets, or export large amounts of data 
from Solr to Spark to make analyses.

Please correct me if I know wrong:

+ requesting a non-docValues field in a response causes the whole document to
be decompressed and read from disk
+ streaming expressions and export handler requires every field read to have 
docValues

- docValues increases index size, therefore memory requirement, stored only 
uses disk space
- stored preserves order of multivalued fields

It seems stored is only useful when I have a multivalued field that I care 
about the index-time order of things, and since I will be using the export 
handler, it will use docValues anyways and lose the order.

So is there any case that I need stored=true?

Best,
ufuk

-- 
uyilmaz 


Re: docValues usage

2020-11-04 Thread uyilmaz
Hi,

I'm by no means an expert on this so if anyone sees a mistake please correct me.

I think you need to index this field, since boost functions are added to the 
query as optional clauses 
(https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter).
It's like boosting a regular field by putting ^2 next to it in a query. 
Storing or enabling docValues will unnecessarily consume space/memory.
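
For example (field and collection names assumed), a bf-style boost looks
like this with edismax:

    curl "http://localhost:8983/solr/mycollection/select?defType=edismax&q=laptop&qf=title&bf=popularity"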

On Tue, 3 Nov 2020 16:10:50 -0800
Wei  wrote:

> Hi,
> 
> I have a couple of primitive single value numeric type fields,  their
> values are used in boosting functions, but not used in sort/facet. or in
> returned response.   Should I use docValues for them in the schema?  I can
> think of the following options:
> 
>  1)   indexed=true,  stored=true, docValues=false
>  2)   indexed=true, stored=false, docValues=true
>  3)   indexed=false,  stored=false,  docValues=true
> 
> What would be the performance implications for these options?
> 
> Best,
> Wei


-- 
uyilmaz 


Solr 8.1.1 installation in Azure App service

2020-11-04 Thread Narayanan, Bhagyasree
Hi Team
  We have created a new sitecore environment with the Azure market 
place solution "Azure Experience Cloud" (PaaS). Sitecore version 9.3 XM scaled 
topology with SOLR search. Since Solr App doesn't come by default with the 
market place solution we created a new Web App for Solr.

Steps we followed for creating Solr App service:

  1.  Created a blank sitecore 9.3 solution from Azure market place and created 
a Web app for Solr.
  2.  Unzipped the Solr 8.1.1 package and copied all the contents to wwwroot 
folder of the Web app created for Solr using WinSCP/FTP.
  3.  Created a new Solr core by creating a new folder {index folder} and 
copied 'conf' from the "/site/wwwroot/server/solr/configsets/_default".
  4.  Created a core.properties file with numShards=2 & name={index folder}

We get the below error:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Index dir 'C:\home\site\wwwroot\server\solr\{index folder}\data\index/' of core 
'{index folder}' is already locked. The most likely cause is another Solr 
server (or another solr core in this server) also configured to use this 
directory; other possible causes may be specific to lockType: native

This is a pre-requisite for our Sitecore migration and we are not able to 
proceed further. Requesting you to kindly let us know whether the steps 
followed by us are correct.

Awaiting your earliest response.

Thanks
Bhagyasree
+91 9941709696

This message contains information that may be privileged or confidential and is 
the property of the Capgemini Group. It is intended only for the person to whom 
it is addressed. If you are not the intended recipient, you are not authorized 
to read, print, retain, copy, disseminate, distribute, or use this message or 
any part thereof. If you receive this message in error, please notify the 
sender immediately and delete all copies of this message.


multi-model deployment

2020-11-04 Thread Azad S
Dear solr users

I am new to solr. I am using Gitlab CI to deploy a decision tree model to
solr. However, I have been provided some other models to be deployed to
solr.

I am not sure how to deploy multiple models to solr. Is there a way? Is
there any architecture for a multi-model deployment pipeline?
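
If these are Learning-To-Rank models (a decision tree suggests Solr's LTR
module), each model is uploaded to the model store under its own unique
name, so several can coexist and be selected per query; a sketch, with
collection and file names assumed:

    curl -XPUT "http://localhost:8983/solr/mycollection/schema/model-store" \
         --data-binary "@treeModel.json" -H "Content-type: application/json"
    curl -XPUT "http://localhost:8983/solr/mycollection/schema/model-store" \
         --data-binary "@linearModel.json" -H "Content-type: application/json"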

Any help would be appreciated!

Thank you

Azad