Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread Danilo Tomasoni
Thank you all for the suggestions,
The OS is not Windows, it's CentOS. A colleague thinks that even on Linux, 
defragmenting can improve performance by about 2x because it keeps the data 
contiguous on disk.

We cannot use flashcache because we run solr on virtual machines.
We will look further into the memory suggestion from Shawn.
Thank you very much.

Danilo

Danilo Tomasoni

Fondazione The Microsoft Research - University of Trento Centre for 
Computational and Systems Biology (COSBI)
Piazza Manifattura 1,  38068 Rovereto (TN), Italy
tomas...@cosbi.eu
http://www.cosbi.eu

As for the European General Data Protection Regulation 2016/679 on the 
protection of natural persons with regard to the processing of personal data, 
we inform you that all the data we possess are object of treatment in the 
respect of the normative provided for by the cited GDPR.
It is your right to be informed on which of your data are used and how; you may 
ask for their correction, cancellation or you may oppose to their use by 
written request sent by recorded delivery to The Microsoft Research – 
University of Trento Centre for Computational and Systems Biology Scarl, Piazza 
Manifattura 1, 38068 Rovereto (TN), Italy.
Please don't print this e-mail unless you really need to

From: Walter Underwood 
Sent: Monday, 22 February 2021 18:25
To: solr-user@lucene.apache.org 
Subject: Re: defragmentation can improve performance on SATA class 10 disk 
~10000 rpm ?

True, but Windows does cache files. It has been a couple of decades since I ran 
search on Windows, but Ultraseek got large gains from setting some sort of 
system property to make it act like a file server and give file caching equal 
priority with program caching.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 22, 2021, at 9:22 AM, dmitri maziuk  wrote:
>
> On 2021-02-22 11:18 AM, Shawn Heisey wrote:
>
>> The OS automatically uses unallocated memory to cache data on the disk.   
>> Because memory is far faster than any disk, even SSD, it performs better.
>
> Depends on the OS. From "defragmenting solrdata folder" I suspect the OP is 
> on Windows, whose filesystems and memory management do not always work the 
> way the Unix textbook says.
>
> Dima



Re: Is 8.8.x going be stabilized and finalized?

2021-02-22 Thread S G
Hey Subhajit,

Can you share briefly what issues are being seen with 8.7+ versions?
We are planning to move a big workload from 7.6 to 8.7 version.

We created a small load-testing tool for vetting new Solr versions, and it
showed throughput dropping off much more sharply on 8.7 than on Solr 7.6 as
we loaded more and more data into both versions.
So we are a bit concerned about whether we should make this move or not.
If 8.7 has some grave blockers (features or performance) known already,
then we will probably hold off on making the move.

Regards
SG

On Wed, Feb 17, 2021 at 11:58 AM Subhajit Das 
wrote:

> Hi Shawn,
>
> Nice to know that Solr will be considered top level project of Apache.
>
> I asked based on earlier 3 version patterns. Just hoping that 8.8 would be
> long term stable, kind of like 7.7.x line-up.
>
> Thanks for the clarification.
>
> Regards,
> Subhajit
>
> From: Shawn Heisey
> Sent: 17 February 2021 09:33 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Is 8.8.x going be stabilized and finalized?
>
> On 2/16/2021 7:57 PM, Subhajit Das wrote:
> > I am planning to use 8.8 line-up for production use.
> >
> > But recently, a lot of people are complaining on 8.7 and 8.8. Also,
> there is a clearly known issue on 8.8 as well.
> >
> > Following the trends of earlier versions (5.x, 6.x and 7.x), will 8.8
> also be finalized?
> > For 5.x, 5.5.x was last version. For 6.x, 6.6.x was last version. For
> 7.x, 7.7.x was last version. It would match the pattern, it seems.
> > And 9.x is already planned and under development.
> > And it seems, we require some stability.
>
> All released versions are considered stable.  Sometimes problems are
> uncovered after release.  Sometimes BIG problems.  We try our very best
> to avoid bugs, but achieving that kind of perfection is nearly
> impossible for any software project.
>
> 8.8.0 is the most current release.  The 8.8.1 release is underway, but
> there's no way I can give you a concrete date.  The announcement MIGHT
> come in the next few days, but it's always possible it could get pushed
> back.  At this time, the changelog for 8.8.1 has five bugfixes
> mentioned.  It should be more stable than 8.8.0, but it's impossible for
> me to tell you whether you will have any problems with it.
>
> On the dev list, the project is discussing the start of work on the 9.0
> release, but that work has not yet begun.  Even if it started tomorrow,
> it would be several weeks, maybe even a few months, before 9.0 is
> actually released.  On top of the "normal" headaches involved in any new
> major version release, there are some other things going on that might
> further delay 9.0 and future 8.x versions:
>
> * Solr is being promoted from a subproject of Lucene to its own
> top-level project at Apache.  This involves a LOT of work.  Much of that
> work is administrative in nature, which is going to occupy us and take
> away from time that we might spend working on the code and new releases.
> * The build system for the master branch, which is currently versioned
> as 9.0.0-SNAPSHOT, was recently switched from Ant+Ivy to Gradle.  It's
> going to take some time to figure out all the fallout from that migration.
> * Some of the devs have been involved in an effort to greatly simplify
> and rewrite how SolrCloud does internal management of a cluster.  The
> intent is much better stability and better performance.  You might have
> seen public messages referring to a "reference implementation."  At this
> time, it is unclear how much of that work will make it into 9.0 and how
> much will be revealed in later releases.  We would like very much to
> include at least the first phase in 9.0 if we can.
>
>  From what I have seen over the last several years as one of the
> developers on this project, it is likely that 8.9 and possibly even 8.10
> and 8.11 will be released before we see 9.0.  Releases are NOT made on a
> specific schedule, so I cannot tell you which versions you will see or
> when they might happen.
>
> I am fully aware that despite typing quite a lot of text here, I have
> provided almost nothing in the way of concrete information that you can
> use.  Sorry about that.
>
> Thanks,
> Shawn
>
>


Re: Caffeine Cache and Filter Cache in 8.3

2021-02-22 Thread Shawn Heisey

On 2/22/2021 1:50 PM, Stephen Lewis Bianamara wrote:




(a) At what version did the caffeine cache reach production stability?
(b) Is the caffeine cache, and really all implementations, able to be used
on any cache, or are the restrictions about which cache implementations may
be used for which cache? If the latter, can you provide some guidance?


The Caffeine-based cache was introduced in Solr 8.3.  It was considered 
viable for production from the time it was introduced.


https://issues.apache.org/jira/browse/SOLR-8241

Something was found and fixed in 8.5.  I do not know what the impact of 
that issue was:


https://issues.apache.org/jira/browse/SOLR-14239

The other cache implementations were deprecated at some point.  Those 
implementations have been removed from the master branch, but still 
exist in the code for 8.x versions.


If you want to use one of the older implementations like FastLRUCache, 
you still can, and will be able to for all future 8.x versions.  When 
9.0 is released at some future date, that will no longer be possible.


The Caffeine-based implementation is probably the best option, but I do 
not have any concrete data to give you.
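
For anyone who wants to try it, switching the filter cache to the Caffeine 
implementation in solrconfig.xml looks roughly like this. The size values 
below are illustrative placeholders, not tuning recommendations:

```xml
<!-- solrconfig.xml (Solr 8.3+): filterCache using the Caffeine implementation.
     Sizes here are illustrative, not recommendations. -->
<filterCache class="solr.CaffeineCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>
```

The older classes (solr.FastLRUCache, solr.LRUCache) can still be substituted 
for the class attribute on 8.x, until 9.0 removes them.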


Thanks,
Shawn


Caffeine Cache and Filter Cache in 8.3

2021-02-22 Thread Stephen Lewis Bianamara
Hi SOLR Community,

I have a question about cache implementations based on some seemingly
inconsistent documentation I'm looking at. I'm currently inquiring about
8.3, but more generally about solr version 8 too for upgrade planning.

The description of cache implementations in the docs says


[The caffeine] cache implementation is recommended over other legacy caches
as it usually offers [good stuff].

On the other hand, the actual filter cache section says

The filter cache uses a specialized cache named as FastLRUCache which is
optimized for fast concurrent access with the trade-off that writes and
evictions are costlier than the LRUCache used for query result cache and
document cache.

This implies the FastLRUCache is the only option, as the documentation
doesn't say "defaults to a specialized cache" but rather states it in a way
that implies necessity. Even if we assume that was not intended, it does imply
that the FastLRUCache is the only one suited to the filter cache. Further,
the documentation in the Solr config says yet another thing: that you can
use FastLRUCache or LRUCache only


class - the SolrCache implementation LRUCache or
   (LRUCache or FastLRUCache)


Can you help me untangle this to understand the following:

(a) At what version did the caffeine cache reach production stability?
(b) Is the caffeine cache, and really all implementations, able to be used
on any cache, or are the restrictions about which cache implementations may
be used for which cache? If the latter, can you provide some guidance?

Disclaimer: I'm not asking which caches will be fastest for my
applications. I know that you can't know that ;) Rather, I want to be sure
which production version reached production stability (esp. for the filter
cache).

Thanks!
Stephen


Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread Walter Underwood
True, but Windows does cache files. It has been a couple of decades since I ran 
search on Windows, but Ultraseek got large gains from setting some sort of 
system property to make it act like a file server and give file caching equal 
priority with program caching.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 22, 2021, at 9:22 AM, dmitri maziuk  wrote:
> 
> On 2021-02-22 11:18 AM, Shawn Heisey wrote:
> 
>> The OS automatically uses unallocated memory to cache data on the disk.   
>> Because memory is far faster than any disk, even SSD, it performs better.
> 
> Depends on the OS. From "defragmenting solrdata folder" I suspect the OP is 
> on Windows, whose filesystems and memory management do not always work the 
> way the Unix textbook says.
> 
> Dima



Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread dmitri maziuk

On 2021-02-22 11:18 AM, Shawn Heisey wrote:

The OS automatically uses unallocated memory to cache data on the disk. 
  Because memory is far faster than any disk, even SSD, it performs better.


Depends on the OS. From "defragmenting solrdata folder" I suspect the OP 
is on Windows, whose filesystems and memory management do not always 
work the way the Unix textbook says.


Dima


Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread Shawn Heisey

On 2/22/2021 12:52 AM, Danilo Tomasoni wrote:

we are running a solr instance with around 41 MLN documents on a SATA class 10 
disk with around 10.000 rpm.
We are experiencing very slow query responses (in the order of hours..) with an 
average of 205 segments.
We made a test with a normal pc and an SSD disk, and there the same solr 
instance with the same data and the same number of segments was around 45 times 
faster.
Force optimize was also tried to improve the performances, but it was very 
slow, so we abandoned it.

Since we still don't have enterprise server ssd disks, we are now wondering if 
in the meanwhile defragmenting the solrdata folder can help.
The idea is that due to many updates, each segment file is fragmented across 
different physical blocks.
Put in another way, each segment file is non-contiguous on disk, and this can 
slow-down the solr response.


The absolute best thing you can do to improve Solr performance is add 
memory.


The OS automatically uses unallocated memory to cache data on the disk. 
 Because memory is far faster than any disk, even SSD, it performs better.


I wrote a wiki page about it:

https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems

If you have sufficient memory, the speed of your disks will have little 
effect on performance.  It's only in cases where there is not enough 
memory that disk performance will matter.
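
The page-cache effect Shawn describes is easy to observe outside of Solr with 
a small Python sketch: read the same file twice and compare timings. One 
caveat, noted in the comments: writing the file also populates the cache, so a 
truly cold first read would require dropping the page cache beforehand (on 
Linux, via /proc/sys/vm/drop_caches as root).

```python
import os
import tempfile
import time

def read_timed(path, chunk=1 << 20):
    """Read a file end to end and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    return time.perf_counter() - start

# Write a 64 MB scratch file (a stand-in for an index segment).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(64 * 1024 * 1024))
    path = tmp.name

first = read_timed(path)   # likely already cached, since we just wrote it
second = read_timed(path)  # served from the OS page cache
print(f"first: {first:.4f}s  second: {second:.4f}s")
os.unlink(path)
```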


Thanks,
Shawn



Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread Walter Underwood
A forced merge might improve speed 20%. Going from spinning disk to SSD
will improve speed 20X or more. Don’t waste your time even thinking about
forced merges.

You need to get SSDs.

The even bigger speedup is to get enough RAM that the OS can keep the 
Solr index files in file system buffers. Check how much space is used by
your indexes, then make sure that there is that much available RAM that
is not used by the OS or Solr JVM.

Some people make the mistake of giving a huge heap to the JVM, thinking
this will improve caching. This almost always makes things worse, by 
using RAM that could be used for caching files. 8GB of heap is usually enough.
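
Walter's sizing check (index size versus the RAM left over after the OS and 
the JVM heap) can be scripted. This is a sketch, not an official tool: the 
index path in the commented-out usage is a hypothetical placeholder, and 
/proc/meminfo is Linux-only.

```python
import os

def dir_size_bytes(path):
    """Total bytes of all regular files under path (e.g. a Solr index dir)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def available_ram_bytes():
    """MemAvailable from /proc/meminfo (Linux only), in bytes."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024  # reported in kB
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

# Hypothetical usage -- adjust the path to your Solr data directory:
# index = dir_size_bytes("/var/solr/data/mycore/data/index")
# if index > available_ram_bytes():
#     print("not enough free RAM to cache the whole index")
```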

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 21, 2021, at 11:52 PM, Danilo Tomasoni  wrote:
> 
> Hello all,
> we are running a solr instance with around 41 MLN documents on a SATA class 
> 10 disk with around 10.000 rpm.
> We are experiencing very slow query responses (in the order of hours..) with 
> an average of 205 segments.
> We made a test with a normal pc and an SSD disk, and there the same solr 
> instance with the same data and the same number of segments was around 45 
> times faster.
> Force optimize was also tried to improve the performances, but it was very 
> slow, so we abandoned it.
> 
> Since we still don't have enterprise server ssd disks, we are now wondering 
> if in the meanwhile defragmenting the solrdata folder can help.
> The idea is that due to many updates, each segment file is fragmented across 
> different physical blocks.
> Put in another way, each segment file is non-contiguous on disk, and this can 
> slow-down the solr response.
> 
> What do you suggest?
> Is this somewhat equivalent to force-optimize or it can be faster?
> 
> Thank you.
> Danilo
> 



Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread dmitri maziuk

On 2021-02-22 1:52 AM, Danilo Tomasoni wrote:

Hello all,
we are running a solr instance with around 41 MLN documents on a SATA class 10 
disk with around 10.000 rpm.
We are experiencing very slow query responses (in the order of hours..) with an 
average of 205 segments.
We made a test with a normal pc and an SSD disk, and there the same solr 
instance with the same data and the same number of segments was around 45 times 
faster.


What is your actual hardware and OS, as opposed to "normal pc"?

Dima


Query regarding integrating solr query functions into blockfacetjoin Query

2021-02-22 Thread Ravi Kumar
Hi Team,

I was implementing a block join faceting query in my project and got stuck
integrating the existing function queries into the block join faceting
query.

*The current query using 'select' handler is as follows* :-
https://localhost:8983/solr/master_Product_default/*select*?*yq*
=_query_:%22\{\!multiMaxScore\+tie%3D0.0\}\(\(bomCode_bc_string\:samsung\)\+OR\+\(description_text_en\:samsung\)\+OR\+\(belleaprice_cad_bc846_string\:samsung\^20.0\)\+OR\+\(name_text_en\:samsung\^50.0\)\+OR\+\(category_string_mv\:samsung\^20.0\)\)\+OR\+\(\(belleaprice_cad_bc846_string\:samsung\~\^10.0\)\)\+OR\+\(\(bomCode_bc_string\:\%22samsung\%22\^50.0\)\+OR\+\(code_string\:\%22samsung\%22\~1.0\^90.0\)\+OR\+\(vendorId_string\:\%22samsung\%22\^95.0\)\+OR\+\(description_text_en\:\%22samsung\%22\^50.0\)\+OR\+\(belleaprice_cad_bc846_string\:\%22samsung\%22\^40.0\)\+OR\+\(name_text_en\:\%22samsung\%22\^100.0\)\+OR\+\(category_string_mv\:\%22samsung\%22\^40.0\)\+OR\+\(upcCode_bc846_string\:\%22samsung\%22\^99.0\)\)%22&
*yab*
=sum(product(and(not(exists(omniOnlineStockStatus_boolean)),exists(inStoreStockStatus_bc846_bellea_boolean)),70.0),product(and(exists(omniOnlineStockStatus_boolean),exists(inStoreStockStatus_bc846_bellea_boolean)),80.0),product(and(exists(omniOnlineStockStatus_boolean),not(exists(inStoreStockStatus_bc846_bellea_boolean))),40.0),product(exists(omniInStoreStockStatus_bc_boolean),20.0))&*q={!boost}(+{!lucene
v=$yq} {!func v=$yab})*
=(omniAssortment_bc846_boolean:true+OR+omniAssortment_a002_boolean:true)=(srpPriceValue_bc846_string:[0.0+TO+*])=(omniVisible_20_bellea_bc_boolean:true)=(catalogId:%22belleaProductCatalog%22+AND+catalogVersion:%22Online%22)=score+desc,omniInStoreStockStatus_bc_boolean+asc,creationtime_sortable_date+desc,inStoreStockStatus_bc846_bellea_boolean+asc,omniOnlineStockStatus_boolean+asc=0=2=characteristics_string=inStoreStockStatus_bc846_bellea_boolean=memorySize_string_mv=color_en_string=belleaprice_cad_bc846_string=supplier_string=model_string_mv=omniOnlineStockStatus_boolean=category_string_mv=omniInStoreStockStatus_bc_boolean=stockAvailability_string=true=count=1=11=score,*=[child+parentFilter%3D%22itemtype_string:Product%22+childFilter%3D%22brands_stringignorecase_mv:BC+AND+regions_stringignorecase_mv:ON+AND+activationTypes_stringignorecase_mv:N+AND+channels_stringignorecase_mv:NR+AND+banners_stringignorecase_mv:\%22Walmart\%22+AND+(accountTypes_stringignorecase_mv:IR+OR+accountTypes_stringignorecase_mv:empty)%22+limit%3D1000]=true=samsung=en=true

In the above query, the *'yq'* and* 'yab'* functions are integrated in the
main query using expression below :-
  *q={!boost}(+{!lucene v=$yq} {!func v=$yab})  *

I want to integrate the *'yq' and 'yab'* function queries in the *future
block join faceting query* mentioned below :-

https://localhost:8983/solr/master_Product_default/*blockJoinFacetRH*?
*q={!parent%20which=%22itemtype_string:Product%22}itemtype_string:TierPrice=json=true=true=contract_string=500*
=(omniAssortment_bc846_boolean:true+OR+omniAssortment_a002_boolean:true)=(srpPriceValue_bc846_string:[0.0+TO+*])=(omniVisible_20_bellea_bc_boolean:true)=(catalogId:%22belleaProductCatalog%22+AND+catalogVersion:%22Online%22)=score+desc,omniInStoreStockStatus_bc_boolean+asc,creationtime_sortable_date+desc,inStoreStockStatus_bc846_bellea_boolean+asc,omniOnlineStockStatus_boolean+asc=0=2000=characteristics_string=inStoreStockStatus_bc846_bellea_boolean=memorySize_string_mv=color_en_string=belleaprice_cad_bc846_string=supplier_string=model_string_mv=omniOnlineStockStatus_boolean=category_string_mv=omniInStoreStockStatus_bc_boolean=stockAvailability_string=true=count=1=11=score,*=[child+parentFilter%3D%22itemtype_string:Product%22+childFilter%3D%22brands_stringignorecase_mv:BC+AND+regions_stringignorecase_mv:ON+AND+activationTypes_stringignorecase_mv:N+AND+channels_stringignorecase_mv:NR+AND+banners_stringignorecase_mv:\%22Walmart\%22+AND+(accountTypes_stringignorecase_mv:IR+OR+accountTypes_stringignorecase_mv:empty)%22+limit%3D1000]=true=samsung=en=true

Can someone please suggest how I can add the expression '* {!boost}(+{!lucene
v=$yq} {!func v=$yab})*' to the block join faceting query
-"*q={!parent%20which=%22itemtype_string:Product%22}
itemtype_string:TierPrice=json=true=true=contract_string=500*"
?

I would be highly grateful for any insight.

Thanks & Regards,

Ravi Kumar
SAP Hybris Consultant


Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread Dario Rigolin
Hi Danilo, in my experience an SSD or a RAM disk is now the only way to
speed up queries. It depends on the storage footprint of your 41M docs.
If you don't have enterprise SSDs, you can add a consumer SSD as a fast cache
(the Linux caching modules "flashcache" and "bcache" can use a cheap SSD as a
data cache while keeping your data safely stored on SATA disks).

I don't think you can increase performance without changing the storage
technology.

Regards.
Dario

Il giorno lun 22 feb 2021 alle ore 08:52 Danilo Tomasoni 
ha scritto:

> Hello all,
> we are running a solr instance with around 41 MLN documents on a SATA
> class 10 disk with around 10.000 rpm.
> We are experiencing very slow query responses (in the order of hours..)
> with an average of 205 segments.
> We made a test with a normal pc and an SSD disk, and there the same solr
> instance with the same data and the same number of segments was around 45
> times faster.
> Force optimize was also tried to improve the performances, but it was very
> slow, so we abandoned it.
>
> Since we still don't have enterprise server ssd disks, we are now
> wondering if in the meanwhile defragmenting the solrdata folder can help.
> The idea is that due to many updates, each segment file is fragmented
> across different physical blocks.
> Put in another way, each segment file is non-contiguous on disk, and this
> can slow-down the solr response.
>
> What do you suggest?
> Is this somewhat equivalent to force-optimize or it can be faster?
>
> Thank you.
> Danilo
>
>


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin