I am using JMX to monitor my replication status and am finding that my
MBeans are disappearing. I turned on debugging for JMX and found that
solr seems to be deleting the mbeans.
Is this a bug? Some trace info is below..
here's me reading the mbean successfully:
Jan 27, 2011 5:00:02 PM
Sorry to reply to myself, but I just wanted to see if anyone saw
this/had ideas why MBeans would be removed/re-added/removed.
I tried looking for this in the code but was unable to grok what
triggers bean removal.
Any hints?
On Thu, Jan 27, 2011 at 3:30 PM, matthew sporleder msporle
also old Solr version that this leaks a searcher (which
> means the index data is infinitely growing until you restart the server).
> You can also export from the database to Jsonline, post it to the json update
> handler together with the atomic processor.
>
> > Am 10.04.20
I have a field I would like to add to my schema which is stored in a
different database from my primary data. Can I use a separate entity
in my DIH to update a single field of my documents?
Thanks,
Matt
Is there a comprehensive/big set of tips for making solr into a
search-engine as a human would expect one to behave? I poked around
in the nutch github for a minute and found this:
https://github.com/apache/nutch/blob/9e5ae7366f7dd51eaa76e77bee6eb69f812bd29b/src/plugin/indexer-solr/schema.xml
in a couple
> of weeks
> https://opensourceconnections.com/blog/2020/05/05/tlre-solr-remote/ )
>
> Hope this helps,
>
> Cheers
>
> Charlie
>
> On 20/04/2020 23:43, matthew sporleder wrote:
> > Is there a comprehensive/big set of tips for
uld be “nothing comes before something"
> so sorting ascending on a keywordtokenizer+lowercasefilter
> should give you exactly what you’re asking for with no
> need for a length field.
>
> Best,
> Erick
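A rough way to see the "nothing comes before something" rule is plain string comparison in Python. This is only an analogy, assuming a sort on a lowercased, single-token field compares values much like lowercased strings do; the slugs below are made up for illustration and the helper name is hypothetical.

```python
def sort_like_lowercased_keyword(values):
    """Sort whole strings case-insensitively, treating each value as one token."""
    return sorted(values, key=str.lower)


# Hypothetical slugs, loosely modeled on the ones in this thread.
slugs = [
    "What_is_lovage_used_for",
    "What_is_Lovasatin",
    "What_is_lovage",
]

for slug in sort_like_lowercased_keyword(slugs):
    print(slug)
```

A value that is a prefix of another sorts ahead of it, which is why the shortest match comes first here without a separate length field.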
>
> > On Mar 25, 2020, at 11:07 AM, matthew sporleder
> > wrote:
> >
Erick
>
> > On Mar 25, 2020, at 1:30 PM, matthew sporleder wrote:
> >
> > What_is_Lov_Holtz_known_for
> > What_is_lova_after_it_harddens
> > What_is_Lova_Moor's_birthday
> > What_is_lovable_in_Spanish
> > What_is_lovage
> > What_is_Lovagny's_population
> > What_is_lovan_for
> > What_is_lovanox
> > What_is_lovarstan_for
> > What_is_Lovasatin
>
Does anyone have good sources for word dictionaries to use for the
spell checker?
Thanks,
Matt
e a little space.
On Tue, Mar 24, 2020 at 11:39 AM matthew sporleder wrote:
>
> Okay I appreciate you responding.
>
> Switching "slug" from "string_ci" class="solr.StrField" accomplished
> about the same results, which makes sens
ou’re really worried about space,
> that might not be an option.
>
> Best,
> Erick
>
> > On Mar 25, 2020, at 9:49 AM, matthew sporleder wrote:
> >
> > Where I landed:
> >
> > > sortMissingLast="t
>
> Best,
> Erick
>
> > On Mar 25, 2020, at 9:37 PM, matthew sporleder wrote:
> >
> > Okay confirmed-
> > I am getting a more predictable result set after adding an additional
> > field:
> > > sortMissingLast="true" omitNorms="true"
If it's solrcloud + zookeeper you can get most of the configs from the
"tree" browser on the console: /solr/#/~cloud?view=tree
You can otherwise derive a lot of the configs/schema/data-import
properties from the web console and api, neither of which require
server access.
It is also possible to
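As a sketch of the "api" route, the snippet below builds the Config API and Schema API URLs for a collection and reads them as JSON. The host, port and collection name are placeholders, and the helper names are hypothetical; only the `/config` and `/schema` paths come from Solr itself.

```python
import json
from urllib.request import urlopen


def config_api_urls(base_url, collection):
    """Build the Config API and Schema API URLs for one collection."""
    base = base_url.rstrip("/")
    return {
        "config": f"{base}/solr/{collection}/config",
        "schema": f"{base}/solr/{collection}/schema",
    }


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (needs a reachable Solr node)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Placeholder host and collection name; adjust for your install.
urls = config_api_urls("http://localhost:8983", "my_collection")
for name, url in urls.items():
    print(name, url)
```

Calling `fetch_json(urls["schema"])` against a live node should return the schema as a dict, with no shell access to the server required.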
size-on-disk of cores, size of tlogs, DIH stats over time, last
modified date of cores
The most important alert-type things are -- collections in recovery or
down state, solrcloud election events, various error rates
It's also important to be able to tie these back to aliases so you are
only
hey are for people who use DIH :)
>
> Best regards,
> Radu
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> On Tue, Apr 28, 2020 at 4:17 PM matthew sporleder
> wrote:
>
the quick brown fox jumped over the sleeping dog
I was just doing that
to troubleshoot/discover. I knew that you couldn't copy-to-copy but,
apparently, needed to be reminded.
My end goal (which I don't think I can achieve?) was to get my
everything field to contain something like:
everything: [
I highly recommend a version selector in the header! I am *always*
landing on 6.x docs from google.
On Tue, Apr 28, 2020 at 5:18 PM Cassandra Targett wrote:
>
> In case the list breaks the URL to view the Jenkins build, here's a shorter
> URL:
>
> https://s.apache.org/df7ew.
>
> On Tue, Apr 28,
Are you 100% sure it is using solrcloud and that the config is not
simply on the disk?
On Fri, Apr 24, 2020 at 7:11 AM Lewin Joy (TMNA) wrote:
>
> ll PROTECTED Confidential (internal use only)
> Hi,
>
> We have an old collection running on a very old solr version. 5.3
> Now, we have a need to update the url string inside
FWIW I've had some luck with strategy 3 (increase zk timeout) when you
overwhelm the connection to zk or the disk on zk.
Is zk on the same boxes as solr?
On Tue, Apr 28, 2020 at 10:15 PM Sethuraman, Ganesh
wrote:
>
> Hi
>
> We are using SolrCloud 7.2.1 with 3 node Zookeeper ensemble. We have 92
What does the message look like, exactly, from solr.log ?
On Wed, Apr 29, 2020 at 1:27 PM Raji N wrote:
>
> Thank you for your reply. When OOM happens somehow it doesn't generate a
> dump file. So we have hourly heaps running to diagnose this issue. Heap is
> around 700MB and threads around 150.
If you use the stemmer in your query analysis it should act the same, right?
On Thu, Apr 30, 2020 at 3:54 PM Erick Erickson wrote:
>
> They are being stemmed to two different tokens, “identif” and “identifi”.
> Stemming is algorithmic and imperfect and in this case you’re getting bitten
> by
fl=createdByMap:concat("createdBy.userName:
",createdBy.userName,",","createdBy.name: ",createdBy.name," ...)
On Thu, Apr 30, 2020 at 3:20 PM sambasivarao giddaluri
wrote:
>
> Hi Audrey,
>
> Yes i am aware of copyField but it does not fit in my use case. Reason is
> while giving as output we
meters it will certainly help
>
> Regards
> Ganesh
>
> -----Original Message-
> From: matthew sporleder
> Sent: Wednesday, April 29, 2020 11:47 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud degraded during backup and batch CSV update
>
> C
If the errors happen with garbage collection then potentially, yes.
You should never pause longer than your zk timeout (both sides).
On Thu, Apr 30, 2020 at 11:03 PM Ganesh Sethuraman
wrote:
>
> Any other JVM settings change possible?
>
> On Tue, Apr 28, 2020, 10:15 PM Sethuraman, Ganesh
>
Is what is shown in "analysis" the same as what is stored in a field?
I am confusing myself pretty thoroughly:
I have some fields:
And I have this:
I run this through the analyzer for stuff_stems:
Why so many shards?
> On May 10, 2020, at 9:09 PM, Ganesh Sethuraman
> wrote:
>
> We are using a dedicated host, CentOS in EC2 r5.12xlarge (48 CPU, ~360GB
> RAM), 2 nodes. Swappiness set to 1. With General purpose 2T EBS SSD volume.
> JVM size of 18gb, with G1 GC enabled. About 92 collection
I am attempting to use nested entities to populate documents from
different tables and verbose/debug output is showing repeated queries
on import. The same SQL queries repeat for each document number.
"verbose-output":
[ "entity:parent",
..
[ "document#5", [
...
"entity:nested1", [
"query", "SELECT body AS nested1
On Thu, May 14, 2020 at 4:46 PM Shawn Heisey wrote:
>
> On 5/14/2020 9:36 AM, matthew sporleder wrote:
> > It appears that adding entities to my entities in my data import
> > config is slowing down my import process by a lot. Is there a good
> > way to speed
It appears that adding entities to my entities in my data import
config is slowing down my import process by a lot. Is there a good
way to speed this up? I see the IDs are individually queried instead
of using IN() or similar normal techniques to make things faster.
Just looking for some tips.
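For comparison, here is what the IN() style of lookup looks like when done client-side, for example in a custom indexer. This is not DIH configuration; it is a sketch against an in-memory SQLite database, and the `child` table, its columns and the function name are all made up for illustration.

```python
import sqlite3


def fetch_children_batched(conn, parent_ids, batch_size=500):
    """Fetch child rows for many parents with one IN() query per batch."""
    rows_by_parent = {pid: [] for pid in parent_ids}
    for start in range(0, len(parent_ids), batch_size):
        batch = parent_ids[start:start + batch_size]
        placeholders = ",".join("?" for _ in batch)
        query = (
            "SELECT parent_id, body FROM child "
            f"WHERE parent_id IN ({placeholders}) ORDER BY rowid"
        )
        for parent_id, body in conn.execute(query, batch):
            rows_by_parent[parent_id].append(body)
    return rows_by_parent


# Hypothetical table standing in for the nested entity's source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE child (parent_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO child VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "d")],
)
result = fetch_children_batched(conn, [1, 2, 3], batch_size=2)
print(result)
```

With a batch size of 2, the three parents above cost two queries instead of three; at realistic batch sizes the saving over one query per ID is much larger.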
I think this is just an issue in the verbose/debug output. tcpdump
does not show the same issue.
On Wed, May 13, 2020 at 7:39 PM matthew sporleder wrote:
>
> I am attempting to use nested entities to populate documents from
> different tables and verbose/debug output is showing repeate
I have added an edge ngram field to my index and get decent results
with partial words but the results appear randomly sorted and all
contain the same score. Ideally I would like to sort by shortest
ngram match within my other qualifiers.
Is there a canonical solution to this?
Thanks,
Matt
You’ll need to copy to a field with keywordTokenizer
> and lowercaseFilter (string_ci? assuming it’s not really a :”string”) type.
>
> Best,
> Erick
>
> > On Mar 24, 2020, at 7:10 AM, matthew sporleder wrote:
> >
> > I have added an edge ngram field to my index
nking about changing
> your schema, it’ll answer a _lot_ of these questions immediately.
>
> Best,
> Erick
>
> > On Mar 24, 2020, at 8:37 AM, matthew sporleder wrote:
> >
> > Oh maybe a schema bug!
> >
> > my string_ci:
> > > sortMissingLast
I have quite a few numeric / meta-data type fields in my schema and
pretty much only use them in fq=, sort=, and friends. Should I always
use docValues on these if I never plan to q=search: on them? Are there
any drawbacks?
Thanks,
Matt
arches. Indexed=true is for searching.
>
> sort
>
> > On May 19, 2020, at 4:00 PM, matthew sporleder wrote:
> >
> > I have quite a few numeric / meta-data type fields in my schema and
> > pretty much only use them in fq=, sort=, and friends. Should I always
> > use DocValue on these if i never plan to q=search: on them? Are there
> > any drawbacks?
> >
> > Thanks,
> > Matt
>
I saw
https://lucene.apache.org/solr/guide/8_5/indexconfig-in-solrconfig.html#other-indexing-settings
mentioned in a thread and was wondering if there was any downside to
running this potentially super useful log.
Thanks,
Matt
I have hit a bit of a cross-road with our usage of solr where I want
to include some slightly dynamic data.
I want to ask solr to find things like "text query" but only if they
meet some specific criteria. When I have all of those criteria
indexed, everything works great. (text contains
tion query. Since it’s non-indexed, you
> won’t be searching on it. That said, you can do a lot with function queries
> to satisfy use-cases.
>
> Best.
> Erick
>
> > On Sep 14, 2020, at 3:12 PM, matthew sporleder wrote:
> >
> > I have hit a bit of a cross-road wi
We are researching the canonical use case for external fields --
traffic-based rankings
What are the practical limits on the size of the external field file?
A k=v text file seems like it might fall over if it grows into the GB
range?
Our other thought is to use rolling cores where we stream in
>
> Best,
> Erick
>
> > On Sep 1, 2020, at 11:21 AM, matthew sporleder wrote:
> >
> > We are researching the canonical use case for external fields --
> > traffic-based rankings
> >
> > What are the practical limits on the size of the external field fi
You have a 12G heap for a 200MB index? Can you just try changing Xmx
to, like, 1g ?
On Tue, Oct 6, 2020 at 7:43 AM Karol Grzyb wrote:
>
> Hi,
>
> I'm involved in an investigation of an issue that involves huge GC overhead
> that happens during performance tests on Solr Nodes. Solr version is
> 6.1.
could only test prod
> or stage which are difficult to adjust.
>
> Is being stuck in GC common behaviour when the index is small compared
> to available heap during bigger load? I was more worried about the
> ratio of heap to total host memory.
>
> Regards,
> Karol
>
Is there a friends-on-the-mailing list discount? I had a bit of sticker shock!
On Wed, Sep 16, 2020 at 9:38 AM Charlie Hull wrote:
>
> I do of course mean 'Group Discounts': you don't get a discount for
> being in a 'froup' sadly (I wasn't even aware that was a thing!)
>
> Charlie
>
> On
docs without starting fresh can
> have “interesting” results.
>
> Best,
> Erick
>
> > On Sep 14, 2020, at 5:16 PM, matthew sporleder wrote:
> >
> > Yes but "the _version_ field is also a non-indexed, non-stored single
> > valued docValues field;" <
complain to new relic on their lagging solr support!!! I have and
could use some support!
To address your actual question I have found JMX in solr to be crazy
unreliable but the admin/metrics web endpoint is pretty good.
I have some (crappy) python for parsing it for datadog:
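The original script is not included in this snippet. As a stand-in, here is a minimal sketch of the parsing half only: it walks a decoded admin/metrics JSON body and flattens every numeric leaf into a dotted name. The sample payload is hypothetical and heavily trimmed, it assumes the response nests registries under a top-level "metrics" key as JSON objects, and forwarding the values to datadog is left out.

```python
def flatten_metrics(node, prefix=""):
    """Yield (dotted_name, value) pairs for every numeric leaf in a nested dict."""
    for key, value in node.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten_metrics(value, name)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            yield name, value


# Hypothetical, heavily trimmed response body; real responses are far larger.
sample = {
    "responseHeader": {"status": 0, "QTime": 3},
    "metrics": {
        "solr.jvm": {"memory.heap.used": 123456},
        "solr.core.demo": {
            "QUERY./select.requestTimes": {"count": 10, "meanRate": 0.5},
        },
    },
}

gauges = dict(flatten_metrics(sample["metrics"]))
for name, value in sorted(gauges.items()):
    print(name, value)
```

Each flattened name can then be sent to whatever monitoring client you use as a gauge.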
I can index (without nested entities ofc ;) ) 100M records in about
6-8 hours on a pretty low-powered machine using vanilla DIH -> mysql
so it is probably worth looking at why it is going slow before writing
your own indexer (which we are finally having to do)
On Fri, May 22, 2020 at 1:22 PM
Did you re-work your schema at all? There are new primitive types,
new lucene versions, DocValues, etc
On Wed, Sep 16, 2020 at 12:40 PM Keene Chen wrote:
>
> Hi,
>
> Thanks for pointing that out. I've linked the images below:
>
> solr5_response_times.png
>
ent in terms of throughput. For more information on this,
> please start another thread in solr-users@ list, and more people can
> suggest best alternatives here.
>
>
> On Fri, Jul 17, 2020 at 5:50 AM matthew sporleder
> wrote:
>
> > Is there a replacement for DIH?
>
I have a copyField:
But sometimes preview () is not populated.
It appears that the "catchall" field does not get created when preview
has no content in it. Can I use required=false or similar on a
copyField?
Thanks,
Matt
Never mind, I think we found this was caused by a bug in our (new) custom indexer.
On Thu, Aug 6, 2020 at 4:11 PM matthew sporleder wrote:
>
> I have a copyField:
>
>
>
> But sometimes preview ( indexed="true" stored="true" multiValued="
I can already tell you it is EFS that is slow. I had to switch to an EBS disk
for backups on a different project because EFS couldn't keep up.
> On Aug 10, 2020, at 9:43 PM, Ashwin Ramesh wrote:
>
> Hey Aroop, the general process for our backup is:
> - Connect all machines to an EFS drive
On Wed, Jun 24, 2020 at 9:46 AM Oakley, Craig (NIH/NLM/NCBI) [C]
wrote:
>
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a
> couple of issues.
>
> One is that the tlog files keep accumulating for some nodes in the CDCR
> system, particularly for the non-Leader nodes
FWIW -- zookeeper is pretty set-and-forget in my experience with
settings like autopurge.snapRetainCount, autopurge.purgeInterval, and
rotating the zookeeper.out stdout file.
It is a big hassle to set up the individual myid files and keep them in
sync with the server.$id=hostname in zoo.cfg but,
docker pull solr:8.4.1-slim
docker run -it --rm solr:8.4.1-slim /bin/bash
solr@223042112be5:/opt/solr-8.4.1$ find ./ -name "*jackson*"
./server/solr-webapp/webapp/WEB-INF/lib/jackson-core-2.10.0.jar
./server/solr-webapp/webapp/WEB-INF/lib/jackson-annotations-2.10.0.jar
On Tue, Jul 28, 2020 at 4:39 PM Odysci wrote:
>
> Folks,
>
> I suspect one of our Zookeeper installations on AWS was subject to a Meow
> attack (
> https://arstechnica.com/information-technology/2020/07/more-than-1000-databases-have-been-nuked-by-mystery-meow-attack/
> )
>
> Basically, the
FWIW the real error is "msg":"SolrCore is loading", which is bad if you are in the
middle of indexing.
What is happening on solr at this time?
> On Jul 20, 2020, at 4:46 AM, Charlie Hull wrote:
>
> Hi Austin,
>
> Sitecore is a commercial product so your first port of call should be whoever
>
Is there a replacement for DIH?
On Wed, Jul 15, 2020 at 10:08 AM Ishan Chattopadhyaya
wrote:
>
> Dear Solr Users,
>
> In this release (Solr 8.6), we have deprecated the following:
>
> 1. Data Import Handler
>
> 2. HDFS support
>
> 3. Cross Data Center Replication (CDCR)
>
>
>
> All of
When this has happened to me before I have had pretty good luck by
restarting the overseer leader, which can be found in zookeeper under
/overseer_elect/leader
If that doesn't work I've had to do more intrusive and manual recovery
methods, which suck.
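If you would rather not open ZooKeeper directly, the Collections API has an OVERSEERSTATUS action that also reports the current overseer leader. The sketch below assumes the JSON response carries a top-level "leader" entry; the host, the sample response and the helper names are placeholders, and this is an alternative to reading /overseer_elect/leader rather than the method described above.

```python
import json
from urllib.request import urlopen


def overseer_status_url(base_url):
    """Build the Collections API URL for the OVERSEERSTATUS action."""
    base = base_url.rstrip("/")
    return f"{base}/solr/admin/collections?action=OVERSEERSTATUS&wt=json"


def leader_from_status(status):
    """Pull the overseer leader's node name out of a decoded response."""
    return status.get("leader")


def fetch_overseer_leader(base_url, timeout=10):
    """Query a live node for the current overseer leader."""
    with urlopen(overseer_status_url(base_url), timeout=timeout) as response:
        return leader_from_status(json.load(response))


# Hypothetical trimmed response, used here instead of a live cluster.
sample_status = {"responseHeader": {"status": 0}, "leader": "10.0.0.5:8983_solr"}
print(leader_from_status(sample_status))
```

The node name it returns tells you which Solr instance to restart.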
On Tue, Jan 12, 2021 at 10:36 AM Pierre
https://lucene.apache.org/solr/guide/8_1/collections-api.html#rename
On Thu, Jan 7, 2021 at 2:07 PM ufuk yılmaz wrote:
>
> Hi again,
>
> Let's say I have a collection named A.
> I’m trying to rename it to A_1, then create an alias named A, which points to
> the A_1 collection.
> Is this possible
jetty supports http gzip and I've added it to solr before in my own
installs (and submitted patches to do so by default to solr) but I
don't know about the handling for solrj.
IME compression helps a little, sometimes a lot, and never hurts.
Even the admin interface benefits a lot from regular
, Dec 6, 2020 at 9:05 AM raj.yadav wrote:
>
> matthew sporleder wrote
> > Are you stuck in iowait during that commit?
>
> I am not sure how to determine that; could you help me here?
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Is zookeeper on the solr hosts or on its own? Have you tried
openSearcher=false (soft commit?)
On Sun, Dec 6, 2020 at 6:19 PM raj.yadav wrote:
>
> Hi Everyone,
>
>
> matthew sporleder wrote
> > Are you stuck in iowait during that commit?
>
> During commit operation, t
I would stick to soft commits and schedule hard-commits as
spaced-out-as-possible in regular maintenance windows until you can
find the culprit of the timeout.
This way you will have very focused windows for intense monitoring
during the hard-commit runs.
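A small sketch of the two kinds of commit request, assuming the standard update handler and its `softCommit` and `commit` request parameters. The host and collection name are placeholders and the helper name is hypothetical; scheduling the hard commit (cron or similar) is left to you.

```python
from urllib.parse import urlencode


def commit_url(base_url, collection, soft):
    """Build an update-handler URL that triggers a soft or a hard commit."""
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    base = base_url.rstrip("/")
    return f"{base}/solr/{collection}/update?{urlencode(params)}"


# Placeholder host and collection name.
soft_url = commit_url("http://localhost:8983", "my_collection", soft=True)
hard_url = commit_url("http://localhost:8983", "my_collection", soft=False)
print(soft_url)
print(hard_url)
```

The soft-commit URL is the one to call during normal indexing; the hard-commit URL is the one to reserve for the maintenance window you are monitoring.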
On Mon, Dec 7, 2020 at 9:24 AM
Is the normal/standard solution here to regex remove the '-'s and
combine them into a single token?
On Tue, Nov 24, 2020 at 8:00 AM Erick Erickson wrote:
>
> This is a common point of confusion. There are two phases for creating a
> query,
> query _parsing_ first, then the analysis chain for
Are you stuck in iowait during that commit?
On Fri, Dec 4, 2020 at 6:28 AM raj.yadav wrote:
>
> Hi everyone,
>
> As per suggestions in previous post (by Erick and Shawn) we made the following
> changes.
>
> OLD CACHE CONFIG
> size="32768"
> initialSize="6000"
>
https://solr.cool/#utilities -> https://github.com/rohitbemax/dataimporthandler
You can import it in the many new/novel ways to add things to a solr
install and it should work like always (apparently). The bottom of
that github page isn't hopeful however :)
On Sat, Nov 28, 2020 at 5:21 PM
iuk wrote:
>
> On 11/28/2020 5:48 PM, matthew sporleder wrote:
>
> > ... The bottom of
> > that github page isn't hopeful however :)
>
> Yeah, "works with MariaDB" is a particularly bad way of saying "BYO JDBC
> JAR" :)
>
> It's a more general
Yesterday I realized that we have been carrying forward our configs
since, probably, 4.x days.
I ran a config set action=create (from _default) and saw files I
didn't recognize, and a lot *fewer* things than I've been uploading
for the last few years.
Anyway my new plan is to just use _default
ul, some default field params have changed…
>
> Best,
> Erick
>
> > On Nov 3, 2020, at 9:30 AM, matthew sporleder wrote:
> >
> > Yesterday I realized that we have been carrying forward our configs
> > since, probably, 4.x days.
> >
> > I ran a confi
Is there a more conservative starting point that is still up to date
than _default?
On Tue, Nov 3, 2020 at 11:13 AM matthew sporleder wrote:
>
> So _default considered unsafe? :)
>
> On Tue, Nov 3, 2020 at 11:08 AM Erick Erickson
> wrote:
> >
> > The caution I w
Is there a reason you can't use a bunch of solr versions and let beam users
choose at runtime?
> On Oct 30, 2020, at 4:58 AM, Piotr Szuberski
> wrote:
>
> Thank you very much for your answer!
>
> Beam has a compile time dependency on Solr so the user doesn't have to
> provide his own. The
Great updates. Thanks for keeping us all in the loop!
On Thu, Oct 22, 2020 at 7:43 PM Wei wrote:
>
> Hi Shawn,
>
> I'm circling back with some new findings with our 2 NUMA issue. After a
> few iterations, we do see improvement with the useNUMA flag and other JVM
> setting changes. Here are the
Did you commit?
> On Jan 9, 2021, at 5:44 AM, Flowerday, Matthew J
> wrote:
>
>
> Hi There
>
> As a test I stopped Solr and ran the IndexUpgrader tool on the database to
> see if this might fix the issue. It completed OK but unfortunately the issue
> still occurs – a new version of the
I think the general advice is to do a full re-index on a major version
upgrade. Also - did you ever commit?
On Sun, Jan 10, 2021 at 11:13 AM Flowerday, Matthew J <
matthew.flower...@gb.unisys.com> wrote:
> Hi There
>
>
>
> Thanks for contacting me.
>
>
>
> I carried out this analysis of the
IRC has kind of died off,
https://lucene.apache.org/solr/community.html has a slack mentioned,
I'm on https://opensourceconnections.com/slack after taking their solr
training class and assume it's mostly open to solr community.
On Fri, Jan 15, 2021 at 8:10 PM Justin Sweeney
wrote:
>
> Hi all,
>
https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
?
On Tue, Jan 19, 2021 at 4:01 AM mosheB wrote:
>
> Hi, is there any sophisticated way [using the schema] to block brutal regex
> queries?
>
>
> Thanks
>
>
>
> --
> Sent from:
I've run into this (or similar) issues in the past (solr6? I don't
remember exactly) where tlogs get stuck either growing indefinitely
and/or refusing to commit on restart.
What I ended up doing was writing a monitor to check for the number of
tlogs and alert if they got over some limit (100 or
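The monitor itself is not shown in this snippet. A minimal sketch of the idea, assuming transaction log files are named `tlog.<number>` inside a core's tlog directory: count them and flag when the count passes a limit. The default of 100 follows the figure mentioned above, the function names are hypothetical, and the demo runs against a throwaway temp directory rather than a real core.

```python
import tempfile
from pathlib import Path


def count_tlogs(tlog_dir):
    """Count files named tlog.* directly inside the given directory."""
    return sum(1 for path in Path(tlog_dir).glob("tlog.*") if path.is_file())


def tlog_alert(tlog_dir, limit=100):
    """Return (should_alert, count) for one tlog directory."""
    count = count_tlogs(tlog_dir)
    return count > limit, count


# Demo against a temporary directory with three fake tlog files.
with tempfile.TemporaryDirectory() as demo_dir:
    for number in range(3):
        (Path(demo_dir) / f"tlog.{number:019d}").touch()
    demo_result = tlog_alert(demo_dir, limit=2)

print(demo_result)
```

In a real check you would point `tlog_alert` at each core's tlog directory and hand the boolean to your alerting system.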