[jira] [Created] (SLING-12236) Introduce config option to bypass oak's DataStore deduplication for job properties

2024-01-22 Thread Stefan Egli (Jira)
Stefan Egli created SLING-12236:
---

 Summary: Introduce config option to bypass oak's DataStore 
deduplication for job properties
 Key: SLING-12236
 URL: https://issues.apache.org/jira/browse/SLING-12236
 Project: Sling
  Issue Type: Task
  Components: Event
Reporter: Stefan Egli


When a Sling job is created, its properties are persisted using 
ResourceHelper.getOrCreateResource. Typically the property values are 
primitive types or short Strings and are thus embedded. Larger property 
values, however, may be stored as binaries by the underlying DataStore. If 
different jobs happen to contain identical property values (i.e. identical 
binaries), the DataStore deduplicates them. If such identical binaries are 
concurrently read and written by different Sling instances (which can happen if 
the job queue is not ORDERED and identical property binaries are in play in 
the first place), the DataStore could run into concurrency issues when 
reading/writing the same binary. In Sling jobs this could manifest, e.g., as a 
ClassNotFoundException.
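To illustrate why identical values end up shared: as we understand it, DataStore implementations identify a binary by a hash of its content, so writing the same bytes for two jobs yields one record referenced from two places. The toy store below (hypothetical class `ToyContentAddressedStore`, SHA-256 chosen for illustration) sketches only that content addressing; it is not Oak's DataStore API and does not model Oak's concurrency behaviour.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy content-addressed blob store (illustration only, not Oak's DataStore API):
 * a blob's id is the SHA-256 hash of its bytes, so identical values written for
 * different jobs collapse into one shared record.
 */
final class ToyContentAddressedStore {
    private final Map<String, byte[]> records = new HashMap<>();

    /** Stores the bytes unless an identical record already exists; returns the content id. */
    String put(byte[] value) {
        String id = sha256Hex(value);
        records.putIfAbsent(id, value.clone());
        return id;
    }

    /** Number of distinct records physically kept. */
    int recordCount() {
        return records.size();
    }

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to provide SHA-256
            throw new IllegalStateException(e);
        }
    }
}
```

In this model, calling `put` for two jobs that carry the same payload returns the same id and `recordCount()` stays at 1, which is the sharing the issue describes.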

This situation could be avoided by the application ensuring that it does not 
create such duplicate job binaries.

Alternatively, Sling jobs could introduce a job queue configuration that 
artificially makes binaries unique (e.g. by prepending a hidden UUID).
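The UUID idea could look roughly like the sketch below: the write path wraps each large binary value with a random 16-byte prefix, and the read path strips it again before the value is deserialized. `UniqueBinaryCodec`, `wrap` and `unwrap` are hypothetical names; how the prefix is kept "hidden" from job consumers, and how a queue configuration would switch it on, are left open here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

/**
 * Hypothetical helper: prefixes a binary property value with the 16 bytes of a random
 * UUID so that two jobs never store byte-identical (and thus deduplicated) binaries,
 * and removes that prefix again when the value is read back.
 */
final class UniqueBinaryCodec {
    static final int PREFIX_LENGTH = 16; // one UUID = two longs

    /** Returns a new array: 16 random UUID bytes followed by the original value. */
    static byte[] wrap(byte[] value) {
        UUID uuid = UUID.randomUUID();
        return ByteBuffer.allocate(PREFIX_LENGTH + value.length)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .put(value)
                .array();
    }

    /** Strips the UUID prefix added by {@link #wrap(byte[])}. */
    static byte[] unwrap(byte[] stored) {
        if (stored.length < PREFIX_LENGTH) {
            throw new IllegalArgumentException("value too short to carry the UUID prefix");
        }
        return Arrays.copyOfRange(stored, PREFIX_LENGTH, stored.length);
    }
}
```

The trade-off is that the DataStore can no longer share storage between identical values, which is presumably why the issue frames this as a configuration option rather than a default.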



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (SLING-11900) Provide alternative terminology for inequitable terms

2023-10-16 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11900.
---

> Provide alternative terminology for inequitable terms
> -
>
> Key: SLING-11900
> URL: https://issues.apache.org/jira/browse/SLING-11900
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Carsten Ziegeler
>Assignee: Carsten Ziegeler
>Priority: Major
> Fix For: Event 4.3.14
>
>
> The job configuration uses the terms white list/black list, which are 
> considered inequitable terminology. More acceptable equivalents should 
> therefore be provided for these terms. The proposal is to switch to 
> allow list/deny list.
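A common way to introduce new configuration names without breaking existing setups is to read the new allow/deny key first and fall back to the legacy white/black key. The sketch below assumes OSGi-style configuration properties delivered as a `Map`; the helper `ListConfig` and the property names used in the usage note (`job.allowlist`, `job.whitelist`) are made up for illustration and are not the actual Sling Event configuration properties.

```java
import java.util.Map;

/**
 * Hypothetical lookup that prefers the new (allow/deny) key and falls back to the
 * legacy (white/black) key, so configurations written before the rename keep working.
 */
final class ListConfig {
    static String[] read(Map<String, ?> props, String newKey, String legacyKey) {
        // the new key wins whenever it is present, even if the legacy key is also set
        Object value = props.containsKey(newKey) ? props.get(newKey) : props.get(legacyKey);
        if (value == null) {
            return new String[0];
        }
        if (value instanceof String[]) {
            return ((String[]) value).clone();
        }
        // a single-valued property is treated as a one-element list
        return new String[] { value.toString() };
    }
}
```

For example, `ListConfig.read(props, "job.allowlist", "job.whitelist")` returns the legacy values for an old configuration and the new values once an administrator has migrated.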





[jira] [Closed] (SLING-11923) Sling Events does not Build on Java 17

2023-10-16 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11923.
---

> Sling Events does not Build on Java 17
> --
>
> Key: SLING-11923
> URL: https://issues.apache.org/jira/browse/SLING-11923
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Affects Versions: Event 4.3.12
>Reporter: Dan Klco
>Assignee: Rishabh Daim
>Priority: Major
> Fix For: Event 4.3.14
>
>
> Attempting to build Sling Events with Java 17 fails with:
> {code:java}
> [main] INFO org.apache.jackrabbit.oak.plugins.index.IndexUpdate - Reindexing 
> completed
> [ERROR] Tests run: 4, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 1.935 
> s <<< FAILURE! - in org.apache.sling.event.impl.jobs.queues.TestTopicHalting
> [ERROR] 
> org.apache.sling.event.impl.jobs.queues.TestTopicHalting.testUnhalting  Time 
> elapsed: 1.506 s  <<< ERROR!
> java.lang.NoClassDefFoundError: java/security/acl/Group
>   at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>   at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1012)
>   at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>  
> {code}
> This class is marked as deprecated for removal in the Java 11 API 
> documentation and is no longer available in Java 17: 
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/security/acl/Group.html
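The error is raised from `ClassLoader.defineClass1`, i.e. while a class was being defined whose supertype could not be found; most likely some test-scope class still implements the `java.security.acl.Group` interface. A quick runtime probe makes it easy to check which JDK classes a given runtime still provides. `ClassProbe` is a hypothetical helper built only on `Class.forName(String, boolean, ClassLoader)`.

```java
/**
 * Hypothetical diagnostic helper: reports whether a class can be loaded on the
 * current runtime, without running its static initializers.
 */
final class ClassProbe {
    static boolean isAvailable(String className) {
        try {
            // initialize = false: only load/link, do not run static initializers
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

On the Java 17 runtime from the report, `ClassProbe.isAvailable("java.security.acl.Group")` would be expected to return `false`, consistent with the error above.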





[jira] [Closed] (SLING-11918) GaugeSupport has infinite recursion in registerWithSuffix

2023-10-16 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11918.
---

> GaugeSupport has infinite recursion in registerWithSuffix
> -
>
> Key: SLING-11918
> URL: https://issues.apache.org/jira/browse/SLING-11918
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Affects Versions: Event 4.3.8
>Reporter: Patrique Legault
>Priority: Critical
> Fix For: Event 4.3.14
>
>
> This exception only occurs on a system with a particular, as yet unidentified, 
> configuration, but when it does, it renders the system unusable.
>  
> {code:java}
> (java.lang.StackOverflowError: Delayed StackOverflowError due to  
> ReservedStackAccess annotated method)
>     at 
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1239)
>     at 
> java.base/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:959)
>     at 
> java.management/com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:415)
>     at 
> java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1855)
>     at 
> java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:955)
>     at 
> java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:890)
>     at 
> java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:320)
>     at 
> java.management/com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
>     at 
> com.codahale.metrics.JmxReporter$JmxListener.registerMBean(JmxReporter.java:510)
>  [io.dropwizard.metrics.core:3.2.4]
>     at 
> com.codahale.metrics.JmxReporter$JmxListener.onGaugeAdded(JmxReporter.java:535)
>  [io.dropwizard.metrics.core:3.2.4]
>     at 
> com.codahale.metrics.MetricRegistry.notifyListenerOfAddedMetric(MetricRegistry.java:454)
>  [io.dropwizard.metrics.core:3.2.4]
>     at 
> com.codahale.metrics.MetricRegistry.onMetricAdded(MetricRegistry.java:448) 
> [io.dropwizard.metrics.core:3.2.4]
>     at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:89) 
> [io.dropwizard.metrics.core:3.2.4]
>     at 
> org.apache.sling.event.impl.jobs.stats.GaugeSupport.registerWithSuffix(GaugeSupport.java:150)
>  [org.apache.sling.event:4.3.8]
>     at 
> org.apache.sling.event.impl.jobs.stats.GaugeSupport.registerWithSuffix(GaugeSupport.java:154)
>  [org.apache.sling.event:4.3.8] {code}
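The trace shows `GaugeSupport.registerWithSuffix` (line 154) calling itself before reaching `MetricRegistry.register` (line 150). Without the actual source at hand, the sketch below only illustrates the general remedy for retry-by-suffix registration: loop with a hard upper bound instead of recursing. `NameRegistry` is a hypothetical stand-in that, like Dropwizard's `MetricRegistry.register`, rejects an already-used name with an `IllegalArgumentException`; the `-N` suffix format is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical metric registry stand-in; not Dropwizard's or Sling's class. */
final class NameRegistry {
    private final ConcurrentMap<String, Object> metrics = new ConcurrentHashMap<>();

    /** Registers under exactly this name or throws if the name is already taken. */
    void register(String name, Object metric) {
        if (metrics.putIfAbsent(name, metric) != null) {
            throw new IllegalArgumentException("A metric named " + name + " already exists");
        }
    }

    /**
     * Bounded, iterative "register with suffix": tries baseName, baseName-1, baseName-2, ...
     * and gives up after maxAttempts instead of recursing without limit.
     * Returns the name that was actually used.
     */
    String registerWithSuffix(String baseName, Object metric, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String candidate = attempt == 0 ? baseName : baseName + "-" + attempt;
            try {
                register(candidate, metric);
                return candidate;
            } catch (IllegalArgumentException alreadyTaken) {
                // name in use: fall through and try the next suffix
            }
        }
        throw new IllegalStateException(
                "no free name for " + baseName + " after " + maxAttempts + " attempts");
    }
}
```

With a bound, a misbehaving configuration yields a clear `IllegalStateException` after a few attempts rather than a `StackOverflowError` that takes the system down.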





[jira] [Updated] (SLING-11422) Stop embedding the event.api package in the event bundle

2023-10-09 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11422:

Fix Version/s: Event 4.3.16
   (was: Event 4.3.14)

> Stop embedding the event.api package in the event bundle
> 
>
> Key: SLING-11422
> URL: https://issues.apache.org/jira/browse/SLING-11422
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Robert Munteanu
>Priority: Major
> Fix For: Event 4.3.16
>
>
> As discussed in SLING-9664, deploying the Sling Event and Event API bundles 
> separately would be more in line with how we deploy bundles and also fix the 
> Javadoc generation.
> We should make this a minor version bump for the event bundle, to make it 
> clear that deployers need to adapt. Probably the baselining mechanism will 
> complain, but it's something we can ignore for the release.





[jira] [Updated] (SLING-9664) org.apache.sling.event.jobs package not present in javadoc for sling10+

2023-10-09 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-9664:
---
Fix Version/s: Event 4.3.16
   (was: Event 4.3.14)

> org.apache.sling.event.jobs package not present in javadoc for sling10+
> ---
>
> Key: SLING-9664
> URL: https://issues.apache.org/jira/browse/SLING-9664
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.16
>
>
> While the Javadoc for Sling 9 [1] covers the org.apache.sling.event.jobs 
> package(s), these packages are missing from the Sling 10 Javadoc [2] and 
> subsequent versions.
> [1] https://sling.apache.org/apidocs/sling9/index.html
> [2] https://sling.apache.org/apidocs/sling10/index.html





[jira] [Updated] (SLING-12078) Suspected race condition between TOPOLOGY_INIT and JobManager.addJob

2023-10-05 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-12078:

Description: 
Two regular cases where a job is stored as part of JobManager.addJob():
 * when a topology is defined, it directly gets stored to the appropriate 
assigned/target slingId subtree. This is the most frequent case by far.
 * if no topology is defined (no TOPOLOGY_INIT received) yet, it gets put into 
the unassigned subtree. Later upon receiving TOPOLOGY_INIT 
CheckTopologyTask.fullRun() finds such unassigned jobs and moves them to the 
corresponding assigned subtree.

There is a suspect race condition (test case to be provided), which happens 
between the thread doing JobManager.addJob() and the thread handling the 
TOPOLOGY_INIT:
 * JobManager.addJob determines the target slingId - which is not yet defined, 
as TOPOLOGY_INIT is just being handled concurrently
 * CheckTopologyTask.fullRun(), as part of TOPOLOGY_INIT handling, however does 
not yet find the above new job in unassigned, as the job is just being stored 
concurrently.

The result is a job in the unassigned subtree, which waits until the next 
TopologyEvent happens - which then invokes CheckTopologyTask.fullRun() - which 
then finds the unassigned job and re/assigns it accordingly. So the job is 
never lost, but substantially delayed due to this. (the frequency of 
TopologyEvents depends on actual cluster/property changes happening in the 
topology and can thus vary).
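A minimal sketch of one possible fix, assuming both threads run in the same JVM (as the description implies): make "decide the target and store the job" in addJob and "publish the topology and sweep the unassigned subtree" in the TOPOLOGY_INIT handling mutually exclusive. `JobAssignmentModel` and its methods are hypothetical; repository writes are collapsed into in-memory lists and the topology into a single target slingId. In the real code, holding a lock across repository writes would carry its own cost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, heavily simplified model of the two code paths; not Sling's classes. */
final class JobAssignmentModel {
    private final Object lock = new Object();
    private String targetSlingId; // null until TOPOLOGY_INIT has been handled
    private final List<String> unassigned = new ArrayList<>();
    private final Map<String, List<String>> assigned = new HashMap<>();

    /** addJob: reading the topology and storing the job happen atomically. */
    void addJob(String jobId) {
        synchronized (lock) {
            if (targetSlingId == null) {
                unassigned.add(jobId); // no topology yet: park the job
            } else {
                assigned.computeIfAbsent(targetSlingId, k -> new ArrayList<>()).add(jobId);
            }
        }
    }

    /** TOPOLOGY_INIT: publishing the topology and sweeping parked jobs happen atomically. */
    void onTopologyInit(String slingId) {
        synchronized (lock) {
            targetSlingId = slingId;
            assigned.computeIfAbsent(slingId, k -> new ArrayList<>()).addAll(unassigned);
            unassigned.clear();
        }
    }

    int unassignedCount() {
        synchronized (lock) {
            return unassigned.size();
        }
    }

    int assignedCount(String slingId) {
        synchronized (lock) {
            return assigned.getOrDefault(slingId, Collections.emptyList()).size();
        }
    }
}
```

With the lock, addJob either sees the topology and stores the job as assigned, or stores it as unassigned before the sweep starts, so the sweep finds it; the interleaving where the sweep runs between "target undefined" and "job stored" can no longer happen.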

Tasks:
* provide a test case to reproduce
* fix the race-condition
* undo 
[this|https://github.com/apache/sling-org-apache-sling-event/commit/d16686705908099b26d0a3233f61c4e209880f93]
 and 
[this|https://github.com/apache/sling-org-apache-sling-event/commit/dea04990b770a92f29c2504aa33d8158d68da58f]
 commit

  was:
Two regular cases where a job is stored as part of JobManager.addJob():
 * when a topology is defined, it directly gets stored to the appropriate 
assigned/target slingId subtree. This is the most frequent case by far.
 * if no topology is defined (no TOPOLOGY_INIT received) yet, it gets put into 
the unassigned subtree. Later upon receiving TOPOLOGY_INIT 
CheckTopologyTask.fullRun() finds such unassigned jobs and moves them to the 
corresponding assigned subtree.

There is a suspect race condition (test case to be provided), which happens 
between the thread doing JobManager.addJob() and the thread handling the 
TOPOLOGY_INIT:
 * JobManager.addJob determines the target slingId - which is not yet defined, 
as TOPOLOGY_INIT is just being handled concurrently
 * CheckTopologyTask.fullRun(), as part of TOPOLOGY_INIT handling, however does 
not yet find the above new job in unassigned, as the job is just being stored 
concurrently.

The result is a job in the unassigned subtree, which waits until the next 
TopologyEvent happens - which then invokes CheckTopologyTask.fullRun() - which 
then finds the unassigned job and re/assigns it accordingly. So the job is 
never lost, but substantially delayed due to this. (the frequency of 
TopologyEvents depends on actual cluster/property changes happening in the 
topology and can thus vary).

Tasks:
* provide a test case to reproduce
* fix the race-condition
* undo 
[this|https://github.com/apache/sling-org-apache-sling-event/commit/d16686705908099b26d0a3233f61c4e209880f93|
 and 
[this|https://github.com/apache/sling-org-apache-sling-event/commit/dea04990b770a92f29c2504aa33d8158d68da58f]
 commit


> Suspected race condition between TOPOLOGY_INIT and JobManager.addJob
> 
>
> Key: SLING-12078
> URL: https://issues.apache.org/jira/browse/SLING-12078
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Affects Versions: Event 4.3.12
>Reporter: Stefan Egli
>Priority: Major
>

[jira] [Updated] (SLING-12078) Suspected race condition between TOPOLOGY_INIT and JobManager.addJob

2023-10-05 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-12078:

Description: 
Two regular cases where a job is stored as part of JobManager.addJob():
 * when a topology is defined, it directly gets stored to the appropriate 
assigned/target slingId subtree. This is the most frequent case by far.
 * if no topology is defined (no TOPOLOGY_INIT received) yet, it gets put into 
the unassigned subtree. Later upon receiving TOPOLOGY_INIT 
CheckTopologyTask.fullRun() finds such unassigned jobs and moves them to the 
corresponding assigned subtree.

There is a suspect race condition (test case to be provided), which happens 
between the thread doing JobManager.addJob() and the thread handling the 
TOPOLOGY_INIT:
 * JobManager.addJob determines the target slingId - which is not yet defined, 
as TOPOLOGY_INIT is just being handled concurrently
 * CheckTopologyTask.fullRun(), as part of TOPOLOGY_INIT handling, however does 
not yet find the above new job in unassigned, as the job is just being stored 
concurrently.

The result is a job in the unassigned subtree, which waits until the next 
TopologyEvent happens - which then invokes CheckTopologyTask.fullRun() - which 
then finds the unassigned job and re/assigns it accordingly. So the job is 
never lost, but substantially delayed due to this. (the frequency of 
TopologyEvents depends on actual cluster/property changes happening in the 
topology and can thus vary).

Tasks:
* provide a test case to reproduce
* fix the race-condition
* undo 
[this|https://github.com/apache/sling-org-apache-sling-event/commit/d16686705908099b26d0a3233f61c4e209880f93|
 and 
[this|https://github.com/apache/sling-org-apache-sling-event/commit/dea04990b770a92f29c2504aa33d8158d68da58f]
 commit

  was:
Two regular cases where a job is stored as part of JobManager.addJob():
 * when a topology is defined, it directly gets stored to the appropriate 
assigned/target slingId subtree. This is the most frequent case by far.
 * if no topology is defined (no TOPOLOGY_INIT received) yet, it gets put into 
the unassigned subtree. Later upon receiving TOPOLOGY_INIT 
CheckTopologyTask.fullRun() finds such unassigned jobs and moves them to the 
corresponding assigned subtree.

There is a suspect race condition (test case to be provided), which happens 
between the thread doing JobManager.addJob() and the thread handling the 
TOPOLOGY_INIT:
 * JobManager.addJob determines the target slingId - which is not yet defined, 
as TOPOLOGY_INIT is just being handled concurrently
 * CheckTopologyTask.fullRun(), as part of TOPOLOGY_INIT handling, however does 
not yet find the above new job in unassigned, as the job is just being stored 
concurrently.

The result is a job in the unassigned subtree, which waits until the next 
TopologyEvent happens - which then invokes CheckTopologyTask.fullRun() - which 
then finds the unassigned job and re/assigns it accordingly. So the job is 
never lost, but substantially delayed due to this. (the frequency of 
TopologyEvents depends on actual cluster/property changes happening in the 
topology and can thus vary).

Tasks:
* provide a test case to reproduce
* fix the race-condition
* undo 
https://github.com/apache/sling-org-apache-sling-event/commit/d16686705908099b26d0a3233f61c4e209880f93



[jira] [Commented] (SLING-12078) Suspected race condition between TOPOLOGY_INIT and JobManager.addJob

2023-10-05 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17772259#comment-17772259
 ] 

Stefan Egli commented on SLING-12078:
-

* An additional IT test was affected; added the [same workaround 
there|https://github.com/apache/sling-org-apache-sling-event/commit/dea04990b770a92f29c2504aa33d8158d68da58f]






[jira] [Updated] (SLING-12078) Suspected race condition between TOPOLOGY_INIT and JobManager.addJob

2023-10-05 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-12078:

Description: 
Two regular cases where a job is stored as part of JobManager.addJob():
 * when a topology is defined, it directly gets stored to the appropriate 
assigned/target slingId subtree. This is the most frequent case by far.
 * if no topology is defined (no TOPOLOGY_INIT received) yet, it gets put into 
the unassigned subtree. Later upon receiving TOPOLOGY_INIT 
CheckTopologyTask.fullRun() finds such unassigned jobs and moves them to the 
corresponding assigned subtree.

There is a suspect race condition (test case to be provided), which happens 
between the thread doing JobManager.addJob() and the thread handling the 
TOPOLOGY_INIT:
 * JobManager.addJob determines the target slingId - which is not yet defined, 
as TOPOLOGY_INIT is just being handled concurrently
 * CheckTopologyTask.fullRun(), as part of TOPOLOGY_INIT handling, however does 
not yet find the above new job in unassigned, as the job is just being stored 
concurrently.

The result is a job in the unassigned subtree, which waits until the next 
TopologyEvent happens - which then invokes CheckTopologyTask.fullRun() - which 
then finds the unassigned job and re/assigns it accordingly. So the job is 
never lost, but substantially delayed due to this. (the frequency of 
TopologyEvents depends on actual cluster/property changes happening in the 
topology and can thus vary).

Tasks:
* provide a test case to reproduce
* fix the race-condition
* undo 
https://github.com/apache/sling-org-apache-sling-event/commit/d16686705908099b26d0a3233f61c4e209880f93

  was:
Two regular cases where a job is stored as part of JobManager.addJob():
 * when a topology is defined, it directly gets stored to the appropriate 
assigned/target slingId subtree. This is the most frequent case by far.
 * if no topology is defined (no TOPOLOGY_INIT received) yet, it gets put into 
the unassigned subtree. Later upon receiving TOPOLOGY_INIT 
CheckTopologyTask.fullRun() finds such unassigned jobs and moves them to the 
corresponding assigned subtree.

There is a suspect race condition (test case to be provided), which happens 
between the thread doing JobManager.addJob() and the thread handling the 
TOPOLOGY_INIT:
 * JobManager.addJob determines the target slingId - which is not yet defined, 
as TOPOLOGY_INIT is just being handled concurrently
 * CheckTopologyTask.fullRun(), as part of TOPOLOGY_INIT handling, however does 
not yet find the above new job in unassigned, as the job is just being stored 
concurrently.

The result is a job in the unassigned subtree, which waits until the next 
TopologyEvent happens - which then invokes CheckTopologyTask.fullRun() - which 
then finds the unassigned job and re/assigns it accordingly. So the job is 
never lost, but substantially delayed due to this. (the frequency of 
TopologyEvents depends on actual cluster/property changes happening in the 
topology and can thus vary).

Tasks:
* provide a test case to reproduce
* fix the race-condition
* undo 



[jira] [Updated] (SLING-12078) Suspected race condition between TOPOLOGY_INIT and JobManager.addJob

2023-10-05 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-12078:

Description: 
Two regular cases where a job is stored as part of JobManager.addJob():
 * when a topology is defined, it directly gets stored to the appropriate 
assigned/target slingId subtree. This is the most frequent case by far.
 * if no topology is defined (no TOPOLOGY_INIT received) yet, it gets put into 
the unassigned subtree. Later upon receiving TOPOLOGY_INIT 
CheckTopologyTask.fullRun() finds such unassigned jobs and moves them to the 
corresponding assigned subtree.

There is a suspect race condition (test case to be provided), which happens 
between the thread doing JobManager.addJob() and the thread handling the 
TOPOLOGY_INIT:
 * JobManager.addJob determines the target slingId - which is not yet defined, 
as TOPOLOGY_INIT is just being handled concurrently
 * CheckTopologyTask.fullRun(), as part of TOPOLOGY_INIT handling, however does 
not yet find the above new job in unassigned, as the job is just being stored 
concurrently.

The result is a job in the unassigned subtree, which waits until the next 
TopologyEvent happens - which then invokes CheckTopologyTask.fullRun() - which 
then finds the unassigned job and re/assigns it accordingly. So the job is 
never lost, but substantially delayed due to this. (the frequency of 
TopologyEvents depends on actual cluster/property changes happening in the 
topology and can thus vary).

Tasks:
* provide a test case to reproduce
* fix the race-condition
* undo 

  was:
Two regular cases where a job is stored as part of JobManager.addJob():
 * when a topology is defined, it directly gets stored to the appropriate 
assigned/target slingId subtree. This is the most frequent case by far.
 * if no topology is defined (no TOPOLOGY_INIT received) yet, it gets put into 
the unassigned subtree. Later upon receiving TOPOLOGY_INIT 
CheckTopologyTask.fullRun() finds such unassigned jobs and moves them to the 
corresponding assigned subtree.

There is a suspect race condition (test case to be provided), which happens 
between the thread doing JobManager.addJob() and the thread handling the 
TOPOLOGY_INIT:
 * JobManager.addJob determines the target slingId - which is not yet defined, 
as TOPOLOGY_INIT is just being handled concurrently
 * CheckTopologyTask.fullRun(), as part of TOPOLOGY_INIT handling, however does 
not yet find the above new job in unassigned, as the job is just being stored 
concurrently.

The result is a job in the unassigned subtree, which waits until the next 
TopologyEvent happens - which then invokes CheckTopologyTask.fullRun() - which 
then finds the unassigned job and re/assigns it accordingly. So the job is 
never lost, but substantially delayed due to this. (the frequency of 
TopologyEvents depends on actual cluster/property changes happening in the 
topology and can thus vary)







[jira] [Commented] (SLING-12078) Suspected race condition between TOPOLOGY_INIT and JobManager.addJob

2023-10-05 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17772242#comment-17772242
 ] 

Stefan Egli commented on SLING-12078:
-

* added a [workaround 
attempt|https://github.com/apache/sling-org-apache-sling-event/commit/d16686705908099b26d0a3233f61c4e209880f93]
 for 2 (in)frequently failing tests; it must be reverted as part of this 
ticket once the race condition is confirmed/fixed

> Suspected race condition between TOPOLOGY_INIT and JobManager.addJob
> 
>
> Key: SLING-12078
> URL: https://issues.apache.org/jira/browse/SLING-12078
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Affects Versions: Event 4.3.12
>Reporter: Stefan Egli
>Priority: Major
>
> There are two regular cases in which a job is stored as part of JobManager.addJob():
>  * when a topology is defined, the job is stored directly in the appropriate 
> assigned/target slingId subtree. This is by far the most frequent case.
>  * if no topology is defined yet (no TOPOLOGY_INIT received), the job is put 
> into the unassigned subtree. Later, upon receiving TOPOLOGY_INIT, 
> CheckTopologyTask.fullRun() finds such unassigned jobs and moves them to the 
> corresponding assigned subtree.
> There is a suspected race condition (test case to be provided) between the 
> thread doing JobManager.addJob() and the thread handling TOPOLOGY_INIT:
>  * JobManager.addJob() determines the target slingId, which is not yet 
> defined because TOPOLOGY_INIT is still being handled concurrently.
>  * CheckTopologyTask.fullRun(), as part of TOPOLOGY_INIT handling, does not 
> yet find the new job in the unassigned subtree, because the job is still 
> being stored concurrently.
> The result is a job in the unassigned subtree that waits until the next 
> TopologyEvent, which invokes CheckTopologyTask.fullRun() again; only then is 
> the unassigned job found and reassigned. So the job is never lost, but it is 
> substantially delayed. (The frequency of TopologyEvents depends on the actual 
> cluster/property changes occurring in the topology and can thus vary.)





[jira] [Created] (SLING-12078) Suspected race condition between TOPOLOGY_INIT and JobManager.addJob

2023-10-05 Thread Stefan Egli (Jira)
Stefan Egli created SLING-12078:
---

 Summary: Suspected race condition between TOPOLOGY_INIT and 
JobManager.addJob
 Key: SLING-12078
 URL: https://issues.apache.org/jira/browse/SLING-12078
 Project: Sling
  Issue Type: Bug
  Components: Event
Affects Versions: Event 4.3.12
Reporter: Stefan Egli


There are two regular cases in which a job is stored as part of JobManager.addJob():
 * when a topology is defined, the job is stored directly in the appropriate 
assigned/target slingId subtree. This is by far the most frequent case.
 * if no topology is defined yet (no TOPOLOGY_INIT received), the job is put into 
the unassigned subtree. Later, upon receiving TOPOLOGY_INIT, 
CheckTopologyTask.fullRun() finds such unassigned jobs and moves them to the 
corresponding assigned subtree.

There is a suspected race condition (test case to be provided) between the 
thread doing JobManager.addJob() and the thread handling TOPOLOGY_INIT:
 * JobManager.addJob() determines the target slingId, which is not yet defined 
because TOPOLOGY_INIT is still being handled concurrently.
 * CheckTopologyTask.fullRun(), as part of TOPOLOGY_INIT handling, does not yet 
find the new job in the unassigned subtree, because the job is still being 
stored concurrently.

The result is a job in the unassigned subtree that waits until the next 
TopologyEvent, which invokes CheckTopologyTask.fullRun() again; only then is the 
unassigned job found and reassigned. So the job is never lost, but it is 
substantially delayed. (The frequency of TopologyEvents depends on the actual 
cluster/property changes occurring in the topology and can thus vary.)
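A minimal sketch of one way to close this window, assuming the two steps can be serialized within a single instance: if "determine target + store job" and "set topology + scan unassigned jobs" run under the same lock, every job is either stored before the scan (and moved by it) or stored after the topology is known (and assigned directly). JobPlacementSketch, its in-memory maps and the lock are hypothetical stand-ins for JobManagerImpl, CheckTopologyTask and the repository; a real fix would also have to account for asynchronous repository writes and queries, which this model does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model of the job placement described above (not the
 * real JobManagerImpl / CheckTopologyTask code). One lock makes "decide target
 * + store job" and "set topology + scan unassigned" mutually exclusive.
 */
public class JobPlacementSketch {
    private final Object lock = new Object();
    private String targetSlingId; // null until TOPOLOGY_INIT has been handled
    private final List<String> unassigned = new ArrayList<>();
    private final Map<String, List<String>> assigned = new HashMap<>();

    /** Stands in for JobManager.addJob(): store directly if a target is known. */
    public void addJob(String jobId) {
        synchronized (lock) {
            if (targetSlingId == null) {
                unassigned.add(jobId); // no topology yet
            } else {
                assigned.computeIfAbsent(targetSlingId, k -> new ArrayList<>()).add(jobId);
            }
        }
    }

    /** Stands in for TOPOLOGY_INIT handling followed by CheckTopologyTask.fullRun(). */
    public void onTopologyInit(String slingId) {
        synchronized (lock) {
            targetSlingId = slingId;
            // Jobs stored before this point are visible here and get moved;
            // jobs added afterwards see targetSlingId != null and are assigned directly.
            assigned.computeIfAbsent(slingId, k -> new ArrayList<>()).addAll(unassigned);
            unassigned.clear();
        }
    }

    public List<String> unassignedJobs() {
        synchronized (lock) {
            return new ArrayList<>(unassigned);
        }
    }

    public List<String> assignedJobs(String slingId) {
        synchronized (lock) {
            return new ArrayList<>(assigned.getOrDefault(slingId, new ArrayList<>()));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        JobPlacementSketch s = new JobPlacementSketch();
        Thread adder = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                s.addJob("job-" + i);
            }
        });
        Thread topology = new Thread(() -> s.onTopologyInit("instance-1"));
        adder.start();
        topology.start();
        adder.join();
        topology.join();
        System.out.println(s.unassignedJobs().size() + " " + s.assignedJobs("instance-1").size());
    }
}
```

Running main() prints 0 1000 for any interleaving of the two threads, i.e. no job is left waiting in the unassigned list for a later TopologyEvent.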





[jira] [Commented] (SLING-11662) Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize

2023-10-02 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771066#comment-17771066
 ] 

Stefan Egli commented on SLING-11662:
-

[~cziegeler], thx for reactivating this - got lost in the noise indeed. I'll 
have a look at the PR.

> Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize
> -
>
> Key: SLING-11662
> URL: https://issues.apache.org/jira/browse/SLING-11662
> Project: Sling
>  Issue Type: Bug
>  Components: Commons
>Affects Versions: Commons Scheduler 2.7.12
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the ThreadPool is configured with maxPoolSize == queueSize, an endless 
> loop can occur in QuartzSchedulerThread.run(), which manifests as follows:
> {noformat}
> "MyPool_QuartzSchedulerThread" #123 prio=5 os_prio=0 cpu=5123456.78ms elapsed=5163.45s tid=0x12345678ff00 nid=0x1234 runnable [0x87654321ff00]
> java.lang.Thread.State: RUNNABLE
> at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:413)
> {noformat}





[jira] [Resolved] (SLING-11894) jcr-contentloader: Fix paxexam Integration Tests with Java 17

2023-08-17 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved SLING-11894.
-
Resolution: Fixed

> jcr-contentloader: Fix paxexam Integration Tests with Java 17
> -
>
> Key: SLING-11894
> URL: https://issues.apache.org/jira/browse/SLING-11894
> Project: Sling
>  Issue Type: Bug
>  Components: JCR
>Reporter: Stefan Seifert
>Assignee: Rishabh Daim
>Priority: Major
> Fix For: JCR ContentLoader 2.6.2
>
>
> Currently, the integration tests for the JCR contentloader fail on both 
> Linux and Windows when running with Java 17.
> All ITs fail with an error like this:
> {noformat}
> [INFO] Running org.apache.sling.jcr.contentloader.it.OrderedInitialContentIT
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 64.826 s <<< FAILURE! - in org.apache.sling.jcr.contentloader.it.OrderedInitialContentIT
> [ERROR] org.apache.sling.jcr.contentloader.it.OrderedInitialContentIT.initialContentInstalled  Time elapsed: 11.066 s  <<< ERROR!
> org.ops4j.pax.swissbox.tracker.ServiceLookupException: gave up waiting for service org.apache.sling.resource.presence.ResourcePresence
> at org.ops4j.pax.swissbox.tracker.ServiceLookup.getService(ServiceLookup.java:199)
> at org.ops4j.pax.swissbox.tracker.ServiceLookup.getService(ServiceLookup.java:136)
> at org.ops4j.pax.exam.inject.internal.ServiceInjector.injectField(ServiceInjector.java:89)
> at org.ops4j.pax.exam.inject.internal.ServiceInjector.injectDeclaredFields(ServiceInjector.java:69)
> at org.ops4j.pax.exam.inject.internal.ServiceInjector.injectFields(ServiceInjector.java:61)
> at org.ops4j.pax.exam.invoker.junit.internal.ContainerTestRunner.createTest(ContainerTestRunner.java:68)
> at org.junit.runners.BlockJUnit4ClassRunner$1.runReflectiveCall(BlockJUnit4ClassRunner.java:266)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:263)
> at org.ops4j.pax.exam.invoker.junit.internal.ContainerTestRunner.runChildWithRetry(ContainerTestRunner.java:84)
> at org.ops4j.pax.exam.invoker.junit.internal.ContainerTestRunner.runChild(ContainerTestRunner.java:75)
> at org.ops4j.pax.exam.invoker.junit.internal.ContainerTestRunner.runChild(ContainerTestRunner.java:43)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> at org.ops4j.pax.exam.invoker.junit.internal.JUnitProbeInvoker.invokeViaJUnit(JUnitProbeInvoker.java:124)
> at org.ops4j.pax.exam.invoker.junit.internal.JUnitProbeInvoker.findAndInvoke(JUnitProbeInvoker.java:97)
> at org.ops4j.pax.exam.invoker.junit.internal.JUnitProbeInvoker.call(JUnitProbeInvoker.java:73)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at org.ops4j.pax.swissbox.framework.RemoteFrameworkImpl.invokeMethodOnService(RemoteFrameworkImpl.java:435)
> at org.ops4j.pax.swissbox.framework.RemoteFrameworkImpl.invokeMethodOnService(RemoteFrameworkImpl.java:408)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at java.rmi/sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:360)
> at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:200)
> at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:197)
> at 
> 

[jira] [Assigned] (SLING-11894) jcr-contentloader: Fix paxexam Integration Tests with Java 17

2023-08-17 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli reassigned SLING-11894:
---

Assignee: Rishabh Daim

> jcr-contentloader: Fix paxexam Integration Tests with Java 17
> -
>
> Key: SLING-11894
> URL: https://issues.apache.org/jira/browse/SLING-11894
> Project: Sling
>  Issue Type: Bug
>  Components: JCR
>Reporter: Stefan Seifert
>Assignee: Rishabh Daim
>Priority: Major
> Fix For: JCR ContentLoader 2.6.2
>
>
> Currently, the integration tests for the JCR contentloader fail on both 
> Linux and Windows when running with Java 17.
> All ITs fail with an error like this:
> {noformat}
> [INFO] Running org.apache.sling.jcr.contentloader.it.OrderedInitialContentIT
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 64.826 s <<< FAILURE! - in org.apache.sling.jcr.contentloader.it.OrderedInitialContentIT
> [ERROR] org.apache.sling.jcr.contentloader.it.OrderedInitialContentIT.initialContentInstalled  Time elapsed: 11.066 s  <<< ERROR!
> org.ops4j.pax.swissbox.tracker.ServiceLookupException: gave up waiting for service org.apache.sling.resource.presence.ResourcePresence
> at org.ops4j.pax.swissbox.tracker.ServiceLookup.getService(ServiceLookup.java:199)
> at org.ops4j.pax.swissbox.tracker.ServiceLookup.getService(ServiceLookup.java:136)
> at org.ops4j.pax.exam.inject.internal.ServiceInjector.injectField(ServiceInjector.java:89)
> at org.ops4j.pax.exam.inject.internal.ServiceInjector.injectDeclaredFields(ServiceInjector.java:69)
> at org.ops4j.pax.exam.inject.internal.ServiceInjector.injectFields(ServiceInjector.java:61)
> at org.ops4j.pax.exam.invoker.junit.internal.ContainerTestRunner.createTest(ContainerTestRunner.java:68)
> at org.junit.runners.BlockJUnit4ClassRunner$1.runReflectiveCall(BlockJUnit4ClassRunner.java:266)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:263)
> at org.ops4j.pax.exam.invoker.junit.internal.ContainerTestRunner.runChildWithRetry(ContainerTestRunner.java:84)
> at org.ops4j.pax.exam.invoker.junit.internal.ContainerTestRunner.runChild(ContainerTestRunner.java:75)
> at org.ops4j.pax.exam.invoker.junit.internal.ContainerTestRunner.runChild(ContainerTestRunner.java:43)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> at org.ops4j.pax.exam.invoker.junit.internal.JUnitProbeInvoker.invokeViaJUnit(JUnitProbeInvoker.java:124)
> at org.ops4j.pax.exam.invoker.junit.internal.JUnitProbeInvoker.findAndInvoke(JUnitProbeInvoker.java:97)
> at org.ops4j.pax.exam.invoker.junit.internal.JUnitProbeInvoker.call(JUnitProbeInvoker.java:73)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at org.ops4j.pax.swissbox.framework.RemoteFrameworkImpl.invokeMethodOnService(RemoteFrameworkImpl.java:435)
> at org.ops4j.pax.swissbox.framework.RemoteFrameworkImpl.invokeMethodOnService(RemoteFrameworkImpl.java:408)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at java.rmi/sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:360)
> at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:200)
> at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:197)
> at 
> 

[jira] [Commented] (SLING-11923) Sling Events does not Build on Java 17

2023-07-18 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744276#comment-17744276
 ] 

Stefan Egli commented on SLING-11923:
-

FYI: merged https://github.com/apache/sling-org-apache-sling-event/pull/32

> Sling Events does not Build on Java 17
> --
>
> Key: SLING-11923
> URL: https://issues.apache.org/jira/browse/SLING-11923
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Affects Versions: Event 4.3.12
>Reporter: Dan Klco
>Assignee: Rishabh Daim
>Priority: Major
> Fix For: Event 4.3.14
>
>
> Attempting to build Sling Events with Java 17 fails with:
> {code:java}
> [main] INFO org.apache.jackrabbit.oak.plugins.index.IndexUpdate - Reindexing completed
> [ERROR] Tests run: 4, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 1.935 s <<< FAILURE! - in org.apache.sling.event.impl.jobs.queues.TestTopicHalting
> [ERROR] org.apache.sling.event.impl.jobs.queues.TestTopicHalting.testUnhalting  Time elapsed: 1.506 s  <<< ERROR!
> java.lang.NoClassDefFoundError: java/security/acl/Group
>   at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>   at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1012)
>   at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>   at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>   at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>   at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>   at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>  
> {code}
> This class is already deprecated for removal in Java 11 and is no longer 
> available in Java 17, hence the NoClassDefFoundError: 
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/security/acl/Group.html





[jira] [Assigned] (SLING-11923) Sling Events does not work on Java 17

2023-07-13 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli reassigned SLING-11923:
---

Assignee: Rishabh Daim

> Sling Events does not work on Java 17
> -
>
> Key: SLING-11923
> URL: https://issues.apache.org/jira/browse/SLING-11923
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Affects Versions: Event 4.3.12
>Reporter: Dan Klco
>Assignee: Rishabh Daim
>Priority: Major
>
> Attempting to build Sling Events with Java 17 fails with:
> {code:java}
> [main] INFO org.apache.jackrabbit.oak.plugins.index.IndexUpdate - Reindexing completed
> [ERROR] Tests run: 4, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 1.935 s <<< FAILURE! - in org.apache.sling.event.impl.jobs.queues.TestTopicHalting
> [ERROR] org.apache.sling.event.impl.jobs.queues.TestTopicHalting.testUnhalting  Time elapsed: 1.506 s  <<< ERROR!
> java.lang.NoClassDefFoundError: java/security/acl/Group
>   at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>   at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1012)
>   at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>   at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>   at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>   at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>   at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>  
> {code}
> This class is already deprecated for removal in Java 11 and is no longer 
> available in Java 17, hence the NoClassDefFoundError: 
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/security/acl/Group.html





[jira] [Resolved] (SLING-11918) GaugeSupport has infinite recursion in registerWithSuffix

2023-07-11 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved SLING-11918.
-
Resolution: Fixed

Merged [PR|https://github.com/apache/sling-org-apache-sling-event/pull/31], thx 
[~patlego] !

> GaugeSupport has infinite recursion in registerWithSuffix
> -
>
> Key: SLING-11918
> URL: https://issues.apache.org/jira/browse/SLING-11918
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Affects Versions: Event 4.3.8
>Reporter: Patrique Legault
>Priority: Critical
> Fix For: Event 4.3.14
>
>
> This exception occurs on a system with an unknown but particular 
> configuration; nonetheless, it renders the system unusable.
>  
> {code:java}
> (java.lang.StackOverflowError: Delayed StackOverflowError due to ReservedStackAccess annotated method)
>     at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1239)
>     at java.base/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:959)
>     at java.management/com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:415)
>     at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1855)
>     at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:955)
>     at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:890)
>     at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:320)
>     at java.management/com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
>     at com.codahale.metrics.JmxReporter$JmxListener.registerMBean(JmxReporter.java:510) [io.dropwizard.metrics.core:3.2.4]
>     at com.codahale.metrics.JmxReporter$JmxListener.onGaugeAdded(JmxReporter.java:535) [io.dropwizard.metrics.core:3.2.4]
>     at com.codahale.metrics.MetricRegistry.notifyListenerOfAddedMetric(MetricRegistry.java:454) [io.dropwizard.metrics.core:3.2.4]
>     at com.codahale.metrics.MetricRegistry.onMetricAdded(MetricRegistry.java:448) [io.dropwizard.metrics.core:3.2.4]
>     at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:89) [io.dropwizard.metrics.core:3.2.4]
>     at org.apache.sling.event.impl.jobs.stats.GaugeSupport.registerWithSuffix(GaugeSupport.java:150) [org.apache.sling.event:4.3.8]
>     at org.apache.sling.event.impl.jobs.stats.GaugeSupport.registerWithSuffix(GaugeSupport.java:154) [org.apache.sling.event:4.3.8] {code}
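The trace shows registerWithSuffix (GaugeSupport.java:154) calling itself until the stack overflows. The following is a hypothetical sketch, not the actual GaugeSupport code, of how a "retry under a suffixed name" registration can be bounded with a loop instead of unbounded recursion. It assumes the method retries when a name is already taken; the in-memory register() stands in for a metric registry that rejects duplicate names, and the "_N" suffix format and the maxAttempts parameter are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: register a metric under "name", then "name_1",
 * "name_2", ... while the name is taken, giving up after maxAttempts
 * instead of recursing until a StackOverflowError.
 */
public class SuffixRegistrationSketch {
    private final Map<String, Object> registry = new HashMap<>();

    /** Stand-in for a registry that throws when a name is already registered. */
    public void register(String name, Object metric) {
        if (registry.containsKey(name)) {
            throw new IllegalArgumentException("A metric named " + name + " already exists");
        }
        registry.put(name, metric);
    }

    /** Returns the name actually used, or null if all maxAttempts names were taken. */
    public String registerWithSuffix(String baseName, Object metric, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String name = (attempt == 0) ? baseName : baseName + "_" + attempt;
            try {
                register(name, metric);
                return name;
            } catch (IllegalArgumentException e) {
                // name already taken: fall through and try the next suffix
            }
        }
        return null; // bounded: give up instead of recursing forever
    }
}
```

The bound turns a pathological configuration into a skipped gauge rather than an unusable system; whether to log, skip or fail in that case is a design choice the sketch leaves open.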





[jira] [Created] (SLING-11901) Extend job metrics

2023-06-05 Thread Stefan Egli (Jira)
Stefan Egli created SLING-11901:
---

 Summary: Extend job metrics
 Key: SLING-11901
 URL: https://issues.apache.org/jira/browse/SLING-11901
 Project: Sling
  Issue Type: Task
  Components: Event
Affects Versions: Event 4.3.12
Reporter: Stefan Egli


Below is a list of additional metrics to add to sling.event, on top of those 
that SLING-8665 already added:

* a gauge for the number of configured queues
* a gauge for the number of queues that currently have queued jobs
* a gauge for the number of queues that currently have running jobs
* a per-queue histogram of job waiting times (the current "averageWaitingTime" 
is a total average only)
* a per-queue histogram of durations of ongoing jobs
* a per-queue histogram of durations of finished jobs (the current 
"averageProcessingTime" is a total average only)
* a per-queue gauge for the number of job retries (as far as available)

The following metric is potentially controversial, as it is implementation-specific:
* a per-queue gauge for the number of job reassignments
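To illustrate the per-queue histogram items above, here is a minimal, hypothetical sketch of a per-queue waiting-time histogram with fixed millisecond buckets, using only the JDK. The class name, the bucket bounds and the queue names are assumptions; an actual implementation in sling.event would more likely build on the metrics library it already uses than on hand-rolled buckets.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical per-queue histogram of job waiting times (not sling.event API). */
public class QueueWaitHistogramSketch {
    // Inclusive upper bounds in ms; one extra overflow bucket catches larger values.
    private static final long[] BOUNDS_MS = {10, 100, 1_000, 10_000};

    private final Map<String, AtomicLongArray> perQueue = new ConcurrentHashMap<>();

    /** Record that a job in queueName waited waitMs between being queued and started. */
    public void recordWait(String queueName, long waitMs) {
        AtomicLongArray counts = perQueue.computeIfAbsent(
                queueName, q -> new AtomicLongArray(BOUNDS_MS.length + 1));
        int bucket = BOUNDS_MS.length; // overflow bucket unless a bound matches
        for (int i = 0; i < BOUNDS_MS.length; i++) {
            if (waitMs <= BOUNDS_MS[i]) {
                bucket = i;
                break;
            }
        }
        counts.incrementAndGet(bucket);
    }

    /** Copy of the bucket counts for one queue (all zeros for an unknown queue). */
    public long[] snapshot(String queueName) {
        long[] result = new long[BOUNDS_MS.length + 1];
        AtomicLongArray counts = perQueue.get(queueName);
        if (counts != null) {
            for (int i = 0; i < result.length; i++) {
                result[i] = counts.get(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        QueueWaitHistogramSketch h = new QueueWaitHistogramSketch();
        h.recordWait("queue-a", 5);
        h.recordWait("queue-a", 250);
        h.recordWait("queue-a", 60_000);
        System.out.println(Arrays.toString(h.snapshot("queue-a")));
    }
}
```

With the three sample waits, main() prints [1, 0, 1, 0, 1]: one job in the up-to-10 ms bucket, one in the up-to-1 s bucket and one in the overflow bucket.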






[jira] [Closed] (SLING-11797) Log Jobs Added with No Assigned Topology Capability at Info

2023-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11797.
---

> Log Jobs Added with No Assigned Topology Capability at Info
> ---
>
> Key: SLING-11797
> URL: https://issues.apache.org/jira/browse/SLING-11797
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Affects Versions: Event 4.3.6
>Reporter: Dan Klco
>Assignee: Dan Klco
>Priority: Minor
> Fix For: Event 4.3.8
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When a job is created for a topic for which the topology does not provide a 
> capability, the JobManagerImpl logs the following message at DEBUG level:
> {quote}Persisting job {} into queue {}{quote}
>  
> This makes it challenging to identify/diagnose issues with jobs not being 
> assigned, because:
>  * it requires enabling debug logging on the JobManagerImpl, which can be 
> quite verbose, especially under load
>  * since most production instances do not run with DEBUG, these situations 
> will not show up in the logs
>  * the log message does not indicate that the job will not be assigned for 
> processing immediately
> Instead, the JobManagerImpl should log a message at INFO level or higher, 
> indicating that the job being persisted does not have an assigned target.
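A minimal sketch of the proposed log-level choice, assuming that a missing target is represented as a null slingId. PersistLoggingSketch and its methods are hypothetical; java.util.logging is used only to keep the example self-contained (Sling code typically logs via SLF4J, where Level.FINE roughly corresponds to DEBUG).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: INFO when a job has no assigned target, DEBUG-like otherwise. */
public class PersistLoggingSketch {
    private static final Logger LOG = Logger.getLogger(PersistLoggingSketch.class.getName());

    /** targetSlingId is null when no instance in the topology provides the topic. */
    public static Level levelFor(String targetSlingId) {
        return targetSlingId == null ? Level.INFO : Level.FINE;
    }

    public static String messageFor(String jobId, String queueName, String targetSlingId) {
        if (targetSlingId == null) {
            // Make the consequence explicit: the job will not be processed right away.
            return "Persisting job " + jobId + " into queue " + queueName
                    + " without an assigned target; no instance in the topology"
                    + " currently provides this topic, so it will not be assigned immediately";
        }
        return "Persisting job " + jobId + " into queue " + queueName;
    }

    public static void logPersist(String jobId, String queueName, String targetSlingId) {
        LOG.log(levelFor(targetSlingId), messageFor(jobId, queueName, targetSlingId));
    }
}
```

Keeping the assigned case at a debug-like level preserves today's quiet logs for the common path, while the unassigned case becomes visible with default production log settings.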





[jira] [Closed] (SLING-11793) Limit log messages via JobExecutionContext.log()

2023-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11793.
---

> Limit log messages via JobExecutionContext.log()
> 
>
> Key: SLING-11793
> URL: https://issues.apache.org/jira/browse/SLING-11793
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Rishabh Kumar
>Priority: Major
> Fix For: Event 4.3.8
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Currently, every log message passed via JobExecutionContext.log() is appended 
> to the previous messages and then stored in the repository. This can bloat the 
> repository and is discouraged, as described in the JavaDoc:
> {quote}A job consumer can use this method during job processing to add 
> additional information about the current state of job processing. As calling 
> this method adds a significant overhead it should only be used to log a few 
> statements per job processing. If a consumer wants to output detailed 
> information about the processing it should persists it by itself and not use 
> this method for it. The message and the arguments are passed to the 
> MessageFormat class.{quote}
> Some job implementations ignore this advice and still log potentially many 
> messages during execution.
> The Sling Job implementation should ignore further log messages once a 
> threshold is reached. This could be made configurable to keep the change 
> backward compatible.
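A possible shape for such a threshold, as a hypothetical sketch rather than the actual JobExecutionContext implementation: messages are formatted with java.text.MessageFormat, as the quoted JavaDoc describes, and appended until maxMessages is reached; later messages are dropped and a single marker line records that truncation happened. The class name, the marker text, and treating maxMessages <= 0 as "unlimited" (to model the backward-compatible configuration) are all assumptions.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical capped job log (not Sling API). Messages are kept until
 * maxMessages is reached; further messages are ignored, and
 * persistedMessages() appends one marker line if any were dropped.
 */
public class CappedJobLogSketch {
    static final String TRUNCATION_MARKER = "[further log messages were dropped]";

    private final int maxMessages; // <= 0 means "no limit", i.e. the old behaviour
    private final List<String> messages = new ArrayList<>();
    private boolean truncated;

    public CappedJobLogSketch(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    public void log(String message, Object... args) {
        if (maxMessages > 0 && messages.size() >= maxMessages) {
            truncated = true; // threshold reached: drop this message
            return;
        }
        messages.add(MessageFormat.format(message, args));
    }

    /** The messages that would be stored in the repository for this job. */
    public List<String> persistedMessages() {
        List<String> result = new ArrayList<>(messages);
        if (truncated) {
            result.add(TRUNCATION_MARKER);
        }
        return result;
    }
}
```

Recording a single marker instead of silently dropping messages keeps the repository footprint bounded while still telling an operator that the job log is incomplete.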





[jira] [Closed] (SLING-11805) Don't stop slingId cleanup upon PROPERTIES_CHANGED

2023-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11805.
---

> Don't stop slingId cleanup upon PROPERTIES_CHANGED
> --
>
> Key: SLING-11805
> URL: https://issues.apache.org/jira/browse/SLING-11805
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.40
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
> Fix For: Discovery Oak 1.2.44
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is a follow-up to SLING-10854, where the SlingIdCleanupTask was introduced. 
> The current implementation stops the cleanup when it receives a 
> PROPERTIES_CHANGED event. This is wrong; it should continue. As currently 
> implemented, the cleanup is only triggered upon a TOPOLOGY_INIT or 
> TOPOLOGY_CHANGED that is not followed by a PROPERTIES_CHANGED. 
> This behaviour reduces the chances of the cleanup running; that said, the 
> likelihood of the cleanup eventually running is still very high.
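A minimal sketch of the intended event handling, assuming the cleanup is modelled as a simple "scheduled" flag. CleanupEventHandlingSketch is hypothetical and not the SlingIdCleanupTask code; its local enum only covers the event types named in this ticket (the real org.apache.sling.discovery.TopologyEvent.Type also has TOPOLOGY_CHANGING, which is not modelled here).

```java
/** Hypothetical sketch: PROPERTIES_CHANGED must not stop an already scheduled cleanup. */
public class CleanupEventHandlingSketch {
    /** Local stand-in for the topology event types mentioned in the ticket. */
    public enum EventType { TOPOLOGY_INIT, TOPOLOGY_CHANGED, PROPERTIES_CHANGED }

    private boolean cleanupScheduled;

    public void handle(EventType type) {
        switch (type) {
            case TOPOLOGY_INIT:
            case TOPOLOGY_CHANGED:
                cleanupScheduled = true; // a (new) topology triggers the cleanup
                break;
            case PROPERTIES_CHANGED:
                // The behaviour described above stopped the cleanup here; a mere
                // property change should leave a scheduled cleanup untouched.
                break;
        }
    }

    public boolean isCleanupScheduled() {
        return cleanupScheduled;
    }
}
```

With this handling, a TOPOLOGY_INIT immediately followed by a PROPERTIES_CHANGED still leaves the cleanup scheduled.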





[jira] [Closed] (SLING-10854) Introduce cleanup job of old slingId data in discovery

2023-04-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-10854.
---

> Introduce cleanup job of old slingId data in discovery
> --
>
> Key: SLING-10854
> URL: https://issues.apache.org/jira/browse/SLING-10854
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.34
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.44
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Discovery.oak stores nodes and properties per slingId under 
> {{/var/discovery/oak}}. This is fine as long as the slingIds are stable. If 
> the slingIds change frequently, however, old slingId-related data remains as 
> garbage and accumulates.
> We should introduce a cleanup job to delete old slingId data. The leader 
> could execute it, to avoid race conditions. We might need to add an 
> additional property to indicate the age of slingIds. (There is already the 
> {{/var/discovery/oak/clusterInstances/leaderElectionIdCreatedAt}} property, 
> which gets updated upon each discovery.oak bundle activation, but it is 
> somewhat indirect. Having a new, dedicated property sounds cleaner, though 
> the existing one could still be used to clean up old data.)
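One hypothetical way the leader could select stale slingIds, assuming a dedicated per-slingId timestamp of the kind proposed above (passed in here as a map instead of being read from /var/discovery/oak): delete data only for slingIds that are not part of the current topology and whose timestamp is older than a minimum age. The class name, method name and minAge parameter are illustrative only.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical selection of stale slingId data; meant to run on the leader only. */
public class SlingIdCleanupSketch {
    public static List<String> staleSlingIds(Map<String, Instant> lastSeenBySlingId,
                                             Set<String> activeSlingIds,
                                             Instant now,
                                             Duration minAge) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Instant> e : lastSeenBySlingId.entrySet()) {
            boolean active = activeSlingIds.contains(e.getKey());
            boolean oldEnough = e.getValue().plus(minAge).isBefore(now);
            // Never touch data of an instance that is part of the current topology.
            if (!active && oldEnough) {
                stale.add(e.getKey());
            }
        }
        Collections.sort(stale); // deterministic order for logging and tests
        return stale;
    }
}
```

The minimum age acts as a safety margin: a slingId that just left the topology (for example during a restart) keeps its data until it has been absent for longer than minAge.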





[jira] [Updated] (SLING-11422) Stop embedding the event.api package in the event bundle

2023-03-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11422:

Fix Version/s: Event 4.3.10
   (was: Event 4.3.8)

> Stop embedding the event.api package in the event bundle
> 
>
> Key: SLING-11422
> URL: https://issues.apache.org/jira/browse/SLING-11422
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Robert Munteanu
>Priority: Major
> Fix For: Event 4.3.10
>
>
> As discussed in SLING-9664, deploying the Sling Event and Event API bundles 
> separately would be more in line with how we deploy bundles and also fix the 
> Javadoc generation.
> We should make this a minor version bump for the event bundle, to make it 
> clear that deployers need to adapt. Probably the baselining mechanism will 
> complain, but it's something we can ignore for the release.





[jira] [Updated] (SLING-9664) org.apache.sling.event.jobs package not present in javadoc for sling10+

2023-03-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-9664:
---
Fix Version/s: Event 4.3.10
   (was: Event 4.3.8)

> org.apache.sling.event.jobs package not present in javadoc for sling10+
> ---
>
> Key: SLING-9664
> URL: https://issues.apache.org/jira/browse/SLING-9664
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.10
>
>
> While the javadoc for sling9 [1] covers the org.apache.sling.event.jobs 
> package(s), they are missing from the sling10 javadoc [2] and all subsequent 
> versions.
> [1] https://sling.apache.org/apidocs/sling9/index.html
> [2] https://sling.apache.org/apidocs/sling10/index.html





[jira] [Resolved] (SLING-11805) Don't stop slingId cleanup upon PROPERTIES_CHANGED

2023-03-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved SLING-11805.
-
Resolution: Fixed

> Don't stop slingId cleanup upon PROPERTIES_CHANGED
> --
>
> Key: SLING-11805
> URL: https://issues.apache.org/jira/browse/SLING-11805
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.40
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
> Fix For: Discovery Oak 1.2.44
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is a follow-up to SLING-10854, where the SlingIdCleanupTask was introduced. 
> The current implementation stops the cleanup when it receives a 
> PROPERTIES_CHANGED event. This is wrong; it should continue. As currently 
> implemented, the cleanup is only triggered upon a TOPOLOGY_INIT or 
> TOPOLOGY_CHANGED that is not followed by a PROPERTIES_CHANGED. 
> This behaviour reduces the chances of the cleanup running; that said, the 
> likelihood of the cleanup eventually running is still very high.





[jira] [Updated] (SLING-9625) DiscoveryServiceImpl#doUpdateProperties may fail due to a LoginException

2023-03-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-9625:
---
Fix Version/s: Discovery Oak 1.2.46
   (was: Discovery Oak 1.2.44)

> DiscoveryServiceImpl#doUpdateProperties may fail due to a LoginException 
> -
>
> Key: SLING-9625
> URL: https://issues.apache.org/jira/browse/SLING-9625
> Project: Sling
>  Issue Type: Improvement
>Affects Versions: Discovery Oak 1.2.30
>Reporter: Konrad Windszus
>Priority: Major
> Fix For: Discovery Oak 1.2.46
>
>
> While stopping the OSGi container (Sling Starter 12 SNAPSHOT), I observed the
> following error:
> {code}
> 03.08.2020 10:30:06.262 *INFO * [Apache Sling Terminator] Stopping Apache 
> Sling
> ERROR: bundle org.apache.sling.discovery.oak:1.2.28 
> (139)[org.apache.sling.discovery.oak.OakDiscoveryService(200)] : The 
> updatedPropertyProvider method has thrown an exception
> java.lang.RuntimeException: Could not log in to repository 
> (org.apache.sling.api.resource.LoginException: Cannot derive user name for 
> bundle org.apache.sling.discovery.oak [139] and sub service null)
>   at 
> org.apache.sling.discovery.oak.OakDiscoveryService.doUpdateProperties(OakDiscoveryService.java:540)
>   at 
> org.apache.sling.discovery.oak.OakDiscoveryService.bindPropertyProviderInteral(OakDiscoveryService.java:406)
>   at 
> org.apache.sling.discovery.oak.OakDiscoveryService.updatedPropertyProvider(OakDiscoveryService.java:421)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.felix.scr.impl.inject.methods.BaseMethod.invokeMethod(BaseMethod.java:242)
>   at 
> org.apache.felix.scr.impl.inject.methods.BaseMethod.access$500(BaseMethod.java:41)
>   at 
> org.apache.felix.scr.impl.inject.methods.BaseMethod$Resolved.invoke(BaseMethod.java:678)
>   at 
> org.apache.felix.scr.impl.inject.methods.BaseMethod$NotResolved.invoke(BaseMethod.java:633)
>   at 
> org.apache.felix.scr.impl.inject.methods.BaseMethod.invoke(BaseMethod.java:524)
>   at 
> org.apache.felix.scr.impl.inject.methods.BindMethod.invoke(BindMethod.java:42)
>   at 
> org.apache.felix.scr.impl.manager.DependencyManager.invokeUpdatedMethod(DependencyManager.java:1934)
>   at 
> org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUpdatedMethod(SingleComponentManager.java:448)
>   at 
> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.modifiedService(DependencyManager.java:366)
>   at 
> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.modifiedService(DependencyManager.java:297)
>   at 
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerModified(ServiceTracker.java:1229)
>   at 
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerModified(ServiceTracker.java:1137)
>   at 
> org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.track(ServiceTracker.java:883)
>   at 
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1168)
>   at 
> org.apache.felix.scr.impl.BundleComponentActivator$ListenerInfo.serviceChanged(BundleComponentActivator.java:125)
>   at 
> org.apache.felix.framework.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:990)
>   at 
> org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:838)
>   at 
> org.apache.felix.framework.EventDispatcher.fireServiceEvent(EventDispatcher.java:545)
>   at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4833)
>   at org.apache.felix.framework.Felix.access$000(Felix.java:112)
>   at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:434)
>   at 
> org.apache.felix.framework.ServiceRegistry.servicePropertiesModified(ServiceRegistry.java:601)
>   at 
> org.apache.felix.framework.ServiceRegistrationImpl.setProperties(ServiceRegistrationImpl.java:132)
>   at 
> org.apache.sling.event.impl.jobs.JobConsumerManager.unbindService(JobConsumerManager.java:354)
>   at 
> org.apache.sling.event.impl.jobs.JobConsumerManager.unbindJobExecutor(JobConsumerManager.java:270)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Updated] (SLING-10813) Improve ViewStateManagerImpl.waitForAsyncEvents, also speeds up tests

2023-03-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-10813:

Fix Version/s: Discovery Oak 1.2.46
   (was: Discovery Oak 1.2.44)

> Improve ViewStateManagerImpl.waitForAsyncEvents, also speeds up tests
> -
>
> Key: SLING-10813
> URL: https://issues.apache.org/jira/browse/SLING-10813
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Reporter: Stefan Egli
>Priority: Minor
> Fix For: Discovery Oak 1.2.46
>
>
> As discussed [in this 
> PR|https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/4#discussion_r708292265]
> relying on ViewStateManagerImpl.waitForAsyncEvents currently requires an additional
> {{Thread.sleep()}} to ensure that anything that was "just triggered" has finished
> executing asynchronously.
> This should be improved in waitForAsyncEvents itself, by being more precise
> about when it returns (i.e. only after any call to
> {{asyncEvent.trigger()}} has terminated).
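A minimal sketch of the idea, assuming a single-threaded executor stands in for the asynchronous event delivery. The hypothetical {{AsyncEventTracker}} below is not the ViewStateManagerImpl code: it counts tasks that were triggered but have not finished yet, and its {{waitForAsyncEvents}} returns only once that count drops to zero (or a timeout expires), so callers no longer need a {{Thread.sleep()}}.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical helper, NOT the actual ViewStateManagerImpl code: it counts tasks
// that were triggered but have not finished yet, so that waitForAsyncEvents()
// can return exactly when all of them have terminated, without a Thread.sleep().
class AsyncEventTracker {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "async-events");
        t.setDaemon(true); // do not keep the JVM alive
        return t;
    });
    private final Object lock = new Object();
    private int inFlight = 0; // guarded by lock

    /** Submits a task; it counts as in flight from this call until it completes. */
    void trigger(Runnable task) {
        synchronized (lock) {
            inFlight++;
        }
        try {
            executor.execute(() -> {
                try {
                    task.run();
                } finally {
                    finished();
                }
            });
        } catch (RejectedExecutionException e) {
            finished(); // the task will never run, so nobody should wait for it
            throw e;
        }
    }

    private void finished() {
        synchronized (lock) {
            inFlight--;
            lock.notifyAll();
        }
    }

    /**
     * Blocks until every task triggered so far has terminated or the timeout
     * expires. Returns true if all tasks finished in time.
     */
    boolean waitForAsyncEvents(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (lock) {
            while (inFlight > 0) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

Incrementing the counter in {{trigger()}} before the task is handed to the executor is the important part: a task that was "just triggered" is already visible to a concurrent waiter, even if it has not started running yet.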





[jira] [Updated] (SLING-5598) Exclude slow tests by default with assume(sling.slow.tests.enabled)

2023-03-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-5598:
---
Fix Version/s: Discovery Oak 1.2.46
   (was: Discovery Oak 1.2.44)

> Exclude slow tests by default with assume(sling.slow.tests.enabled) 
> 
>
> Key: SLING-5598
> URL: https://issues.apache.org/jira/browse/SLING-5598
> Project: Sling
>  Issue Type: Task
>  Components: Extensions
>Affects Versions: Discovery Impl 1.2.6, Discovery Base 1.1.2, Discovery 
> Commons 1.0.10, Discovery Oak 1.2.6
>Reporter: Stefan Egli
>Priority: Major
> Fix For: Discovery Impl 1.2.14, Discovery Base 2.0.16, Discovery 
> Commons 1.0.30, Discovery Oak 1.2.46
>
> Attachments: SLING-5598-commons-testing.patch, 
> SLING-5598-discovery.patch
>
>
> As suggested by [~bdelacretaz] on [the 
> list|http://markmail.org/message/yad5awqg53epk3ck], we should reduce test
> duration (ideally at most 1-2 min per bundle and 10-15 min overall). Until the
> tests are improved, however, slow tests should be excluded by default and run
> only if explicitly enabled. Here's an example {{@Before}} method to achieve
> that:
> {noformat}
> @Before
> public void checkSlowTests() {
> assumeNotNull(System.getProperty("sling.slow.tests.enabled"));
> }
> {noformat}
> To enable the slow tests, run: {{mvn -Dsling.slow.tests.enabled=true clean test}}
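Note that {{assumeNotNull}} only checks that the property is present, so {{-Dsling.slow.tests.enabled=false}} would also enable the slow tests. A sketch of a stricter variant, using a hypothetical {{SlowTests}} helper built on the JDK's {{Boolean.getBoolean}}, which returns true only if the system property exists and equals "true" (ignoring case):

```java
// Hypothetical helper, not part of Sling's testing utilities.
final class SlowTests {
    static final String PROPERTY = "sling.slow.tests.enabled";

    private SlowTests() {
    }

    /**
     * True only if the system property is set to "true" (case-insensitive).
     * Unlike assumeNotNull(System.getProperty(...)), a value of "false"
     * keeps the slow tests disabled.
     */
    static boolean enabled() {
        return Boolean.getBoolean(PROPERTY);
    }
}
```

In a JUnit 4 {{@Before}} method this would become {{Assume.assumeTrue(SlowTests.enabled());}}.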





[jira] [Updated] (SLING-10008) Add null annotations to package org.apache.sling.discovery (Discovery API)

2023-03-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-10008:

Fix Version/s: Discovery Oak 1.2.46
   (was: Discovery Oak 1.2.44)

> Add null annotations to package org.apache.sling.discovery (Discovery API)
> --
>
> Key: SLING-10008
> URL: https://issues.apache.org/jira/browse/SLING-10008
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Reporter: Konrad Windszus
>Priority: Major
> Fix For: Discovery Oak 1.2.46
>
>
> In https://github.com/Adobe-Consulting-Services/acs-aem-commons/issues/2492 
> and https://github.com/Adobe-Consulting-Services/acs-aem-commons/issues/2498 
> potential NPEs were uncovered. To prevent consumers from running into
> those, the null annotations
> (https://sling.apache.org/documentation/development/null-analysis.html)
> should be added to the relevant classes of this package as well.





[jira] [Closed] (SLING-11619) Restore safeguard mechanism for discovery config's int and long properties

2023-03-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11619.
---

> Restore safeguard mechanism for discovery config's int and long properties
> --
>
> Key: SLING-11619
> URL: https://issues.apache.org/jira/browse/SLING-11619
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.40
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
> Fix For: Discovery Oak 1.2.42
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With the [update to parent 
> 47|https://github.com/apache/sling-org-apache-sling-discovery-oak/commit/c306408f36e7636c72b71805d2bb0e3e6f0f0e73#diff-73d443e41e9bfaa5e9c77b6db0e318079f1885f5a7ed9685aae9730209adc579]
>  the discovery.oak Config "lost" the ability to deal gracefully with invalid
> values, such as empty strings. It used to silently ignore these, but now
> fails loudly with
> {noformat}
> org.osgi.service.component.ComponentException: 
> java.lang.NumberFormatException: For input string: ""
>   at 
> org.apache.felix.scr.impl.inject.internal.Annotations$Handler.invoke(Annotations.java:379)
>  [org.apache.felix.scr:2.2.0]
>   at com.sun.proxy.$Proxy368.backoffStandbyFactor(Unknown Source)
>   at org.apache.sling.discovery.oak.Config.configure(Config.java:238) 
> [org.apache.sling.discovery.oak:1.2.40]
>   at org.apache.sling.discovery.oak.Config.activate(Config.java:159) 
> [org.apache.sling.discovery.oak:1.2.40]
> {noformat}
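One way to restore the lenient behaviour, sketched under the assumption that the raw property values are available (for example from the {{Map<String, Object>}} that Declarative Services can pass to an activate method) rather than only through the annotation proxy, which is where the {{NumberFormatException}} above is thrown. {{ConfigValues}} is a hypothetical helper, not the actual discovery.oak Config code:

```java
// Hypothetical helper, not the actual discovery.oak Config code: converts raw
// configuration values to numbers and falls back to a default for null, empty
// or unparsable input instead of failing component activation.
final class ConfigValues {
    private ConfigValues() {
    }

    static long toLong(Object raw, long defaultValue) {
        if (raw instanceof Number) {
            return ((Number) raw).longValue();
        }
        if (raw == null) {
            return defaultValue;
        }
        String s = raw.toString().trim();
        if (s.isEmpty()) {
            return defaultValue; // e.g. the "" that caused the exception above
        }
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            return defaultValue; // silently ignore invalid values, as before
        }
    }

    static int toInt(Object raw, int defaultValue) {
        long value = toLong(raw, defaultValue);
        if (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            return defaultValue; // out of int range: treat as invalid
        }
        return (int) value;
    }
}
```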





[jira] [Commented] (SLING-11805) Don't stop slingId cleanup upon PROPERTIES_CHANGED

2023-03-16 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701206#comment-17701206
 ] 

Stefan Egli commented on SLING-11805:
-

* fix pushed
* PR ready for review : 
https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/14

> Don't stop slingId cleanup upon PROPERTIES_CHANGED
> --
>
> Key: SLING-11805
> URL: https://issues.apache.org/jira/browse/SLING-11805
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.40
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
> Fix For: Discovery Oak 1.2.44
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is a follow-up to SLING-10854, where the SlingIdCleanupTask was introduced.
> The current implementation stops the cleanup when it receives a
> PROPERTIES_CHANGED event. This is wrong: the cleanup should continue. As
> currently implemented, the cleanup is only triggered by a TOPOLOGY_INIT or
> TOPOLOGY_CHANGED event that is not followed by a PROPERTIES_CHANGED event.
> This reduces the chances of the cleanup running, although the likelihood of
> it eventually running is still very high.





[jira] [Commented] (SLING-11805) Don't stop slingId cleanup upon PROPERTIES_CHANGED

2023-03-16 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701093#comment-17701093
 ] 

Stefan Egli commented on SLING-11805:
-

* test added that reproduces this - details in draft PR 
https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/14
* next step is to fix the code 

> Don't stop slingId cleanup upon PROPERTIES_CHANGED
> --
>
> Key: SLING-11805
> URL: https://issues.apache.org/jira/browse/SLING-11805
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.40
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
> Fix For: Discovery Oak 1.2.44
>
>
> This is a follow-up to SLING-10854, where the SlingIdCleanupTask was introduced.
> The current implementation stops the cleanup when it receives a
> PROPERTIES_CHANGED event. This is wrong: the cleanup should continue. As
> currently implemented, the cleanup is only triggered by a TOPOLOGY_INIT or
> TOPOLOGY_CHANGED event that is not followed by a PROPERTIES_CHANGED event.
> This reduces the chances of the cleanup running, although the likelihood of
> it eventually running is still very high.





[jira] [Created] (SLING-11805) Don't stop slingId cleanup upon PROPERTIES_CHANGED

2023-03-16 Thread Stefan Egli (Jira)
Stefan Egli created SLING-11805:
---

 Summary: Don't stop slingId cleanup upon PROPERTIES_CHANGED
 Key: SLING-11805
 URL: https://issues.apache.org/jira/browse/SLING-11805
 Project: Sling
  Issue Type: Improvement
  Components: Discovery
Affects Versions: Discovery Oak 1.2.40
Reporter: Stefan Egli
Assignee: Stefan Egli
 Fix For: Discovery Oak 1.2.44


This is a follow-up to SLING-10854, where the SlingIdCleanupTask was introduced. The
current implementation stops the cleanup when it receives a PROPERTIES_CHANGED
event. This is wrong: the cleanup should continue. As currently implemented, the
cleanup is only triggered by a TOPOLOGY_INIT or TOPOLOGY_CHANGED event that is not
followed by a PROPERTIES_CHANGED event. This reduces the chances of the cleanup
running, although the likelihood of it eventually running is still very high.
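The intended event handling can be sketched as follows. This is an illustration only: {{TopologyEventType}} mirrors the constants of {{org.apache.sling.discovery.TopologyEvent.Type}} so that the code compiles without the Discovery API, {{CleanupTrigger}} is hypothetical, and pausing the cleanup on TOPOLOGY_CHANGING is an assumption, not something stated in the issue.

```java
// Mirrors the constants of org.apache.sling.discovery.TopologyEvent.Type so the
// sketch compiles without the Discovery API on the classpath.
enum TopologyEventType { TOPOLOGY_INIT, TOPOLOGY_CHANGING, TOPOLOGY_CHANGED, PROPERTIES_CHANGED }

// Hypothetical decision logic, not the actual SlingIdCleanupTask code.
class CleanupTrigger {
    private boolean cleanupScheduled = false;

    /** Handles an event and returns whether a cleanup is scheduled afterwards. */
    boolean handle(TopologyEventType type) {
        switch (type) {
            case TOPOLOGY_INIT:
            case TOPOLOGY_CHANGED:
                cleanupScheduled = true;  // stable topology: (re)schedule the cleanup
                break;
            case TOPOLOGY_CHANGING:
                cleanupScheduled = false; // assumption: hold off while the topology is in flux
                break;
            case PROPERTIES_CHANGED:
                // Only properties changed and the topology is still stable, so an
                // already scheduled cleanup is left alone. Stopping it here is the
                // behaviour this issue describes as wrong.
                break;
        }
        return cleanupScheduled;
    }
}
```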





[jira] [Assigned] (SLING-10854) Introduce cleanup job of old slingId data in discovery

2023-03-14 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli reassigned SLING-10854:
---

Assignee: Stefan Egli

> Introduce cleanup job of old slingId data in discovery
> --
>
> Key: SLING-10854
> URL: https://issues.apache.org/jira/browse/SLING-10854
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.34
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.44
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Discovery.oak stores nodes and properties per slingId under 
> {{/var/discovery/oak}}. As long as the slingIds are stable, this is fine.
> If the slingIds change frequently, however, old slingId-related data remains
> as garbage and accumulates.
> We should introduce a cleanup job to delete old slingId data. The leader
> could execute it to avoid race conditions. We might need to add an
> additional property to indicate the age of slingIds. There is already the
> {{/var/discovery/oak/clusterInstances/leaderElectionIdCreatedAt}} property,
> which gets updated upon each discovery.oak bundle activation, but it is
> somewhat indirect. Having a new, dedicated property sounds cleaner (the
> existing property could still be used to clean up old data, though).
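A minimal sketch of the selection step, assuming each slingId has a last-seen timestamp (the dedicated property suggested above) and that the slingIds of the current topology are known. {{SlingIdCleanup.selectForDeletion}} is hypothetical and not the SlingIdCleanupTask that was eventually merged:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical selection logic, not the actual SlingIdCleanupTask.
final class SlingIdCleanup {
    private SlingIdCleanup() {
    }

    /**
     * Returns the slingIds whose data may be deleted: only on the leader, never
     * for instances in the current topology, and only if older than minAgeMillis.
     */
    static Set<String> selectForDeletion(boolean isLeader,
                                         Map<String, Long> lastSeenMillis,
                                         Set<String> activeSlingIds,
                                         long nowMillis,
                                         long minAgeMillis) {
        Set<String> result = new TreeSet<>();
        if (!isLeader) {
            return result; // only the leader cleans up, avoiding races between instances
        }
        for (Map.Entry<String, Long> entry : lastSeenMillis.entrySet()) {
            String slingId = entry.getKey();
            Long lastSeen = entry.getValue();
            if (activeSlingIds.contains(slingId)) {
                continue; // never touch data of live instances
            }
            if (lastSeen == null) {
                continue; // age unknown: be conservative and keep the data
            }
            if (nowMillis - lastSeen >= minAgeMillis) {
                result.add(slingId);
            }
        }
        return result;
    }
}
```

Skipping entries without a timestamp keeps the sketch conservative: data whose age cannot be determined is never deleted.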





[jira] [Resolved] (SLING-10854) Introduce cleanup job of old slingId data in discovery

2023-03-14 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved SLING-10854.
-
Resolution: Fixed

PR merged

> Introduce cleanup job of old slingId data in discovery
> --
>
> Key: SLING-10854
> URL: https://issues.apache.org/jira/browse/SLING-10854
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.34
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.44
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Discovery.oak stores nodes and properties per slingId under 
> {{/var/discovery/oak}}. As long as the slingIds are stable, this is fine.
> If the slingIds change frequently, however, old slingId-related data remains
> as garbage and accumulates.
> We should introduce a cleanup job to delete old slingId data. The leader
> could execute it to avoid race conditions. We might need to add an
> additional property to indicate the age of slingIds. There is already the
> {{/var/discovery/oak/clusterInstances/leaderElectionIdCreatedAt}} property,
> which gets updated upon each discovery.oak bundle activation, but it is
> somewhat indirect. Having a new, dedicated property sounds cleaner (the
> existing property could still be used to clean up old data, though).





[jira] [Commented] (SLING-10854) Introduce cleanup job of old slingId data in discovery

2023-03-07 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697504#comment-17697504
 ] 

Stefan Egli commented on SLING-10854:
-

* draft PR created at 
https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/13

> Introduce cleanup job of old slingId data in discovery
> --
>
> Key: SLING-10854
> URL: https://issues.apache.org/jira/browse/SLING-10854
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.34
>Reporter: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.44
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Discovery.oak stores nodes and properties per slingId under 
> {{/var/discovery/oak}}. As long as the slingIds are stable, this is fine.
> If the slingIds change frequently, however, old slingId-related data remains
> as garbage and accumulates.
> We should introduce a cleanup job to delete old slingId data. The leader
> could execute it to avoid race conditions. We might need to add an
> additional property to indicate the age of slingIds. There is already the
> {{/var/discovery/oak/clusterInstances/leaderElectionIdCreatedAt}} property,
> which gets updated upon each discovery.oak bundle activation, but it is
> somewhat indirect. Having a new, dedicated property sounds cleaner (the
> existing property could still be used to clean up old data, though).





[jira] [Updated] (SLING-10854) Introduce cleanup job of old slingId data in discovery

2023-03-07 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-10854:

Priority: Major  (was: Minor)

> Introduce cleanup job of old slingId data in discovery
> --
>
> Key: SLING-10854
> URL: https://issues.apache.org/jira/browse/SLING-10854
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.34
>Reporter: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.44
>
>
> Discovery.oak stores nodes and properties per slingId under 
> {{/var/discovery/oak}}. As long as the slingIds are stable, this is fine.
> If the slingIds change frequently, however, old slingId-related data remains
> as garbage and accumulates.
> We should introduce a cleanup job to delete old slingId data. The leader
> could execute it to avoid race conditions. We might need to add an
> additional property to indicate the age of slingIds. There is already the
> {{/var/discovery/oak/clusterInstances/leaderElectionIdCreatedAt}} property,
> which gets updated upon each discovery.oak bundle activation, but it is
> somewhat indirect. Having a new, dedicated property sounds cleaner (the
> existing property could still be used to clean up old data, though).





[jira] [Closed] (SLING-10624) Callback when SlingRepository init fails

2023-02-09 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-10624.
---

> Callback when SlingRepository init fails
> 
>
> Key: SLING-10624
> URL: https://issues.apache.org/jira/browse/SLING-10624
> Project: Sling
>  Issue Type: Improvement
>  Components: JCR
>Reporter: Marcel Reutegger
>Assignee: Stefan Egli
>Priority: Minor
> Fix For: JCR Base 3.1.10
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> {{AbstractSlingRepositoryManager}} initializes the repository asynchronously 
> in a separate thread. This makes it difficult for an implementing subclass to 
> detect when initialization fails. An implementing class calls {{start()}}, 
> which returns almost immediately, while the repository is starting up 
> asynchronously. There is no way to detect that {{start()}} was successful.
> There should be a callback method that can be overwritten by the implementing 
> class. The method would be called when initialization fails, before 
> {{stop()}} is finally called.
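A sketch of such a callback, assuming a simplified manager whose {{start()}} runs initialization on a background thread. {{AsyncRepositoryManager}}, {{initialize()}} and {{initializationFailed(Throwable)}} are illustrative names only, not the API of AbstractSlingRepositoryManager:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical base class illustrating the proposal; it is NOT
// AbstractSlingRepositoryManager and its method names are illustrative only.
abstract class AsyncRepositoryManager {
    private final CountDownLatch finished = new CountDownLatch(1);
    private volatile boolean stopped = false;

    /** Starts initialization in a background thread and returns immediately. */
    final void start() {
        Thread t = new Thread(() -> {
            try {
                initialize();
            } catch (Throwable failure) {
                try {
                    initializationFailed(failure); // callback first...
                } finally {
                    stop();                        // ...then stop, as the issue proposes
                }
            } finally {
                finished.countDown();
            }
        }, "repository-init");
        t.setDaemon(true);
        t.start();
    }

    /** Performs the (possibly slow) repository setup; may throw. */
    protected abstract void initialize() throws Exception;

    /** Hook for subclasses, called when initialize() fails and before stop(). No-op by default. */
    protected void initializationFailed(Throwable failure) {
    }

    void stop() {
        stopped = true;
    }

    boolean isStopped() {
        return stopped;
    }

    /** Waits for the background initialization to finish, successfully or not. */
    boolean awaitInit(long timeoutMillis) {
        try {
            return finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```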





[jira] [Commented] (SLING-11662) Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize

2022-11-17 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635219#comment-17635219
 ] 

Stefan Egli commented on SLING-11662:
-

[~cziegeler], great, thx! I'll have a look as soon as possible. I was also
planning to add another test case or so (but first have to check how much
coverage there is for these edge cases).

> Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize
> -
>
> Key: SLING-11662
> URL: https://issues.apache.org/jira/browse/SLING-11662
> Project: Sling
>  Issue Type: Bug
>  Components: Commons
>Affects Versions: Commons Scheduler 2.7.12
>Reporter: Stefan Egli
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the ThreadPool is configured with maxPoolSize == queueSize, an endless
> loop can happen in QuartzSchedulerThread.run(), which manifests as
> follows:
> {noformat}
> "MyPool_QuartzSchedulerThread" #123 prio=5 os_prio=0 cpu=5123456.78ms 
> elapsed=5163.45s tid=0x12345678ff00 nid=0x1234 runnable  
> [0x87654321ff00]
>java.lang.Thread.State: RUNNABLE
> at 
> org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:413)
> {noformat}





[jira] [Commented] (SLING-11662) Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize

2022-11-03 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17628445#comment-17628445
 ] 

Stefan Egli commented on SLING-11662:
-

(and if there was a reason then we could prevent activation when 
{{maxPoolSize==queueSize}} is configured)
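Such a guard could be as simple as the following hypothetical check, called from the component's activate method; if activate throws, Declarative Services does not activate the component. {{ThreadPoolConfigCheck}} is illustrative and not part of Sling Commons Scheduler:

```java
// Hypothetical guard, not Sling's actual thread pool configuration code.
final class ThreadPoolConfigCheck {
    private ThreadPoolConfigCheck() {
    }

    /** Rejects the configuration that makes the Quartz scheduler thread spin. */
    static void validate(int maxPoolSize, int queueSize) {
        if (maxPoolSize == queueSize) {
            throw new IllegalArgumentException("maxPoolSize (" + maxPoolSize
                    + ") must not equal queueSize (" + queueSize
                    + "): QuartzSchedulerThread.run() would loop endlessly");
        }
    }
}
```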

> Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize
> -
>
> Key: SLING-11662
> URL: https://issues.apache.org/jira/browse/SLING-11662
> Project: Sling
>  Issue Type: Bug
>  Components: Commons
>Affects Versions: Commons Scheduler 2.7.12
>Reporter: Stefan Egli
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the ThreadPool is configured with maxPoolSize == queueSize, an endless
> loop can happen in QuartzSchedulerThread.run(), which manifests as
> follows:
> {noformat}
> "MyPool_QuartzSchedulerThread" #123 prio=5 os_prio=0 cpu=5123456.78ms 
> elapsed=5163.45s tid=0x12345678ff00 nid=0x1234 runnable  
> [0x87654321ff00]
>java.lang.Thread.State: RUNNABLE
> at 
> org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:413)
> {noformat}





[jira] [Comment Edited] (SLING-11662) Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize

2022-11-03 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17628443#comment-17628443
 ] 

Stefan Egli edited comment on SLING-11662 at 11/3/22 5:36 PM:
--

[~cziegeler], I was wondering what the idea behind the original (current)
implementation of blockForAvailableThreads was. Is there a reason it looks at the
(static) configuration rather than at the actual threads vs. queued runnables?


was (Author: egli):
[~cziegeler], what I was wondering what the idea of the original (current) 
implementation of blockForAvailableThreads was.. is there a reason it looks a 
(static) configuration rather than actual threads vs queued runnables?

> Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize
> -
>
> Key: SLING-11662
> URL: https://issues.apache.org/jira/browse/SLING-11662
> Project: Sling
>  Issue Type: Bug
>  Components: Commons
>Affects Versions: Commons Scheduler 2.7.12
>Reporter: Stefan Egli
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the ThreadPool is configured with maxPoolSize == queueSize, an endless
> loop can happen in QuartzSchedulerThread.run(), which manifests as
> follows:
> {noformat}
> "MyPool_QuartzSchedulerThread" #123 prio=5 os_prio=0 cpu=5123456.78ms 
> elapsed=5163.45s tid=0x12345678ff00 nid=0x1234 runnable  
> [0x87654321ff00]
>java.lang.Thread.State: RUNNABLE
> at 
> org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:413)
> {noformat}





[jira] [Commented] (SLING-11662) Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize

2022-11-03 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17628443#comment-17628443
 ] 

Stefan Egli commented on SLING-11662:
-

[~cziegeler], I was wondering what the idea behind the original (current)
implementation of blockForAvailableThreads was. Is there a reason it looks at the
(static) configuration rather than at the actual threads vs. queued runnables?

> Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize
> -
>
> Key: SLING-11662
> URL: https://issues.apache.org/jira/browse/SLING-11662
> Project: Sling
>  Issue Type: Bug
>  Components: Commons
>Affects Versions: Commons Scheduler 2.7.12
>Reporter: Stefan Egli
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the ThreadPool is configured with maxPoolSize == queueSize, an endless
> loop can happen in QuartzSchedulerThread.run(), which manifests as
> follows:
> {noformat}
> "MyPool_QuartzSchedulerThread" #123 prio=5 os_prio=0 cpu=5123456.78ms 
> elapsed=5163.45s tid=0x12345678ff00 nid=0x1234 runnable  
> [0x87654321ff00]
>java.lang.Thread.State: RUNNABLE
> at 
> org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:413)
> {noformat}





[jira] [Commented] (SLING-11662) Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize

2022-11-03 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17628427#comment-17628427
 ] 

Stefan Egli commented on SLING-11662:
-

The problem with the endless loop seems to be due to a breach of contract by 
sling.commons.scheduler.QuartzThreadPool:
* [quartz' ThreadPool 
javadoc|https://github.com/quartz-scheduler/quartz/blob/v2.3.2/quartz-core/src/main/java/org/quartz/spi/ThreadPool.java#L69-L82]
 says that
{quote}The implementation of this method should block until there is at least 
one available thread.{quote}
* however 
[sling.commons.scheduler.QuartzThreadPool#blockForAvailableThreads|https://github.com/apache/sling-org-apache-sling-commons-scheduler/blob/a9ddf38ea9d9962c8938a381135827072fc9397f/src/main/java/org/apache/sling/commons/scheduler/impl/QuartzThreadPool.java#L80]
 does not guarantee a return value {{>0}}; in particular, if "maxPoolSize == queueSize",
then this method will return 0
* that in turn leads quartz to hit the [ironically commented 
line|https://github.com/quartz-scheduler/quartz/blob/v2.3.2/quartz-core/src/main/java/org/quartz/core/QuartzSchedulerThread.java#L411-L414]
{code}
} else { // if(availThreadCount > 0)
    // should never happen, if threadPool.blockForAvailableThreads() follows contract
    continue; // while (!halted)
}
{code}
so it will just... continue
* now this game repeats forever, until... the CPU becomes too hot from constantly
spinning at 100% and breaks down... leading to damage in a datacenter and so on
and so forth

Presumably quartz did nothing wrong here, other than perhaps not adding a
safety/paranoia {{Thread.sleep(1);}} before that {{continue}} to avoid this.
The problem is rather on the sling side.

The question is how best to fix this. [~cziegeler], [~joerghoh], any suggestions?
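One possible direction, sketched in plain Java: a pool that tracks its busy workers itself, so that {{blockForAvailableThreads()}} really blocks until at least one worker is free and never returns 0. {{BoundedPool}} is hypothetical; it borrows the method names of Quartz's {{org.quartz.spi.ThreadPool}} but does not implement that interface, so it compiles without Quartz. It is not the fix that was eventually applied to Sling.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical pool, NOT Sling's QuartzThreadPool: it counts busy workers itself
// so that blockForAvailableThreads() honours the Quartz contract (result >= 1).
class BoundedPool {
    private final int size;
    private final ExecutorService executor;
    private final Object lock = new Object();
    private int busy = 0; // guarded by lock

    BoundedPool(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        this.size = size;
        this.executor = Executors.newFixedThreadPool(size, r -> {
            Thread t = new Thread(r, "bounded-pool");
            t.setDaemon(true);
            return t;
        });
    }

    /** Blocks until at least one worker is free; the result is always >= 1. */
    int blockForAvailableThreads() {
        boolean interrupted = false;
        synchronized (lock) {
            try {
                while (busy >= size) {
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        interrupted = true; // keep waiting: returning 0 would cause the spin
                    }
                }
                return size - busy;
            } finally {
                if (interrupted) {
                    Thread.currentThread().interrupt(); // restore the flag for the caller
                }
            }
        }
    }

    /** Runs the task if a worker is free; returns false instead of queueing it otherwise. */
    boolean runInThread(Runnable task) {
        synchronized (lock) {
            if (busy >= size) {
                return false;
            }
            busy++;
        }
        executor.execute(() -> {
            try {
                task.run();
            } finally {
                synchronized (lock) {
                    busy--;
                    lock.notifyAll();
                }
            }
        });
        return true;
    }
}
```

Because availability is derived from actually busy workers rather than from the static maxPoolSize/queueSize configuration, the caller's loop either gets a positive count or waits, instead of spinning.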

> Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize
> -
>
> Key: SLING-11662
> URL: https://issues.apache.org/jira/browse/SLING-11662
> Project: Sling
>  Issue Type: Bug
>  Components: Commons
>Affects Versions: Commons Scheduler 2.7.12
>Reporter: Stefan Egli
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the ThreadPool is configured with maxPoolSize == queueSize, an endless
> loop can happen in QuartzSchedulerThread.run(), which manifests as
> follows:
> {noformat}
> "MyPool_QuartzSchedulerThread" #123 prio=5 os_prio=0 cpu=5123456.78ms 
> elapsed=5163.45s tid=0x12345678ff00 nid=0x1234 runnable  
> [0x87654321ff00]
>java.lang.Thread.State: RUNNABLE
> at 
> org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:413)
> {noformat}





[jira] [Commented] (SLING-11662) Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize

2022-11-03 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17628292#comment-17628292
 ] 

Stefan Egli commented on SLING-11662:
-

[PR 
created|https://github.com/apache/sling-org-apache-sling-commons-scheduler/pull/6]
 that reproduces the 100% CPU usage / endless loop in QuartzSchedulerThread.run()

> Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize
> -
>
> Key: SLING-11662
> URL: https://issues.apache.org/jira/browse/SLING-11662
> Project: Sling
>  Issue Type: Bug
>  Components: Commons
>Affects Versions: Commons Scheduler 2.7.12
>Reporter: Stefan Egli
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the ThreadPool is configured with maxPoolSize == queueSize, an endless
> loop can happen in QuartzSchedulerThread.run(), which manifests as
> follows:
> {noformat}
> "MyPool_QuartzSchedulerThread" #123 prio=5 os_prio=0 cpu=5123456.78ms 
> elapsed=5163.45s tid=0x12345678ff00 nid=0x1234 runnable  
> [0x87654321ff00]
>java.lang.Thread.State: RUNNABLE
> at 
> org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:413)
> {noformat}





[jira] [Created] (SLING-11662) Endless loop in QuartzSchedulerThread.run() with maxPoolSize == queueSize

2022-11-03 Thread Stefan Egli (Jira)
Stefan Egli created SLING-11662:
---

 Summary: Endless loop in QuartzSchedulerThread.run() with 
maxPoolSize == queueSize
 Key: SLING-11662
 URL: https://issues.apache.org/jira/browse/SLING-11662
 Project: Sling
  Issue Type: Bug
  Components: Commons
Affects Versions: Commons Scheduler 2.7.12
Reporter: Stefan Egli


When the ThreadPool is configured with maxPoolSize == queueSize, an endless loop
can happen in QuartzSchedulerThread.run(), which manifests as follows:

{noformat}
"MyPool_QuartzSchedulerThread" #123 prio=5 os_prio=0 cpu=5123456.78ms 
elapsed=5163.45s tid=0x12345678ff00 nid=0x1234 runnable  
[0x87654321ff00]
   java.lang.Thread.State: RUNNABLE
at 
org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:413)
{noformat}





[jira] [Updated] (SLING-11619) Restore safeguard mechanism for discovery config's int and long properties

2022-10-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11619:

Summary: Restore safeguard mechanism for discovery config's int and long 
properties  (was: Restore safeguard mechanism for discovery config)

> Restore safeguard mechanism for discovery config's int and long properties
> --
>
> Key: SLING-11619
> URL: https://issues.apache.org/jira/browse/SLING-11619
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.40
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
> Fix For: Discovery Oak 1.2.42
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With the [update to parent 
> 47|https://github.com/apache/sling-org-apache-sling-discovery-oak/commit/c306408f36e7636c72b71805d2bb0e3e6f0f0e73#diff-73d443e41e9bfaa5e9c77b6db0e318079f1885f5a7ed9685aae9730209adc579]
>  discovery.oak's Config "lost" the ability to gracefully deal with invalid 
> values, such as empty strings. It used to silently swallow these, but now 
> fails loudly with
> {noformat}
> org.osgi.service.component.ComponentException: 
> java.lang.NumberFormatException: For input string: ""
>   at 
> org.apache.felix.scr.impl.inject.internal.Annotations$Handler.invoke(Annotations.java:379)
>  [org.apache.felix.scr:2.2.0]
>   at com.sun.proxy.$Proxy368.backoffStandbyFactor(Unknown Source)
>   at org.apache.sling.discovery.oak.Config.configure(Config.java:238) 
> [org.apache.sling.discovery.oak:1.2.40]
>   at org.apache.sling.discovery.oak.Config.activate(Config.java:159) 
> [org.apache.sling.discovery.oak:1.2.40]
> {noformat}





[jira] [Updated] (SLING-10008) Add null annotations to package org.apache.sling.discovery (Discovery API)

2022-10-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-10008:

Fix Version/s: Discovery Oak 1.2.44

> Add null annotations to package org.apache.sling.discovery (Discovery API)
> --
>
> Key: SLING-10008
> URL: https://issues.apache.org/jira/browse/SLING-10008
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Reporter: Konrad Windszus
>Priority: Major
> Fix For: Discovery Oak 1.2.44
>
>
> In https://github.com/Adobe-Consulting-Services/acs-aem-commons/issues/2492 
> and https://github.com/Adobe-Consulting-Services/acs-aem-commons/issues/2498 
> potential NPEs were uncovered. To prevent consumers from running into 
> those, the null annotations 
> (https://sling.apache.org/documentation/development/null-analysis.html) 
> should be added to the relevant classes in this package as well.





[jira] [Closed] (SLING-10322) Upgrade discovery.* to parent 41

2022-10-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-10322.
---
Assignee: Stefan Egli

> Upgrade discovery.* to parent 41
> 
>
> Key: SLING-10322
> URL: https://issues.apache.org/jira/browse/SLING-10322
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.40
>
>
> The discovery.* modules still use rather old parent versions. They should be 
> upgraded to e.g. parent 41. This will involve a fair number of changes, 
> including replacing felix.scr.annotations.





[jira] [Resolved] (SLING-10322) Upgrade discovery.* to parent 41

2022-10-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved SLING-10322.
-
Resolution: Fixed

This has meanwhile been done in SLING-11355. Marking resolved.

> Upgrade discovery.* to parent 41
> 
>
> Key: SLING-10322
> URL: https://issues.apache.org/jira/browse/SLING-10322
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.40
>
>
> The discovery.* modules still use rather old parent versions. They should be 
> upgraded to e.g. parent 41. This will involve a fair number of changes, 
> including replacing felix.scr.annotations.





[jira] [Updated] (SLING-10322) Upgrade discovery.* to parent 41

2022-10-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-10322:

Fix Version/s: Discovery Oak 1.2.40

> Upgrade discovery.* to parent 41
> 
>
> Key: SLING-10322
> URL: https://issues.apache.org/jira/browse/SLING-10322
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.40
>
>
> The discovery.* modules still use rather old parent versions. They should be 
> upgraded to e.g. parent 41. This will involve a fair number of changes, 
> including replacing felix.scr.annotations.





[jira] [Updated] (SLING-10813) Improve ViewStateManagerImpl.waitForAsyncEvents, also speeds up tests

2022-10-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-10813:

Fix Version/s: Discovery Oak 1.2.44

> Improve ViewStateManagerImpl.waitForAsyncEvents, also speeds up tests
> -
>
> Key: SLING-10813
> URL: https://issues.apache.org/jira/browse/SLING-10813
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Reporter: Stefan Egli
>Priority: Minor
> Fix For: Discovery Oak 1.2.44
>
>
> As discussed [in this 
> PR|https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/4#discussion_r708292265],
>  callers of ViewStateManagerImpl.waitForAsyncEvents currently still need a 
> {{Thread.sleep()}} after it returns to ensure that anything "just triggered" 
> has finished executing asynchronously.
> This should be improved in the waitForAsyncEvents method itself by being 
> more precise about when it returns (i.e. only once any call to 
> {{asyncEvent.trigger()}} has terminated).
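A minimal sketch of the more precise waiting described above, assuming a hypothetical AsyncEventTracker class rather than the actual ViewStateManagerImpl internals: each trigger is counted as in flight before it is handed to the executor, and waitForAsyncEvents blocks on a monitor until that count drops to zero (or a timeout elapses), so no Thread.sleep() is needed afterwards.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not the actual ViewStateManagerImpl code): counts
 * triggered async work so that waiting returns as soon as all of it has
 * terminated, instead of relying on a fixed Thread.sleep().
 */
class AsyncEventTracker {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Object lock = new Object();
    private int inFlight = 0; // guarded by lock

    /** Registers the task as in flight *before* submitting it, so a waiter cannot miss it. */
    void trigger(Runnable task) {
        synchronized (lock) {
            inFlight++;
        }
        try {
            executor.execute(() -> {
                try {
                    task.run();
                } finally {
                    done();
                }
            });
        } catch (RuntimeException e) { // e.g. RejectedExecutionException after shutdown
            done();
            throw e;
        }
    }

    private void done() {
        synchronized (lock) {
            inFlight--;
            lock.notifyAll();
        }
    }

    /** Returns true once no triggered task is queued or running, false on timeout. */
    boolean waitForAsyncEvents(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            while (inFlight > 0) {
                long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remaining <= 0) {
                    return false; // never call wait(0), which would block indefinitely
                }
                lock.wait(remaining);
            }
            return true;
        }
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncEventTracker tracker = new AsyncEventTracker();
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < 3; i++) {
            tracker.trigger(handled::incrementAndGet);
        }
        boolean idle = tracker.waitForAsyncEvents(5_000);
        System.out.println(idle + " " + handled.get()); // true 3
        tracker.shutdown();
    }
}
```

Incrementing the counter before submission, rather than inside the task, closes the window in which a waiter could observe zero while a freshly triggered task is still queued.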





[jira] [Updated] (SLING-10854) Introduce cleanup job of old slingId data in discovery

2022-10-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-10854:

Fix Version/s: Discovery Oak 1.2.44

> Introduce cleanup job of old slingId data in discovery
> --
>
> Key: SLING-10854
> URL: https://issues.apache.org/jira/browse/SLING-10854
> Project: Sling
>  Issue Type: Improvement
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.34
>Reporter: Stefan Egli
>Priority: Minor
> Fix For: Discovery Oak 1.2.44
>
>
> Discovery.oak stores nodes and properties per slingId under 
> {{/var/discovery/oak}}. As long as the slingIds are stable, this is fine. If 
> the slingIds change frequently, however, old slingId-related data stays 
> behind as garbage and accumulates.
> We should introduce a cleanup job to delete old slingId data. The leader 
> could execute it to avoid race conditions. We might need to add an 
> additional property to indicate the age of slingIds (there's already the 
> {{/var/discovery/oak/clusterInstances/leaderElectionIdCreatedAt}} property, 
> which gets updated upon each discovery.oak bundle activation, but it's 
> somewhat indirect). Having a new, dedicated property sounds cleaner (the 
> existing one could still be used to clean up old data, though).
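A sketch of the proposed cleanup job under stated assumptions: the hypothetical SlingIdCleanup class models the per-slingId data as an in-memory map from slingId to the timestamp held in the "new, dedicated property" suggested above, whereas the real data lives under /var/discovery/oak and would be read and deleted through the repository. Only the leader deletes anything, and slingIds that are part of the current view are never removed, regardless of age.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the cleanup; not discovery.oak code. */
final class SlingIdCleanup {
    private SlingIdCleanup() {}

    /**
     * Removes entries whose slingId is not in the current view and whose
     * timestamp is older than maxAge. Returns the removed slingIds, sorted.
     */
    static Set<String> cleanup(Map<String, Instant> lastSeenBySlingId,
                               Set<String> activeSlingIds,
                               boolean isLeader,
                               Instant now,
                               Duration maxAge) {
        Set<String> removed = new TreeSet<>();
        if (!isLeader) {
            return removed; // only the leader cleans up, avoiding races between instances
        }
        Instant cutoff = now.minus(maxAge);
        Iterator<Map.Entry<String, Instant>> it = lastSeenBySlingId.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Instant> entry = it.next();
            boolean inactive = !activeSlingIds.contains(entry.getKey());
            if (inactive && entry.getValue().isBefore(cutoff)) {
                removed.add(entry.getKey());
                it.remove();
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-10-27T00:00:00Z");
        Map<String, Instant> data = new HashMap<>();
        data.put("old-id", now.minus(Duration.ofDays(30)));
        data.put("recent-id", now.minus(Duration.ofHours(1)));
        data.put("active-id", now.minus(Duration.ofDays(90)));
        Set<String> removed =
                cleanup(data, Collections.singleton("active-id"), true, now, Duration.ofDays(7));
        System.out.println(removed + " " + data.size()); // [old-id] 2
    }
}
```

The seven-day threshold and the slingIds in the usage example are made up for illustration.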





[jira] [Updated] (SLING-11619) Restore safeguard mechanism for discovery config

2022-10-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11619:

Affects Version/s: Discovery Oak 1.2.40

> Restore safeguard mechanism for discovery config
> 
>
> Key: SLING-11619
> URL: https://issues.apache.org/jira/browse/SLING-11619
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.40
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
> Fix For: Discovery Oak 1.2.42
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With the [update to parent 
> 47|https://github.com/apache/sling-org-apache-sling-discovery-oak/commit/c306408f36e7636c72b71805d2bb0e3e6f0f0e73#diff-73d443e41e9bfaa5e9c77b6db0e318079f1885f5a7ed9685aae9730209adc579]
>  discovery.oak's Config "lost" the ability to gracefully deal with invalid 
> values, such as empty strings. It used to silently swallow these, but now 
> fails loudly with
> {noformat}
> org.osgi.service.component.ComponentException: 
> java.lang.NumberFormatException: For input string: ""
>   at 
> org.apache.felix.scr.impl.inject.internal.Annotations$Handler.invoke(Annotations.java:379)
>  [org.apache.felix.scr:2.2.0]
>   at com.sun.proxy.$Proxy368.backoffStandbyFactor(Unknown Source)
>   at org.apache.sling.discovery.oak.Config.configure(Config.java:238) 
> [org.apache.sling.discovery.oak:1.2.40]
>   at org.apache.sling.discovery.oak.Config.activate(Config.java:159) 
> [org.apache.sling.discovery.oak:1.2.40]
> {noformat}





[jira] [Updated] (SLING-9625) DiscoveryServiceImpl#doUpdateProperties may fail due to a LoginException

2022-10-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-9625:
---
Fix Version/s: Discovery Oak 1.2.44
   (was: Discovery Oak 1.2.42)

> DiscoveryServiceImpl#doUpdateProperties may fail due to a LoginException 
> -
>
> Key: SLING-9625
> URL: https://issues.apache.org/jira/browse/SLING-9625
> Project: Sling
>  Issue Type: Improvement
>Affects Versions: Discovery Oak 1.2.30
>Reporter: Konrad Windszus
>Priority: Major
> Fix For: Discovery Oak 1.2.44
>
>
> While stopping the OSGi container (Sling Starter 12 SNAPSHOT) I observed the 
> following error
> {code}
> 03.08.2020 10:30:06.262 *INFO * [Apache Sling Terminator] Stopping Apache 
> Sling
> ERROR: bundle org.apache.sling.discovery.oak:1.2.28 
> (139)[org.apache.sling.discovery.oak.OakDiscoveryService(200)] : The 
> updatedPropertyProvider method has thrown an exception
> java.lang.RuntimeException: Could not log in to repository 
> (org.apache.sling.api.resource.LoginException: Cannot derive user name for 
> bundle org.apache.sling.discovery.oak [139] and sub service null)
>   at 
> org.apache.sling.discovery.oak.OakDiscoveryService.doUpdateProperties(OakDiscoveryService.java:540)
>   at 
> org.apache.sling.discovery.oak.OakDiscoveryService.bindPropertyProviderInteral(OakDiscoveryService.java:406)
>   at 
> org.apache.sling.discovery.oak.OakDiscoveryService.updatedPropertyProvider(OakDiscoveryService.java:421)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.felix.scr.impl.inject.methods.BaseMethod.invokeMethod(BaseMethod.java:242)
>   at 
> org.apache.felix.scr.impl.inject.methods.BaseMethod.access$500(BaseMethod.java:41)
>   at 
> org.apache.felix.scr.impl.inject.methods.BaseMethod$Resolved.invoke(BaseMethod.java:678)
>   at 
> org.apache.felix.scr.impl.inject.methods.BaseMethod$NotResolved.invoke(BaseMethod.java:633)
>   at 
> org.apache.felix.scr.impl.inject.methods.BaseMethod.invoke(BaseMethod.java:524)
>   at 
> org.apache.felix.scr.impl.inject.methods.BindMethod.invoke(BindMethod.java:42)
>   at 
> org.apache.felix.scr.impl.manager.DependencyManager.invokeUpdatedMethod(DependencyManager.java:1934)
>   at 
> org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUpdatedMethod(SingleComponentManager.java:448)
>   at 
> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.modifiedService(DependencyManager.java:366)
>   at 
> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.modifiedService(DependencyManager.java:297)
>   at 
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerModified(ServiceTracker.java:1229)
>   at 
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerModified(ServiceTracker.java:1137)
>   at 
> org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.track(ServiceTracker.java:883)
>   at 
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1168)
>   at 
> org.apache.felix.scr.impl.BundleComponentActivator$ListenerInfo.serviceChanged(BundleComponentActivator.java:125)
>   at 
> org.apache.felix.framework.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:990)
>   at 
> org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:838)
>   at 
> org.apache.felix.framework.EventDispatcher.fireServiceEvent(EventDispatcher.java:545)
>   at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4833)
>   at org.apache.felix.framework.Felix.access$000(Felix.java:112)
>   at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:434)
>   at 
> org.apache.felix.framework.ServiceRegistry.servicePropertiesModified(ServiceRegistry.java:601)
>   at 
> org.apache.felix.framework.ServiceRegistrationImpl.setProperties(ServiceRegistrationImpl.java:132)
>   at 
> org.apache.sling.event.impl.jobs.JobConsumerManager.unbindService(JobConsumerManager.java:354)
>   at 
> org.apache.sling.event.impl.jobs.JobConsumerManager.unbindJobExecutor(JobConsumerManager.java:270)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Updated] (SLING-5598) Exclude slow tests by default with assume(sling.slow.tests.enabled)

2022-10-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-5598:
---
Fix Version/s: Discovery Oak 1.2.44
   (was: Discovery Oak 1.2.42)

> Exclude slow tests by default with assume(sling.slow.tests.enabled) 
> 
>
> Key: SLING-5598
> URL: https://issues.apache.org/jira/browse/SLING-5598
> Project: Sling
>  Issue Type: Task
>  Components: Extensions
>Affects Versions: Discovery Impl 1.2.6, Discovery Base 1.1.2, Discovery 
> Commons 1.0.10, Discovery Oak 1.2.6
>Reporter: Stefan Egli
>Priority: Major
> Fix For: Discovery Impl 1.2.14, Discovery Base 2.0.16, Discovery 
> Commons 1.0.30, Discovery Oak 1.2.44
>
> Attachments: SLING-5598-commons-testing.patch, 
> SLING-5598-discovery.patch
>
>
> As suggested by [~bdelacretaz] on [the 
> list|http://markmail.org/message/yad5awqg53epk3ck], we should reduce test 
> duration (ideally at most 1-2 min per bundle and 10-15 min overall). Until 
> the tests have been sped up, however, slow tests should be excluded by 
> default and run only if enabled explicitly. Here's an example {{@Before}} 
> method to achieve that:
> {noformat}
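> import static org.junit.Assume.assumeNotNull;
> import org.junit.Before;
>
> // skips (rather than fails) the test unless -Dsling.slow.tests.enabled is set
> @Before
> public void checkSlowTests() {
>     assumeNotNull(System.getProperty("sling.slow.tests.enabled"));
> }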
> {noformat}
> To enable the slow tests, run: {{mvn -Dsling.slow.tests.enabled=true 
> clean test}}
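One detail worth noting about the check above: assumeNotNull only tests whether the property is present, so -Dsling.slow.tests.enabled=false would still run the slow tests. The hypothetical SlowTestGate helper below (plain JDK, no JUnit) contrasts that presence check with a stricter value check based on Boolean.getBoolean, which is true only when the property equals "true", ignoring case.

```java
/**
 * Hypothetical helper contrasting two ways of gating slow tests on the
 * sling.slow.tests.enabled system property.
 */
final class SlowTestGate {
    static final String PROPERTY = "sling.slow.tests.enabled";

    private SlowTestGate() {}

    /** Same check as assumeNotNull(System.getProperty(...)): any value enables, even "false". */
    static boolean enabledByPresence() {
        return System.getProperty(PROPERTY) != null;
    }

    /** Stricter check: enabled only if the property equals "true" (case-insensitive). */
    static boolean enabledByValue() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        System.setProperty(PROPERTY, "false");
        // Prints "true false": the presence check would still run the slow tests.
        System.out.println(enabledByPresence() + " " + enabledByValue());
    }
}
```

In a JUnit 4 {{@Before}} method, the stricter variant would read assumeTrue(Boolean.getBoolean("sling.slow.tests.enabled")), and the mvn command line stays the same.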





[jira] [Resolved] (SLING-11619) Restore safeguard mechanism for discovery config

2022-10-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved SLING-11619.
-
Fix Version/s: Discovery Oak 1.2.42
   Resolution: Fixed

PR merged

> Restore safeguard mechanism for discovery config
> 
>
> Key: SLING-11619
> URL: https://issues.apache.org/jira/browse/SLING-11619
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
> Fix For: Discovery Oak 1.2.42
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With the [update to parent 
> 47|https://github.com/apache/sling-org-apache-sling-discovery-oak/commit/c306408f36e7636c72b71805d2bb0e3e6f0f0e73#diff-73d443e41e9bfaa5e9c77b6db0e318079f1885f5a7ed9685aae9730209adc579]
>  discovery.oak's Config "lost" the ability to gracefully deal with invalid 
> values, such as empty strings. It used to silently swallow these, but now 
> fails loudly with
> {noformat}
> org.osgi.service.component.ComponentException: 
> java.lang.NumberFormatException: For input string: ""
>   at 
> org.apache.felix.scr.impl.inject.internal.Annotations$Handler.invoke(Annotations.java:379)
>  [org.apache.felix.scr:2.2.0]
>   at com.sun.proxy.$Proxy368.backoffStandbyFactor(Unknown Source)
>   at org.apache.sling.discovery.oak.Config.configure(Config.java:238) 
> [org.apache.sling.discovery.oak:1.2.40]
>   at org.apache.sling.discovery.oak.Config.activate(Config.java:159) 
> [org.apache.sling.discovery.oak:1.2.40]
> {noformat}





[jira] [Commented] (SLING-11619) Restore safeguard mechanism for discovery config

2022-10-24 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623237#comment-17623237
 ] 

Stefan Egli commented on SLING-11619:
-

Updated the PR and marked it ready for review.

> Restore safeguard mechanism for discovery config
> 
>
> Key: SLING-11619
> URL: https://issues.apache.org/jira/browse/SLING-11619
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> With the [update to parent 
> 47|https://github.com/apache/sling-org-apache-sling-discovery-oak/commit/c306408f36e7636c72b71805d2bb0e3e6f0f0e73#diff-73d443e41e9bfaa5e9c77b6db0e318079f1885f5a7ed9685aae9730209adc579]
>  discovery.oak's Config "lost" the ability to gracefully deal with invalid 
> values, such as empty strings. It used to silently swallow these, but now 
> fails loudly with
> {noformat}
> org.osgi.service.component.ComponentException: 
> java.lang.NumberFormatException: For input string: ""
>   at 
> org.apache.felix.scr.impl.inject.internal.Annotations$Handler.invoke(Annotations.java:379)
>  [org.apache.felix.scr:2.2.0]
>   at com.sun.proxy.$Proxy368.backoffStandbyFactor(Unknown Source)
>   at org.apache.sling.discovery.oak.Config.configure(Config.java:238) 
> [org.apache.sling.discovery.oak:1.2.40]
>   at org.apache.sling.discovery.oak.Config.activate(Config.java:159) 
> [org.apache.sling.discovery.oak:1.2.40]
> {noformat}





[jira] [Commented] (SLING-11619) Restore safeguard mechanism for discovery config

2022-10-12 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616564#comment-17616564
 ] 

Stefan Egli commented on SLING-11619:
-

Started work in a [draft 
PR|https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/11]

> Restore safeguard mechanism for discovery config
> 
>
> Key: SLING-11619
> URL: https://issues.apache.org/jira/browse/SLING-11619
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Minor
>
> With the [update to parent 
> 47|https://github.com/apache/sling-org-apache-sling-discovery-oak/commit/c306408f36e7636c72b71805d2bb0e3e6f0f0e73#diff-73d443e41e9bfaa5e9c77b6db0e318079f1885f5a7ed9685aae9730209adc579]
>  discovery.oak's Config "lost" the ability to gracefully deal with invalid 
> values, such as empty strings. It used to silently swallow these, but now 
> fails loudly with
> {noformat}
> org.osgi.service.component.ComponentException: 
> java.lang.NumberFormatException: For input string: ""
>   at 
> org.apache.felix.scr.impl.inject.internal.Annotations$Handler.invoke(Annotations.java:379)
>  [org.apache.felix.scr:2.2.0]
>   at com.sun.proxy.$Proxy368.backoffStandbyFactor(Unknown Source)
>   at org.apache.sling.discovery.oak.Config.configure(Config.java:238) 
> [org.apache.sling.discovery.oak:1.2.40]
>   at org.apache.sling.discovery.oak.Config.activate(Config.java:159) 
> [org.apache.sling.discovery.oak:1.2.40]
> {noformat}





[jira] [Created] (SLING-11619) Restore safeguard mechanism for discovery config

2022-10-12 Thread Stefan Egli (Jira)
Stefan Egli created SLING-11619:
---

 Summary: Restore safeguard mechanism for discovery config
 Key: SLING-11619
 URL: https://issues.apache.org/jira/browse/SLING-11619
 Project: Sling
  Issue Type: Task
  Components: Discovery
Reporter: Stefan Egli
Assignee: Stefan Egli


With the [update to parent 
47|https://github.com/apache/sling-org-apache-sling-discovery-oak/commit/c306408f36e7636c72b71805d2bb0e3e6f0f0e73#diff-73d443e41e9bfaa5e9c77b6db0e318079f1885f5a7ed9685aae9730209adc579]
 discovery.oak's Config "lost" the ability to gracefully deal with invalid 
values, such as empty strings. It used to silently swallow these, but now fails 
loudly with
{noformat}
org.osgi.service.component.ComponentException: java.lang.NumberFormatException: 
For input string: ""
at 
org.apache.felix.scr.impl.inject.internal.Annotations$Handler.invoke(Annotations.java:379)
 [org.apache.felix.scr:2.2.0]
at com.sun.proxy.$Proxy368.backoffStandbyFactor(Unknown Source)
at org.apache.sling.discovery.oak.Config.configure(Config.java:238) 
[org.apache.sling.discovery.oak:1.2.40]
at org.apache.sling.discovery.oak.Config.activate(Config.java:159) 
[org.apache.sling.discovery.oak:1.2.40]
{noformat}
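A minimal sketch of such a safeguard, assuming the int and long properties are read through annotation accessors such as backoffStandbyFactor() from the trace above. Because the accessor itself throws (ComponentException extends RuntimeException), each call is wrapped and the default is used, with a warning, when it fails. SafeConfig, the defaults 5 and 1000, the property name someLongProperty, and the assumption that backoffStandbyFactor is an int are all hypothetical; this is not the code of the merged PR.

```java
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

/**
 * Hypothetical safeguard for int/long config accessors (not the actual
 * discovery.oak Config code).
 */
final class SafeConfig {
    private SafeConfig() {}

    static int intOrDefault(String name, IntSupplier accessor, int defaultValue) {
        try {
            return accessor.getAsInt();
        } catch (RuntimeException e) { // e.g. ComponentException wrapping NumberFormatException
            warn(name, defaultValue, e);
            return defaultValue;
        }
    }

    static long longOrDefault(String name, LongSupplier accessor, long defaultValue) {
        try {
            return accessor.getAsLong();
        } catch (RuntimeException e) {
            warn(name, defaultValue, e);
            return defaultValue;
        }
    }

    private static void warn(String name, long defaultValue, RuntimeException e) {
        // A real component would use its logger; stderr keeps the sketch dependency-free.
        System.err.println("Invalid value for " + name + ", using default " + defaultValue + ": " + e);
    }

    public static void main(String[] args) {
        // Simulates an accessor whose configured value is the empty string.
        int factor = intOrDefault("backoffStandbyFactor", () -> Integer.parseInt(""), 5);
        // A well-formed value passes through unchanged.
        long interval = longOrDefault("someLongProperty", () -> 2000L, 1000L);
        System.out.println(factor + " " + interval); // 5 2000
    }
}
```

In a component, the call might look like SafeConfig.intOrDefault("backoffStandbyFactor", config::backoffStandbyFactor, DEFAULT_BACKOFF_STANDBY_FACTOR), where the constant is again hypothetical.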





[jira] [Closed] (SLING-11450) Partially started instance suppression can lead to unwanted leader loss

2022-08-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11450.
---

> Partially started instance suppression can lead to unwanted leader loss
> ---
>
> Key: SLING-11450
> URL: https://issues.apache.org/jira/browse/SLING-11450
> Project: Sling
>  Issue Type: Bug
>  Components: Discovery
>Affects Versions: Discovery Base 2.0.12, Discovery Oak 1.2.36
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Base 2.0.14, Discovery Oak 1.2.38
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> SLING-10489 introduced "partial startup suppression", sometimes also referred 
> to as "joinerdelay" (even though the latter is actually a subfeature of the 
> former).
> With this suppression enabled (it is disabled by default), the leader instance 
> can lose its leader status upon a topology change even though it did not 
> actually leave the topology or crash. This violates the discovery API 
> contract, which says that the leader stays leader until it crashes.
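The contract mentioned above can be illustrated with a small, hypothetical leader-selection function; discovery.oak's actual election is not modelled here. The previous leader is kept whenever it is still present in the new view, and a successor is chosen only once it is gone. Using the lowest id as the tie-breaker is an assumption made for the sketch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

/**
 * Hypothetical illustration of the discovery API contract: a topology change
 * (including one caused by suppressing a partially started instance) must not
 * demote a leader that is still part of the new view.
 */
final class LeaderSelection {
    private LeaderSelection() {}

    static String selectLeader(String previousLeader, Collection<String> newView) {
        if (newView.isEmpty()) {
            throw new IllegalArgumentException("view must contain at least one instance");
        }
        if (previousLeader != null && newView.contains(previousLeader)) {
            return previousLeader; // leader did not leave or crash: keep it
        }
        // Assumed tie-breaker, applied only when no previous leader is present.
        return Collections.min(newView);
    }

    public static void main(String[] args) {
        String leader = selectLeader(null, Arrays.asList("b", "c"));
        System.out.println(leader); // b
        leader = selectLeader(leader, Arrays.asList("a", "b", "c")); // "a" joins
        System.out.println(leader); // b (not demoted, although "a" sorts lower)
        leader = selectLeader(leader, Arrays.asList("a", "c")); // "b" leaves
        System.out.println(leader); // a
    }
}
```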





[jira] [Closed] (SLING-11496) Fresh instance must remain suppressed until syncToken stored

2022-08-03 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11496.
---

> Fresh instance must remain suppressed until syncToken stored
> 
>
> Key: SLING-11496
> URL: https://issues.apache.org/jira/browse/SLING-11496
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.36
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.40
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The changes in SLING-11450 still miss one case: if an instance reuses the 
> clusterNodeId but is slow, it is not suppressed. The reason is that there is 
> no cleanup of data in /var/discovery/oak/idMap and ./clusterInstances. So if 
> an instance reuses the clusterNodeId, the old data from a previous instance 
> is still there, and the other instances do not distinguish where the data 
> originated.
> The only way to detect clusterNodeId reuse is to require the instance to 
> update the syncToken. Until it does so, it is suppressed. Once it has, it 
> joins the cluster regularly. From then on, the syncToken is no longer 
> checked (since existing instances are exempted from that check).
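A hypothetical model of the suppression rule described above; the class and method names and the use of long for the syncToken are assumptions, not discovery.oak code. Instances already in the established view are exempted, and any other instance stays suppressed until the syncToken it has stored matches the current one, regardless of leftover idMap or clusterInstances data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the syncToken-based join suppression. */
final class JoinSuppression {
    private JoinSuppression() {}

    /**
     * @param clusterNodeId    id of the joining instance (possibly reused)
     * @param establishedIds   ids already in the established view; exempted from the check
     * @param storedSyncToken  syncToken written by that instance, or null if none yet
     * @param currentSyncToken syncToken of the current topology
     */
    static boolean isSuppressed(int clusterNodeId, Set<Integer> establishedIds,
                                Long storedSyncToken, long currentSyncToken) {
        if (establishedIds.contains(clusterNodeId)) {
            return false; // existing instances are exempted
        }
        // Leftover data from a previous owner of a reused id proves nothing;
        // only an up-to-date syncToken shows that this instance is ready.
        return storedSyncToken == null || storedSyncToken != currentSyncToken;
    }

    public static void main(String[] args) {
        Set<Integer> established = new HashSet<>(Arrays.asList(1, 2));
        System.out.println(isSuppressed(3, established, 4L, 5L));   // true: stale token
        System.out.println(isSuppressed(3, established, 5L, 5L));   // false: token stored
        System.out.println(isSuppressed(1, established, null, 5L)); // false: exempted
    }
}
```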





[jira] [Updated] (SLING-11496) Fresh instance must remain suppressed until syncToken stored

2022-07-26 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11496:

Fix Version/s: Discovery Oak 1.2.40
   (was: Discovery Oak 1.2.38)

> Fresh instance must remain suppressed until syncToken stored
> 
>
> Key: SLING-11496
> URL: https://issues.apache.org/jira/browse/SLING-11496
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.36
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.40
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The changes in SLING-11450 still miss one case: if an instance reuses the 
> clusterNodeId but is slow, it is not suppressed. The reason is that there is 
> no cleanup of data in /var/discovery/oak/idMap and ./clusterInstances. So if 
> an instance reuses the clusterNodeId, the old data from a previous instance 
> is still there, and the other instances do not distinguish where the data 
> originated.
> The only way to detect clusterNodeId reuse is to require the instance to 
> update the syncToken. Until it does so, it is suppressed. Once it has, it 
> joins the cluster regularly. From then on, the syncToken is no longer 
> checked (since existing instances are exempted from that check).





[jira] [Updated] (SLING-11496) Fresh instance must remain suppressed until syncToken stored

2022-07-26 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11496:

Affects Version/s: Discovery Oak 1.2.36

> Fresh instance must remain suppressed until syncToken stored
> 
>
> Key: SLING-11496
> URL: https://issues.apache.org/jira/browse/SLING-11496
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Affects Versions: Discovery Oak 1.2.36
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.38
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The changes in SLING-11450 still miss one case: if an instance reuses the 
> clusterNodeId but is slow, it is not suppressed. The reason is that there is 
> no cleanup of data in /var/discovery/oak/idMap and ./clusterInstances. So if 
> an instance reuses the clusterNodeId, the old data from a previous instance 
> is still there, and the other instances do not distinguish where the data 
> originated.
> The only way to detect clusterNodeId reuse is to require the instance to 
> update the syncToken. Until it does so, it is suppressed. Once it has, it 
> joins the cluster regularly. From then on, the syncToken is no longer 
> checked (since existing instances are exempted from that check).





[jira] [Updated] (SLING-9625) DiscoveryServiceImpl#doUpdateProperties may fail due to a LoginException

2022-07-26 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-9625:
---
Fix Version/s: Discovery Oak 1.2.42
   (was: Discovery Oak 1.2.40)

> DiscoveryServiceImpl#doUpdateProperties may fail due to a LoginException 
> -
>
> Key: SLING-9625
> URL: https://issues.apache.org/jira/browse/SLING-9625
> Project: Sling
>  Issue Type: Improvement
>Affects Versions: Discovery Oak 1.2.30
>Reporter: Konrad Windszus
>Priority: Major
> Fix For: Discovery Oak 1.2.42
>
>
> While stopping the OSGi container (Sling Starter 12 SNAPSHOT) I observed the 
> following error
> {code}
> 03.08.2020 10:30:06.262 *INFO * [Apache Sling Terminator] Stopping Apache 
> Sling
> ERROR: bundle org.apache.sling.discovery.oak:1.2.28 
> (139)[org.apache.sling.discovery.oak.OakDiscoveryService(200)] : The 
> updatedPropertyProvider method has thrown an exception
> java.lang.RuntimeException: Could not log in to repository 
> (org.apache.sling.api.resource.LoginException: Cannot derive user name for 
> bundle org.apache.sling.discovery.oak [139] and sub service null)
>   at 
> org.apache.sling.discovery.oak.OakDiscoveryService.doUpdateProperties(OakDiscoveryService.java:540)
>   at org.apache.sling.discovery.oak.OakDiscoveryService.bindPropertyProviderInteral(OakDiscoveryService.java:406)
>   at org.apache.sling.discovery.oak.OakDiscoveryService.updatedPropertyProvider(OakDiscoveryService.java:421)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.felix.scr.impl.inject.methods.BaseMethod.invokeMethod(BaseMethod.java:242)
>   at org.apache.felix.scr.impl.inject.methods.BaseMethod.access$500(BaseMethod.java:41)
>   at org.apache.felix.scr.impl.inject.methods.BaseMethod$Resolved.invoke(BaseMethod.java:678)
>   at org.apache.felix.scr.impl.inject.methods.BaseMethod$NotResolved.invoke(BaseMethod.java:633)
>   at org.apache.felix.scr.impl.inject.methods.BaseMethod.invoke(BaseMethod.java:524)
>   at org.apache.felix.scr.impl.inject.methods.BindMethod.invoke(BindMethod.java:42)
>   at org.apache.felix.scr.impl.manager.DependencyManager.invokeUpdatedMethod(DependencyManager.java:1934)
>   at org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUpdatedMethod(SingleComponentManager.java:448)
>   at org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.modifiedService(DependencyManager.java:366)
>   at org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.modifiedService(DependencyManager.java:297)
>   at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerModified(ServiceTracker.java:1229)
>   at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerModified(ServiceTracker.java:1137)
>   at org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.track(ServiceTracker.java:883)
>   at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1168)
>   at org.apache.felix.scr.impl.BundleComponentActivator$ListenerInfo.serviceChanged(BundleComponentActivator.java:125)
>   at org.apache.felix.framework.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:990)
>   at org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:838)
>   at org.apache.felix.framework.EventDispatcher.fireServiceEvent(EventDispatcher.java:545)
>   at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4833)
>   at org.apache.felix.framework.Felix.access$000(Felix.java:112)
>   at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:434)
>   at org.apache.felix.framework.ServiceRegistry.servicePropertiesModified(ServiceRegistry.java:601)
>   at org.apache.felix.framework.ServiceRegistrationImpl.setProperties(ServiceRegistrationImpl.java:132)
>   at org.apache.sling.event.impl.jobs.JobConsumerManager.unbindService(JobConsumerManager.java:354)
>   at org.apache.sling.event.impl.jobs.JobConsumerManager.unbindJobExecutor(JobConsumerManager.java:270)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Updated] (SLING-11355) Update parent bundle (48) to sling-discovery modules

2022-07-26 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11355:

Fix Version/s: Discovery Oak 1.2.40
   (was: Discovery Oak 1.2.38)

> Update parent bundle (48) to sling-discovery modules
> 
>
> Key: SLING-11355
> URL: https://issues.apache.org/jira/browse/SLING-11355
> Project: Sling
>  Issue Type: Sub-task
>Reporter: Ashok Pelluru
>Assignee: Ashok Pelluru
>Priority: Major
> Fix For: Discovery Impl 1.2.14, Discovery Support 1.0.6, 
> Discovery Commons 1.0.28, Discovery Base 2.0.14, Discovery Oak 1.2.40
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (SLING-5598) Exclude slow tests by default with assume(sling.slow.tests.enabled)

2022-07-26 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-5598:
---
Fix Version/s: Discovery Commons 1.0.30
   Discovery Oak 1.2.42
   (was: Discovery Commons 1.0.28)
   (was: Discovery Oak 1.2.40)

> Exclude slow tests by default with assume(sling.slow.tests.enabled) 
> 
>
> Key: SLING-5598
> URL: https://issues.apache.org/jira/browse/SLING-5598
> Project: Sling
>  Issue Type: Task
>  Components: Extensions
>Affects Versions: Discovery Impl 1.2.6, Discovery Base 1.1.2, Discovery 
> Commons 1.0.10, Discovery Oak 1.2.6
>Reporter: Stefan Egli
>Priority: Major
> Fix For: Discovery Impl 1.2.14, Discovery Base 2.0.16, Discovery 
> Commons 1.0.30, Discovery Oak 1.2.42
>
> Attachments: SLING-5598-commons-testing.patch, 
> SLING-5598-discovery.patch
>
>
> As suggested by [~bdelacretaz] on [the 
> list|http://markmail.org/message/yad5awqg53epk3ck], we should improve test 
> duration (ideally 1-2 min per bundle max, 10-15 min overall). Until the tests 
> are improved, however, slow tests should be excluded by default and run 
> only if explicitly enabled. Here's an example {{@Before}} method to achieve 
> that:
> {noformat}
> import static org.junit.Assume.assumeNotNull;
> import org.junit.Before;
>
> @Before
> public void checkSlowTests() {
>     assumeNotNull(System.getProperty("sling.slow.tests.enabled"));
> }
> {noformat}
> To enable the slow tests, run: {{mvn -Dsling.slow.tests.enabled=true 
> clean test}}
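Note that {{assumeNotNull}} only checks that the property is present, so {{-Dsling.slow.tests.enabled=false}} would still run the slow tests. A minimal, JDK-only sketch of a stricter gate follows. The class name {{SlowTestGate}} is hypothetical and not part of any Sling module. {{Boolean.getBoolean}} returns true only when the named system property equals "true" (ignoring case). In a JUnit 4 test it could be used from the {{@Before}} method as {{Assume.assumeTrue(SlowTestGate.enabled())}}.

```java
/**
 * Hypothetical helper deciding whether slow tests should run.
 * Uses only the JDK; a JUnit 4 test could call
 * org.junit.Assume.assumeTrue(SlowTestGate.enabled()) from its @Before method.
 */
final class SlowTestGate {

    /** System property name taken from the issue description. */
    static final String PROPERTY = "sling.slow.tests.enabled";

    private SlowTestGate() {
        // static utility, not meant to be instantiated
    }

    /**
     * Returns true only if the property is set to "true" (case-insensitive).
     * An unset property or any other value, including "false", disables
     * the slow tests.
     */
    static boolean enabled() {
        return Boolean.getBoolean(PROPERTY);
    }
}
```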





[jira] [Resolved] (SLING-11496) Fresh instance must remain suppressed until syncToken stored

2022-07-26 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved SLING-11496.
-
Resolution: Fixed

merged

> Fresh instance must remain suppressed until syncToken stored
> 
>
> Key: SLING-11496
> URL: https://issues.apache.org/jira/browse/SLING-11496
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.38
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The changes in SLING-11450 still miss one case: if an instance 
> reuses the clusterNodeId but is slow, it is not suppressed. The reason is 
> that there is no cleanup of data in /var/discovery/oak/idMap and 
> ./clusterInstances. So if it reuses the clusterNodeId, the old data from a 
> previous instance is still there, and the other instances cannot tell 
> where the data originated.
> The only way to detect a clusterNodeId reuse is to require the instance to 
> update the syncToken. Until it does that, it remains suppressed. Once it 
> does, it joins the cluster normally. From then on, the syncToken is no 
> longer checked (since existing instances are exempted from that check).
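To make the rule above concrete, here is a hypothetical, in-memory model of it. It is not the discovery.oak implementation: the class {{JoinerSuppression}}, its methods, and the idea of comparing a stored token against a "current" token are illustrative assumptions. It only captures the three points of the description: a fresh instance (including one reusing a clusterNodeId) stays suppressed while it has not stored the current syncToken, stale data left behind by a previous owner of the same id does not count, and an instance that has joined is exempt from the check afterwards.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

/**
 * Hypothetical model of the SLING-11496 rule. Names and types are
 * illustrative only and do not mirror the real discovery.oak classes.
 */
final class JoinerSuppression {

    /** clusterNodeIds that already joined the established view. */
    private final Set<Integer> establishedMembers = new LinkedHashSet<>();

    /**
     * Returns true if the instance must not (yet) be included in the view.
     *
     * @param clusterNodeId    id of the instance, possibly reused
     * @param storedSyncToken  token found in the repository, or null if none
     * @param currentSyncToken token the instance is expected to have written
     */
    boolean isSuppressed(int clusterNodeId, Long storedSyncToken, long currentSyncToken) {
        if (establishedMembers.contains(clusterNodeId)) {
            // existing instances are exempted from the syncToken check
            return false;
        }
        // leftover data from a previous owner of this clusterNodeId carries an
        // old token (or none), so only a matching token ends the suppression
        return !Objects.equals(storedSyncToken, currentSyncToken);
    }

    /** Records that the instance joined normally; it is exempt from now on. */
    void markJoined(int clusterNodeId) {
        establishedMembers.add(clusterNodeId);
    }
}
```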





[jira] [Commented] (SLING-11496) Fresh instance must remain suppressed until syncToken stored

2022-07-25 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570989#comment-17570989
 ] 

Stefan Egli commented on SLING-11496:
-

* created 
[PR#10|https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/10]

> Fresh instance must remain suppressed until syncToken stored
> 
>
> Key: SLING-11496
> URL: https://issues.apache.org/jira/browse/SLING-11496
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.38
>
>





[jira] [Commented] (SLING-11496) Fresh instance must remain suppressed until syncToken stored

2022-07-25 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570925#comment-17570925
 ] 

Stefan Egli commented on SLING-11496:
-

* working on this 
[here|https://github.com/stefan-egli/sling-org-apache-sling-discovery-oak/tree/SLING-11496]

> Fresh instance must remain suppressed until syncToken stored
> 
>
> Key: SLING-11496
> URL: https://issues.apache.org/jira/browse/SLING-11496
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.38
>
>





[jira] [Updated] (SLING-11496) Fresh instance must remain suppressed until syncToken stored

2022-07-25 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11496:

Fix Version/s: Discovery Oak 1.2.38

> Fresh instance must remain suppressed until syncToken stored
> 
>
> Key: SLING-11496
> URL: https://issues.apache.org/jira/browse/SLING-11496
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Oak 1.2.38
>
>





[jira] [Created] (SLING-11496) Fresh instance must remain suppressed until syncToken stored

2022-07-25 Thread Stefan Egli (Jira)
Stefan Egli created SLING-11496:
---

 Summary: Fresh instance must remain suppressed until syncToken 
stored
 Key: SLING-11496
 URL: https://issues.apache.org/jira/browse/SLING-11496
 Project: Sling
  Issue Type: Task
  Components: Discovery
Reporter: Stefan Egli
Assignee: Stefan Egli


The changes in SLING-11450 still miss one case: if an instance reuses 
the clusterNodeId but is slow, it is not suppressed. The reason is that there 
is no cleanup of data in /var/discovery/oak/idMap and ./clusterInstances. So if 
it reuses the clusterNodeId, the old data from a previous instance is still 
there, and the other instances cannot tell where the data originated.

The only way to detect a clusterNodeId reuse is to require the instance to 
update the syncToken. Until it does that, it remains suppressed. Once it does, 
it joins the cluster normally. From then on, the syncToken is no longer checked 
(since existing instances are exempted from that check).





[jira] [Updated] (SLING-9625) DiscoveryServiceImpl#doUpdateProperties may fail due to a LoginException

2022-07-21 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-9625:
---
Fix Version/s: Discovery Oak 1.2.40
   (was: Discovery Oak 1.2.38)

> DiscoveryServiceImpl#doUpdateProperties may fail due to a LoginException 
> -
>
> Key: SLING-9625
> URL: https://issues.apache.org/jira/browse/SLING-9625
> Project: Sling
>  Issue Type: Improvement
>Affects Versions: Discovery Oak 1.2.30
>Reporter: Konrad Windszus
>Priority: Major
> Fix For: Discovery Oak 1.2.40
>
>
> While stopping the OSGi container (Sling Starter 12 SNAPSHOT), I observed the 
> following error:
> {code}
> 03.08.2020 10:30:06.262 *INFO * [Apache Sling Terminator] Stopping Apache Sling
> ERROR: bundle org.apache.sling.discovery.oak:1.2.28 (139)[org.apache.sling.discovery.oak.OakDiscoveryService(200)] : The updatedPropertyProvider method has thrown an exception
> java.lang.RuntimeException: Could not log in to repository (org.apache.sling.api.resource.LoginException: Cannot derive user name for bundle org.apache.sling.discovery.oak [139] and sub service null)
>   at org.apache.sling.discovery.oak.OakDiscoveryService.doUpdateProperties(OakDiscoveryService.java:540)
>   at org.apache.sling.discovery.oak.OakDiscoveryService.bindPropertyProviderInteral(OakDiscoveryService.java:406)
>   at org.apache.sling.discovery.oak.OakDiscoveryService.updatedPropertyProvider(OakDiscoveryService.java:421)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.felix.scr.impl.inject.methods.BaseMethod.invokeMethod(BaseMethod.java:242)
>   at org.apache.felix.scr.impl.inject.methods.BaseMethod.access$500(BaseMethod.java:41)
>   at org.apache.felix.scr.impl.inject.methods.BaseMethod$Resolved.invoke(BaseMethod.java:678)
>   at org.apache.felix.scr.impl.inject.methods.BaseMethod$NotResolved.invoke(BaseMethod.java:633)
>   at org.apache.felix.scr.impl.inject.methods.BaseMethod.invoke(BaseMethod.java:524)
>   at org.apache.felix.scr.impl.inject.methods.BindMethod.invoke(BindMethod.java:42)
>   at org.apache.felix.scr.impl.manager.DependencyManager.invokeUpdatedMethod(DependencyManager.java:1934)
>   at org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUpdatedMethod(SingleComponentManager.java:448)
>   at org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.modifiedService(DependencyManager.java:366)
>   at org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.modifiedService(DependencyManager.java:297)
>   at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerModified(ServiceTracker.java:1229)
>   at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerModified(ServiceTracker.java:1137)
>   at org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.track(ServiceTracker.java:883)
>   at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1168)
>   at org.apache.felix.scr.impl.BundleComponentActivator$ListenerInfo.serviceChanged(BundleComponentActivator.java:125)
>   at org.apache.felix.framework.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:990)
>   at org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:838)
>   at org.apache.felix.framework.EventDispatcher.fireServiceEvent(EventDispatcher.java:545)
>   at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4833)
>   at org.apache.felix.framework.Felix.access$000(Felix.java:112)
>   at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:434)
>   at org.apache.felix.framework.ServiceRegistry.servicePropertiesModified(ServiceRegistry.java:601)
>   at org.apache.felix.framework.ServiceRegistrationImpl.setProperties(ServiceRegistrationImpl.java:132)
>   at org.apache.sling.event.impl.jobs.JobConsumerManager.unbindService(JobConsumerManager.java:354)
>   at org.apache.sling.event.impl.jobs.JobConsumerManager.unbindJobExecutor(JobConsumerManager.java:270)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Updated] (SLING-5598) Exclude slow tests by default with assume(sling.slow.tests.enabled)

2022-07-21 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-5598:
---
Fix Version/s: Discovery Base 2.0.16
   Discovery Oak 1.2.40
   (was: Discovery Base 2.0.14)
   (was: Discovery Oak 1.2.38)

> Exclude slow tests by default with assume(sling.slow.tests.enabled) 
> 
>
> Key: SLING-5598
> URL: https://issues.apache.org/jira/browse/SLING-5598
> Project: Sling
>  Issue Type: Task
>  Components: Extensions
>Affects Versions: Discovery Impl 1.2.6, Discovery Base 1.1.2, Discovery 
> Commons 1.0.10, Discovery Oak 1.2.6
>Reporter: Stefan Egli
>Priority: Major
> Fix For: Discovery Impl 1.2.14, Discovery Commons 1.0.28, 
> Discovery Base 2.0.16, Discovery Oak 1.2.40
>
> Attachments: SLING-5598-commons-testing.patch, 
> SLING-5598-discovery.patch
>
>





[jira] [Resolved] (SLING-11450) Partially started instance suppression can lead to unwanted leader loss

2022-07-21 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved SLING-11450.
-
Resolution: Fixed

> Partially started instance suppression can lead to unwanted leader loss
> ---
>
> Key: SLING-11450
> URL: https://issues.apache.org/jira/browse/SLING-11450
> Project: Sling
>  Issue Type: Bug
>  Components: Discovery
>Affects Versions: Discovery Base 2.0.12, Discovery Oak 1.2.36
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Base 2.0.14, Discovery Oak 1.2.38
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> SLING-10489 introduced "partial startup suppression", sometimes also referred 
> to as "joinerdelay" (even though the latter is actually a subfeature of the 
> former).
> With this suppression enabled (it is disabled by default), the leader 
> instance can lose its leader status upon a topology change even though it 
> did not actually leave the topology or crash. This violates the discovery API 
> contract, which says that the leader stays leader until it crashes.
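A hypothetical sketch of the contract quoted above, assuming a simple model where suppression only affects which instances are eligible to become a new leader. The class {{StableLeaderElection}} and the lowest-clusterNodeId tie-break are assumptions for illustration, not the discovery.oak algorithm. The point is the order of the checks: an incumbent leader that is still alive keeps its role even when it is not in the eligible set, and a new leader is chosen only after the old one has left or crashed.

```java
import java.util.Collection;
import java.util.OptionalInt;

/**
 * Hypothetical illustration of leader stability: suppression may hide
 * instances from the eligible set, but it never demotes a live leader.
 */
final class StableLeaderElection {

    /** clusterNodeId of the current leader, or null before the first election. */
    private Integer currentLeader;

    /**
     * @param aliveIds    clusterNodeIds of instances that are still alive
     * @param eligibleIds alive instances that are not currently suppressed
     * @return the leader of the new view, or empty if nobody is eligible
     */
    OptionalInt electLeader(Collection<Integer> aliveIds, Collection<Integer> eligibleIds) {
        if (currentLeader != null && aliveIds.contains(currentLeader)) {
            // the leader neither left nor crashed, so it must stay leader
            return OptionalInt.of(currentLeader);
        }
        // assumed tie-break: lowest eligible clusterNodeId becomes the new leader
        OptionalInt next = eligibleIds.stream().mapToInt(Integer::intValue).min();
        currentLeader = next.isPresent() ? Integer.valueOf(next.getAsInt()) : null;
        return next;
    }
}
```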





[jira] [Commented] (SLING-11450) Partially started instance suppression can lead to unwanted leader loss

2022-07-21 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569456#comment-17569456
 ] 

Stefan Egli commented on SLING-11450:
-

* merged both PRs
* releasing both bundles next

> Partially started instance suppression can lead to unwanted leader loss
> ---
>
> Key: SLING-11450
> URL: https://issues.apache.org/jira/browse/SLING-11450
> Project: Sling
>  Issue Type: Bug
>  Components: Discovery
>Affects Versions: Discovery Base 2.0.12, Discovery Oak 1.2.36
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Base 2.0.14, Discovery Oak 1.2.38
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>





[jira] [Resolved] (SLING-11470) Revert discovery.base impl separation, bump major package version instead

2022-07-20 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved SLING-11470.
-
Resolution: Fixed

Thx [~apelluru] for the review, merged the PR now.

> Revert discovery.base impl separation, bump major package version instead
> -
>
> Key: SLING-11470
> URL: https://issues.apache.org/jira/browse/SLING-11470
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a follow-up of SLING-11355 ([discovery-base 
> PR#7|https://github.com/apache/sling-org-apache-sling-discovery-base/pull/7]).
> As part of that PR, the impl classes of the announcement and ping packages 
> were moved to impl subpackages for better separation. Also, the package 
> version bump was suppressed.
> As [~mreutegg] has now noticed, there is an issue with this change (that 
> blocks [discovery-oak 
> PR#7|https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/7]): 
> {{CachedAnnouncement}} is part of the announcement package's API, but it 
> was made private by moving it to impl.
> There are probably several ways to fix this; here are two:
>  # keep the impl classes separated, but fix {{CachedAnnouncement}} by moving 
> it back to the public package. This would require splitting the class into 
> an interface/implementation pair to avoid making registerPing public. 
> Additionally, continue the impl separation for the ping package by also 
> moving the 2 remaining implementation classes to impl: 
> {{TopologyConnectorServlet}} and {{TopologyRequestValidator}} (with 
> corresponding adjustments in tests).
>  # go back to the original, non-separated layout (even though this was not 
> best practice).
> Also:
> * in both cases I would actually argue (a bit late) for not overruling the 
> baseline check and doing the major version bumps instead. In hindsight this 
> seems more appropriate, as it would ensure downstream users do the required 
> upgrade.
> So, I would vote for option 2 + package bumps, as this means fewer changes 
> and the discovery.base package is mostly only used by discovery.oak these 
> days, so I don't see a strong need for beautifying it and introducing impl 
> separation.
> [~apelluru], [~kwin], [~rombert], wdyt?
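Option 1 above hinges on an interface/implementation split. A minimal sketch, assuming hypothetical member names (only {{registerPing}} is mentioned in the issue, and its real signature may differ): the interface would stay in the exported announcement package, while the class carrying the mutating {{registerPing}} would live in the non-exported impl subpackage. Both types are placed in one file here purely so the example compiles on its own.

```java
/**
 * Hypothetical public API type: it would stay in the exported package.
 * getLastPing() is an illustrative accessor, not the real API.
 */
interface CachedAnnouncement {

    /** Read-only information offered to API consumers. */
    long getLastPing();
}

/**
 * Hypothetical implementation: it would live in the non-exported impl
 * subpackage, so registerPing is never visible through the public API.
 */
final class CachedAnnouncementImpl implements CachedAnnouncement {

    private long lastPing;

    @Override
    public long getLastPing() {
        return lastPing;
    }

    /** Mutator kept off the interface; the signature is an assumption. */
    void registerPing(long timestampMillis) {
        this.lastPing = timestampMillis;
    }
}
```

With this split, consumers compile only against {{CachedAnnouncement}}, while code inside the bundle works with {{CachedAnnouncementImpl}}. Option 2 avoids this extra type entirely, which is why it involves fewer changes.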





[jira] [Commented] (SLING-11470) Revert discovery.base impl separation, bump major package version instead

2022-07-20 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568968#comment-17568968
 ] 

Stefan Egli commented on SLING-11470:
-

* Created 
[PR#10|https://github.com/apache/sling-org-apache-sling-discovery-base/pull/10]

> Revert discovery.base impl separation, bump major package version instead
> -
>
> Key: SLING-11470
> URL: https://issues.apache.org/jira/browse/SLING-11470
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>





[jira] [Updated] (SLING-11470) Revert discovery.base impl separation, bump major package version instead

2022-07-20 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11470:

Description: 
This is a follow-up of SLING-11355 ([discovery-base 
PR#7|https://github.com/apache/sling-org-apache-sling-discovery-base/pull/7])

As part of that PR impl classes of announcement and ping packages got moved to 
impl subpackages to have better separation. Also, the package version bump was 
suppressed.

As now noticed by [~mreutegg], there is an issue with this change (that blocks 
[discovery-oak 
PR#7|https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/7] ) : 
the {{CachedAnnouncement}} is part of the announcement package's API but was 
now made private by moving it to impl.

Several options how to fix this probably, listing two of them:
 # keep the impl class separated. But fix {{CachedAnnouncement}} by placing it 
back to the public package. This would require the class to be split into an 
interface/implementation pair to avoid making registerPing public. 
Additionally, continue the impl separation for the ping package by the other 2 
remaining implementation classes also to impl : {{TopologyConnectorServlet}} 
and {{TopologyRequestValidator}} (with corresponding adjustments in tests).
 # go back to the original, non separated way (even though this was not best 
practice).

Also:
* in both cases I would actually argue (a bit late) to not overrule the 
baseline check and actually do the major version bumps. In hindsight seems more 
appropriate, as it would ensure downstream users do the required upgrade.

So, I would vote for option 2 + package bumps, as these are fewer changes and 
the discovery.base package is mostly really only used by discovery.oak these 
days, so I don't see a strong need for beautifying and introducing impl 
separation.

[~apelluru], [~kwin], [~rombert], wdyt?

  was:
This is a follow-up of SLING-11355 ([discovery-base 
PR#7|https://github.com/apache/sling-org-apache-sling-discovery-base/pull/7])

As part of that PR impl classes of announcement and ping packages got moved to 
impl subpackages to have better separation. Also, the package version bump was 
suppressed.

As now noticed by [~mreutegg], there is an issue with this change (that blocks 
[discovery-oak 
PR#7|https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/7] ) : 
the {{CachedAnnouncement}} is part of the announcement package's API but was 
now made private by moving it to impl.

Several options how to fix this probably, listing two of them:
 # keep the impl class separated. But fix {{CachedAnnouncement}} by placing it 
back to the public package. This would require the class to be split into an 
interface/implementation pair to avoid making registerPing public. 
Additionally, continue the impl separation for the ping package by the other 2 
remaining implementation classes also to impl : {{TopologyConnectorServlet}} 
and {{TopologyRequestValidator}} (with corresponding adjustments in tests).
 # go back to the original, non separated way (even though this was not best 
practice).

Also:
* in both cases I would actually argue (a bit late) to not overrule the 
baseline check and actually go the major version bumps. In hindsight seems more 
appropriate, as it would ensure downstream users do the required upgrade.

So, I would vote for option 2 + package bumps, as these are fewer changes and 
the discovery.base package is mostly really only used by discovery.oak these 
days, so I don't see a strong need for beautifying and introducing impl 
separation.

[~apelluru], [~kwin], [~rombert], wdyt?


> Revert discovery.base impl separation, bump major package version instead
> -
>
> Key: SLING-11470
> URL: https://issues.apache.org/jira/browse/SLING-11470
> Project: Sling
>  Issue Type: Task
>  Components: Discovery
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>

[jira] [Created] (SLING-11470) Revert discovery.base impl separation, bump major package version instead

2022-07-20 Thread Stefan Egli (Jira)
Stefan Egli created SLING-11470:
---

 Summary: Revert discovery.base impl separation, bump major package 
version instead
 Key: SLING-11470
 URL: https://issues.apache.org/jira/browse/SLING-11470
 Project: Sling
  Issue Type: Task
  Components: Discovery
Reporter: Stefan Egli
Assignee: Stefan Egli


This is a follow-up of SLING-11355 ([discovery-base 
PR#7|https://github.com/apache/sling-org-apache-sling-discovery-base/pull/7])

As part of that PR impl classes of announcement and ping packages got moved to 
impl subpackages to have better separation. Also, the package version bump was 
suppressed.

As now noticed by [~mreutegg], there is an issue with this change (that blocks 
[discovery-oak 
PR#7|https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/7] ) : 
the {{CachedAnnouncement}} is part of the announcement package's API but was 
now made private by moving it to impl.

Several options how to fix this probably, listing two of them:
 # keep the impl class separated. But fix {{CachedAnnouncement}} by placing it 
back to the public package. This would require the class to be split into an 
interface/implementation pair to avoid making registerPing public. 
Additionally, continue the impl separation for the ping package by the other 2 
remaining implementation classes also to impl : {{TopologyConnectorServlet}} 
and {{TopologyRequestValidator}} (with corresponding adjustments in tests).
 # go back to the original, non separated way (even though this was not best 
practice).

Also:
* in both cases I would actually argue (a bit late) for not overruling the 
baseline check and going for the major version bumps instead. In hindsight this 
seems more appropriate, as it ensures downstream users make the required upgrade.

So I would vote for option 2 + package bumps, as this means fewer changes, and 
these days the discovery.base package is mostly used only by discovery.oak, so 
I don't see a strong need to tidy it up by introducing impl separation.

[~apelluru], [~kwin], [~rombert], wdyt?





[jira] [Commented] (SLING-11450) Partially started instance suppression can lead to unwanted leader loss

2022-07-13 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566362#comment-17566362
 ] 

Stefan Egli commented on SLING-11450:
-

* the actual fix is in discovery.oak, in [this 
PR|https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/7]

> Partially started instance suppression can lead to unwanted leader loss
> ---
>
> Key: SLING-11450
> URL: https://issues.apache.org/jira/browse/SLING-11450
> Project: Sling
>  Issue Type: Bug
>  Components: Discovery
>Affects Versions: Discovery Base 2.0.12, Discovery Oak 1.2.36
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Base 2.0.14, Discovery Oak 1.2.38
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SLING-10489 introduced "partial startup suppression", sometimes also referred 
> to as "joinerdelay" (even though the latter is actually a subfeature of the 
> former).
> With this suppression enabled (it is disabled by default), upon a topology 
> change the leader instance can lose its leader status even though it did not 
> actually leave the topology or crash. This is against the discovery API 
> contract, which says that the leader stays leader until it crashes.





[jira] [Commented] (SLING-11450) Partially started instance suppression can lead to unwanted leader loss

2022-07-13 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-11450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566352#comment-17566352
 ] 

Stefan Egli commented on SLING-11450:
-

* simple test buglet fix in discovery.base in [this 
PR|https://github.com/apache/sling-org-apache-sling-discovery-base/pull/9]

> Partially started instance suppression can lead to unwanted leader loss
> ---
>
> Key: SLING-11450
> URL: https://issues.apache.org/jira/browse/SLING-11450
> Project: Sling
>  Issue Type: Bug
>  Components: Discovery
>Affects Versions: Discovery Base 2.0.12, Discovery Oak 1.2.36
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
> Fix For: Discovery Base 2.0.14, Discovery Oak 1.2.38
>
>
> SLING-10489 introduced "partial startup suppression", sometimes also referred 
> to as "joinerdelay" (even though the latter is actually a subfeature of the 
> former).
> With this suppression enabled (it is disabled by default), upon a topology 
> change the leader instance can lose its leader status even though it did not 
> actually leave the topology or crash. This is against the discovery API 
> contract, which says that the leader stays leader until it crashes.





[jira] [Created] (SLING-11450) Partially started instance suppression can lead to unwanted leader loss

2022-07-13 Thread Stefan Egli (Jira)
Stefan Egli created SLING-11450:
---

 Summary: Partially started instance suppression can lead to 
unwanted leader loss
 Key: SLING-11450
 URL: https://issues.apache.org/jira/browse/SLING-11450
 Project: Sling
  Issue Type: Bug
  Components: Discovery
Affects Versions: Discovery Oak 1.2.36, Discovery Base 2.0.12
Reporter: Stefan Egli
Assignee: Stefan Egli
 Fix For: Discovery Base 2.0.14, Discovery Oak 1.2.38


SLING-10489 introduced "partial startup suppression", sometimes also referred to 
as "joinerdelay" (even though the latter is actually a subfeature of the 
former).

With this suppression enabled (it is disabled by default), upon a topology 
change the leader instance can lose its leader status even though it did not 
actually leave the topology or crash. This is against the discovery API 
contract, which says that the leader stays leader until it crashes.
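The contract can be stated as a simple check on two consecutive views. This is a sketch using plain Java types, with the leader and the members given as Sling ids; the class and method names are hypothetical and not part of the discovery API:

```java
import java.util.Set;

// Hypothetical helper illustrating the contract; not part of the Sling discovery API.
final class LeaderContract {
    private LeaderContract() {
    }

    /**
     * Checks a transition between two consecutive views against the contract
     * "the leader stays leader until it crashes" (or otherwise leaves).
     *
     * @param oldLeader  Sling id of the leader in the previous view, or null if none
     * @param newMembers Sling ids of all instances in the new view
     * @param newLeader  Sling id of the leader in the new view
     */
    static boolean leaderPreserved(String oldLeader, Set<String> newMembers, String newLeader) {
        if (oldLeader == null) {
            return true; // no previous leader whose status could be lost
        }
        if (!newMembers.contains(oldLeader)) {
            return true; // the old leader left the topology, so another instance may take over
        }
        return oldLeader.equals(newLeader); // still present, so it must still be the leader
    }
}
```

For example, {{leaderPreserved("a", Set.of("a", "b"), "b")}} returns false: instance a is still part of the topology but lost its leader status, which is the situation described above.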





[jira] [Commented] (SLING-9664) org.apache.sling.event.jobs package not present in javadoc for sling10+

2022-06-27 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17559197#comment-17559197
 ] 

Stefan Egli commented on SLING-9664:


[~rombert], +1, I think the downstream effort for such a change should be 
smaller these days.

> org.apache.sling.event.jobs package not present in javadoc for sling10+
> ---
>
> Key: SLING-9664
> URL: https://issues.apache.org/jira/browse/SLING-9664
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.8
>
>
> While the javadoc for sling9 [1] covers the org.apache.sling.event.jobs 
> package(s), these went missing from the sling10 javadoc [2] and subsequent 
> versions.
> [1] https://sling.apache.org/apidocs/sling9/index.html
> [2] https://sling.apache.org/apidocs/sling10/index.html





[jira] [Closed] (SLING-11385) Update to Sling Bundle Parent 48

2022-06-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11385.
---

> Update to Sling Bundle Parent 48
> 
>
> Key: SLING-11385
> URL: https://issues.apache.org/jira/browse/SLING-11385
> Project: Sling
>  Issue Type: Task
>  Components: Event
>Reporter: Oliver Lietz
>Assignee: Oliver Lietz
>Priority: Major
> Fix For: Event 4.3.6
>
>






[jira] [Closed] (SLING-11346) CleanUpTest is flaky

2022-06-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11346.
---

> CleanUpTest is flaky
> 
>
> Key: SLING-11346
> URL: https://issues.apache.org/jira/browse/SLING-11346
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Affects Versions: Event 4.3.2
>Reporter: Joerg Hoh
>Assignee: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.6
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The CleanUpTest is flaky and fails depending on the date.





[jira] [Closed] (SLING-8413) JobManagerImpl.findJobs does not escape some values when running queries

2022-06-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-8413.
--

> JobManagerImpl.findJobs does not escape some values when running queries
> 
>
> Key: SLING-8413
> URL: https://issues.apache.org/jira/browse/SLING-8413
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Reporter: Thomas Mueller
>Assignee: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.6
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For SLING-8407 [~stefanegli] found that JobManagerImpl.findJobs doesn't 
> escape some values when building a JCR query. Values need to be escaped. 
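A minimal sketch of value escaping when building such a query string. It assumes the value ends up in a single-quoted string literal where an embedded single quote is escaped by doubling it, as in JCR-SQL2. The helper names and the property name are hypothetical and this is not Sling's actual implementation; property names would need separate handling, which is not covered here:

```java
// Hypothetical helpers illustrating value escaping; not Sling's actual implementation.
final class QueryEscaping {
    private QueryEscaping() {
    }

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    static String quoteLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds a JCR-SQL2 style equality condition; the property name is assumed trusted. */
    static String equalsCondition(String propertyName, String value) {
        return "[" + propertyName + "] = " + quoteLiteral(value);
    }
}
```

For example, {{equalsCondition("myProperty", "it's")}} yields {{[myProperty] = 'it''s'}}, whereas naive concatenation would terminate the literal early and corrupt the query.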





[jira] [Closed] (SLING-11379) Do not create duplicate scheduled jobs

2022-06-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-11379.
---

> Do not create duplicate scheduled jobs
> --
>
> Key: SLING-11379
> URL: https://issues.apache.org/jira/browse/SLING-11379
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Affects Versions: Event 4.3.2
>Reporter: Joerg Hoh
>Assignee: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.6
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I came across situations where periodic jobs were added in the activation of 
> an OSGi component, which eventually piled up into a huge number of (identical) 
> scheduled jobs, causing major load and stability problems.
> Before adding a new scheduled job, it should be checked whether it already exists. 
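The suggested check can be sketched as an idempotent "schedule if absent" operation keyed by topic, schedule expression and job properties. This is an in-memory illustration with hypothetical names ({{ScheduleRegistry}}, {{scheduleIfAbsent}}); a real check in Sling would have to consult the already persisted scheduled jobs, which this sketch does not model:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory registry; a real implementation would look at persisted scheduled jobs.
final class ScheduleRegistry {
    /** Identity of a scheduled job: topic, schedule expression and job properties. */
    private static final class Key {
        private final String topic;
        private final String schedule;
        private final Map<String, Object> properties;

        Key(String topic, String schedule, Map<String, Object> properties) {
            this.topic = topic;
            this.schedule = schedule;
            this.properties = new HashMap<>(properties); // defensive copy
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return topic.equals(other.topic)
                    && schedule.equals(other.schedule)
                    && properties.equals(other.properties);
        }

        @Override
        public int hashCode() {
            return Objects.hash(topic, schedule, properties);
        }
    }

    private final Map<Key, String> scheduled = new HashMap<>();
    private int nextId = 1;

    /** Returns the id of an identical existing schedule, or registers a new one. */
    synchronized String scheduleIfAbsent(String topic, String schedule, Map<String, Object> properties) {
        return scheduled.computeIfAbsent(new Key(topic, schedule, properties), k -> "job-" + nextId++);
    }

    /** Number of distinct scheduled jobs. */
    synchronized int size() {
        return scheduled.size();
    }
}
```

Calling {{scheduleIfAbsent}} repeatedly with the same topic, schedule and properties, e.g. from repeated component activations, returns the same id instead of registering another identical job.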





[jira] [Closed] (SLING-8582) Improve test coverage

2022-06-27 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli closed SLING-8582.
--

> Improve test coverage 
> --
>
> Key: SLING-8582
> URL: https://issues.apache.org/jira/browse/SLING-8582
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Affects Versions: Event 4.2.12
>Reporter: Joerg Hoh
>Assignee: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.6
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Right now the unit test coverage of Sling eventing has room for improvement, 
> and there are also some Sonar findings for it.
> Please merge [https://github.com/apache/sling-org-apache-sling-event/pull/6] 
> as a first step towards addressing those issues.





[jira] [Updated] (SLING-9664) org.apache.sling.event.jobs package not present in javadoc for sling10+

2022-06-23 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-9664:
---
Fix Version/s: Event 4.3.8

> org.apache.sling.event.jobs package not present in javadoc for sling10+
> ---
>
> Key: SLING-9664
> URL: https://issues.apache.org/jira/browse/SLING-9664
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.8
>
>
> While the javadoc for sling9 [1] covers the org.apache.sling.event.jobs 
> package(s), these went missing from the sling10 javadoc [2] and subsequent 
> versions.
> [1] https://sling.apache.org/apidocs/sling9/index.html
> [2] https://sling.apache.org/apidocs/sling10/index.html





[jira] [Updated] (SLING-8582) Improve test coverage

2022-06-23 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-8582:
---
Fix Version/s: Event 4.3.6
   (was: Event 4.3.4)

> Improve test coverage 
> --
>
> Key: SLING-8582
> URL: https://issues.apache.org/jira/browse/SLING-8582
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Affects Versions: Event 4.2.12
>Reporter: Joerg Hoh
>Assignee: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.6
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Right now the unit test coverage of Sling eventing has room for improvement, 
> and there are also some Sonar findings for it.
> Please merge [https://github.com/apache/sling-org-apache-sling-event/pull/6] 
> as a first step towards addressing those issues.





[jira] [Updated] (SLING-11385) Update to Sling Bundle Parent 48

2022-06-23 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11385:

Fix Version/s: Event 4.3.6
   (was: Event 4.3.4)

> Update to Sling Bundle Parent 48
> 
>
> Key: SLING-11385
> URL: https://issues.apache.org/jira/browse/SLING-11385
> Project: Sling
>  Issue Type: Task
>  Components: Event
>Reporter: Oliver Lietz
>Assignee: Oliver Lietz
>Priority: Major
> Fix For: Event 4.3.6
>
>






[jira] [Updated] (SLING-11379) Do not create duplicate scheduled jobs

2022-06-23 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11379:

Fix Version/s: Event 4.3.6
   (was: Event 4.3.4)

> Do not create duplicate scheduled jobs
> --
>
> Key: SLING-11379
> URL: https://issues.apache.org/jira/browse/SLING-11379
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Affects Versions: Event 4.3.2
>Reporter: Joerg Hoh
>Assignee: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.6
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I came across situations where periodic jobs were added in the activation of 
> an OSGi component, which eventually piled up into a huge number of (identical) 
> scheduled jobs, causing major load and stability problems.
> Before adding a new scheduled job, it should be checked whether it already exists. 





[jira] [Updated] (SLING-11346) CleanUpTest is flaky

2022-06-23 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-11346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-11346:

Fix Version/s: Event 4.3.6
   (was: Event 4.3.4)

> CleanUpTest is flaky
> 
>
> Key: SLING-11346
> URL: https://issues.apache.org/jira/browse/SLING-11346
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Affects Versions: Event 4.3.2
>Reporter: Joerg Hoh
>Assignee: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.6
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The CleanUpTest is flaky and fails depending on the date.





[jira] [Updated] (SLING-8413) JobManagerImpl.findJobs does not escape some values when running queries

2022-06-23 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-8413:
---
Fix Version/s: Event 4.3.6
   (was: Event 4.3.4)

> JobManagerImpl.findJobs does not escape some values when running queries
> 
>
> Key: SLING-8413
> URL: https://issues.apache.org/jira/browse/SLING-8413
> Project: Sling
>  Issue Type: Bug
>  Components: Event
>Reporter: Thomas Mueller
>Assignee: Joerg Hoh
>Priority: Major
> Fix For: Event 4.3.6
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For SLING-8407 [~stefanegli] found that JobManagerImpl.findJobs doesn't 
> escape some values when building a JCR query. Values need to be escaped. 





[jira] [Commented] (SLING-9664) org.apache.sling.event.jobs package not present in javadoc for sling10+

2022-06-23 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558098#comment-17558098
 ] 

Stefan Egli commented on SLING-9664:


PS: I'm currently releasing sling.event 4.3.6 and would prefer to move ahead 
with that even without fixing this ticket, as I don't think it should block the 
release; we can instead look at fixing it in the consuming (site) part.

> org.apache.sling.event.jobs package not present in javadoc for sling10+
> ---
>
> Key: SLING-9664
> URL: https://issues.apache.org/jira/browse/SLING-9664
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Joerg Hoh
>Priority: Major
>
> While the javadoc for sling9 [1] covers the org.apache.sling.event.jobs 
> package(s), these went missing from the sling10 javadoc [2] and subsequent 
> versions.
> [1] https://sling.apache.org/apidocs/sling9/index.html
> [2] https://sling.apache.org/apidocs/sling10/index.html





[jira] [Commented] (SLING-9664) org.apache.sling.event.jobs package not present in javadoc for sling10+

2022-06-23 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558091#comment-17558091
 ] 

Stefan Egli commented on SLING-9664:


Right, the reason was SLING-6739: the original sling.event bundle contained 
both API and implementation, and the split was designed to be as smooth as 
possible, i.e. the post-split sling.event bundle would simply embed the 
sling.event.api bundle. But apparently the javadoc got lost in the process.

I'm wondering whether there's a way to re-include the javadocs (perhaps via the 
[bnd.bnd|https://github.com/apache/sling-org-apache-sling-event/blob/master/bnd.bnd]
 file).

> org.apache.sling.event.jobs package not present in javadoc for sling10+
> ---
>
> Key: SLING-9664
> URL: https://issues.apache.org/jira/browse/SLING-9664
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Joerg Hoh
>Priority: Major
>
> While the javadoc for sling9 [1] covers the org.apache.sling.event.jobs 
> package(s), these went missing from the sling10 javadoc [2] and subsequent 
> versions.
> [1] https://sling.apache.org/apidocs/sling9/index.html
> [2] https://sling.apache.org/apidocs/sling10/index.html





[jira] [Commented] (SLING-9905) Provide option to include sling instance id nodes in cleanup

2022-05-18 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538948#comment-17538948
 ] 

Stefan Egli commented on SLING-9905:


OK, I merged the PR.
+1 for resolving this ticket

> Provide option to include sling instance id nodes in cleanup
> 
>
> Key: SLING-9905
> URL: https://issues.apache.org/jira/browse/SLING-9905
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Carsten Ziegeler
>Assignee: Carsten Ziegeler
>Priority: Major
> Fix For: Event 4.3.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, empty nodes for Sling job handling are removed, up to the JCR node 
> specific to a Sling instance id. That node is kept in place to avoid 
> unnecessary removal and recreation.
> This assumes that the instances are stable and do not frequently change.
> However, in volatile installations where new instances are constantly created 
> (and old ones stopped), this leads to a growing number of nodes for each id 
> ever started.
> One way would be to provide an "idle timeout" configuration: for each instance 
> id, record when it was first no longer seen, and if it stays unseen for the 
> configured time, remove its nodes. For example, if the timeout 
> is set to one day, such nodes are removed approximately one day after the 
> instance was stopped.
> There are probably other ways to fix this.
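The idle-timeout idea can be sketched as follows: each cleanup run records when an instance id was first seen missing, resets that record when the id reappears, and reports the id for removal once it has been absent for at least the configured timeout. Class and method names are hypothetical, and the sketch leaves out the actual JCR node removal:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed "idle timeout"; node removal itself is left out.
final class IdleInstanceTracker {
    private final long idleTimeoutMillis;
    /** Instance id -> time at which it was first observed as no longer active. */
    private final Map<String, Long> firstUnseen = new HashMap<>();

    IdleInstanceTracker(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /**
     * Called on each cleanup run.
     *
     * @param knownIds  ids that still own job nodes in the repository
     * @param activeIds ids of the instances currently seen in the topology
     * @return ids that have been unseen for at least the timeout and whose nodes may be removed
     */
    Set<String> idsToRemove(Set<String> knownIds, Set<String> activeIds, long nowMillis) {
        Set<String> result = new HashSet<>();
        for (String id : knownIds) {
            if (activeIds.contains(id)) {
                firstUnseen.remove(id); // seen again: reset its idle clock
                continue;
            }
            long since = firstUnseen.computeIfAbsent(id, k -> nowMillis);
            if (nowMillis - since >= idleTimeoutMillis) {
                result.add(id);
            }
        }
        firstUnseen.keySet().retainAll(knownIds); // forget ids whose nodes are already gone
        return result;
    }
}
```

With a one-day timeout, an instance that was stopped would be reported roughly one day after the first cleanup run that no longer saw it, matching the example above; how precise this is depends on how often the cleanup runs.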





[jira] [Commented] (SLING-9905) Provide option to include sling instance id nodes in cleanup

2022-05-18 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538921#comment-17538921
 ] 

Stefan Egli commented on SLING-9905:


[~cziegeler], I've now created some basic tests around the cleanup; they did not 
uncover any problems, which is good :)
We might do some fancier edge-case testing, not sure, but at least this could 
serve as a basis.
So far I've created it as a PR at 
https://github.com/apache/sling-org-apache-sling-event/pull/18

> Provide option to include sling instance id nodes in cleanup
> 
>
> Key: SLING-9905
> URL: https://issues.apache.org/jira/browse/SLING-9905
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Carsten Ziegeler
>Assignee: Carsten Ziegeler
>Priority: Major
> Fix For: Event 4.3.2
>
>
> Currently, empty nodes for Sling job handling are removed, up to the JCR node 
> specific to a Sling instance id. That node is kept in place to avoid 
> unnecessary removal and recreation.
> This assumes that the instances are stable and do not frequently change.
> However, in volatile installations where new instances are constantly created 
> (and old ones stopped), this leads to a growing number of nodes for each id 
> ever started.
> One way would be to provide an "idle timeout" configuration: for each instance 
> id, record when it was first no longer seen, and if it stays unseen for the 
> configured time, remove its nodes. For example, if the timeout 
> is set to one day, such nodes are removed approximately one day after the 
> instance was stopped.
> There are probably other ways to fix this.




