[jira] [Resolved] (SLING-9474) Bad partition when using an empty package topic

2020-05-28 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-9474.
---
Resolution: Fixed

> Bad partition when using an empty package topic
> ---
>
> Key: SLING-9474
> URL: https://issues.apache.org/jira/browse/SLING-9474
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.16
>
>
> Since SLING-9460, the PubQueueCacheService tail poller fails to establish on 
> empty topics. This prevents the agent from activating properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SLING-9476) Journal IT tests fail

2020-05-28 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119091#comment-17119091
 ] 

Timothee Maret commented on SLING-9476:
---

Fixed in 
[73b225fc6d5a3ea67cc6315201d7a4c541e486cd|https://github.com/apache/sling-org-apache-sling-distribution-journal-it/commit/73b225fc6d5a3ea67cc6315201d7a4c541e486cd].
Two tests had to be adjusted because they assumed that the first package sent 
would be at offset 1. This assumption no longer holds since SLING-9460: as no 
seeding messages are sent, the first package arrives at offset 0.

IT tests pass at 
https://builds.apache.org/job/Sling/job/sling-org-apache-sling-distribution-journal-it/job/master/384/
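
The offset shift described in this comment can be illustrated with a small sketch. It is only an illustration: InMemoryTopic is a hypothetical in-memory stand-in for a topic partition, not the journal's messaging API and not the code touched by the commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a single topic partition (illustration only). */
class InMemoryTopic {
    private final List<String> messages = new ArrayList<>();

    /** Appends a message and returns the offset it was stored at (offsets start at 0). */
    long send(String message) {
        messages.add(message);
        return messages.size() - 1;
    }
}
```

With a seeding message sent first, the first package lands at offset 1; without any seeding message, the same package lands at offset 0, which is the case the adjusted tests now expect.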


 

> Journal IT tests fail
> -
>
> Key: SLING-9476
> URL: https://issues.apache.org/jira/browse/SLING-9476
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal ITs 0.1.2
>
>
> Since the change in SLING-9460, some IT tests fail:
> https://builds.apache.org/job/Sling/job/sling-org-apache-sling-distribution-journal-it/
> AuthorDistributeTest.testDistribute





[jira] [Resolved] (SLING-9476) Journal IT tests fail

2020-05-28 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-9476.
---
Resolution: Fixed

> Journal IT tests fail
> -
>
> Key: SLING-9476
> URL: https://issues.apache.org/jira/browse/SLING-9476
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal ITs 0.1.2
>
>
> Since the change in SLING-9460, some IT tests fail:
> https://builds.apache.org/job/Sling/job/sling-org-apache-sling-distribution-journal-it/
> AuthorDistributeTest.testDistribute





[jira] [Updated] (SLING-9482) Seed the cache from offset persisted in the source repository

2020-05-28 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-9482:
--
Fix Version/s: (was: Content Distribution Journal Core 0.1.20)
   Content Distribution Journal Core 0.1.16

> Seed the cache from offset persisted in the source repository
> -
>
> Key: SLING-9482
> URL: https://issues.apache.org/jira/browse/SLING-9482
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.16
>
>
> The approach taken in SLING-9460 to avoid sending seeding messages does not 
> address some scenarios, such as publishing content without subscriber agents.
> To be sure we always have a recent seed available, we should persist seed 
> offsets in the source repository (typically author) and seed caches from it. 
> We already have the local store class, which makes it easy to write the 
> offsets to the repository. To avoid stressing the repository, we should batch 
> those writes (e.g. 1 offset update every 10 packages processed). To support a 
> cluster, the writes must be initiated only from the cluster leader instance.
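
The batching idea in the description above could be sketched as follows. This is a minimal illustration under assumptions, not the change that was eventually committed: BatchedOffsetWriter, the isLeader check and the store callback are hypothetical names standing in for the leader detection and the local store class mentioned in the description.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/**
 * Hypothetical sketch: persists one offset per batch of processed packages,
 * and only when the current instance is the cluster leader.
 * None of these names come from the Sling code base.
 */
class BatchedOffsetWriter {
    private final int batchSize;            // e.g. 10, as suggested in the description
    private final BooleanSupplier isLeader; // stand-in for a cluster leader check
    private final LongConsumer store;       // stand-in for the repository write
    private int sinceLastWrite = 0;

    BatchedOffsetWriter(int batchSize, BooleanSupplier isLeader, LongConsumer store) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.isLeader = isLeader;
        this.store = store;
    }

    /** Call once per processed package; writes at most once per batchSize calls. */
    synchronized void onPackageProcessed(long offset) {
        sinceLastWrite++;
        if (sinceLastWrite >= batchSize && isLeader.getAsBoolean()) {
            store.accept(offset);
            sinceLastWrite = 0;
        }
    }
}
```

With a batch size of 10, processing offsets 0 to 24 on the leader would trigger two writes (offsets 9 and 19); a non-leader instance would trigger none.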





[jira] [Updated] (SLING-9482) Seed the cache from offset persisted in the source repository

2020-05-28 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-9482:
--
Fix Version/s: Content Distribution Journal Core 0.1.20

> Seed the cache from offset persisted in the source repository
> -
>
> Key: SLING-9482
> URL: https://issues.apache.org/jira/browse/SLING-9482
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.20
>
>
> The approach taken in SLING-9460 to avoid sending seeding messages does not 
> address some scenarios, such as publishing content without subscriber agents.
> To be sure we always have a recent seed available, we should persist seed 
> offsets in the source repository (typically author) and seed caches from it. 
> We already have the local store class, which makes it easy to write the 
> offsets to the repository. To avoid stressing the repository, we should batch 
> those writes (e.g. 1 offset update every 10 packages processed). To support a 
> cluster, the writes must be initiated only from the cluster leader instance.





[jira] [Commented] (SLING-9482) Seed the cache from offset persisted in the source repository

2020-05-28 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119097#comment-17119097
 ] 

Timothee Maret commented on SLING-9482:
---

Done in 
[c7814657bb42996558a781d749a93c72437c89fc|https://github.com/apache/sling-org-apache-sling-distribution-journal/commit/c7814657bb42996558a781d749a93c72437c89fc].
The change introduces the following repository operations:

* One read operation when a publisher agent starts
* At most one write operation per cluster every 15 minutes, and no write 
operation if no package was produced during the last 15 minutes
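
The write bound listed above (at most one write per interval, none when nothing new was produced) could be enforced along these lines. This is a sketch under assumptions and is not taken from the commit: ThrottledOffsetFlusher, the store callback and the explicit nowMillis parameter are illustrative names, and a periodic task is assumed to call flushIfDue.

```java
import java.util.function.LongConsumer;

/**
 * Hypothetical sketch of a time-throttled offset writer.
 * Names and structure are assumptions made for illustration.
 */
class ThrottledOffsetFlusher {
    private final long intervalMillis;   // e.g. 15 minutes
    private final LongConsumer store;    // stand-in for the repository write
    private long latestOffset = -1;      // most recent offset observed
    private long storedOffset = -1;      // last offset handed to the store
    private long lastFlushMillis = 0;
    private boolean flushedOnce = false;

    ThrottledOffsetFlusher(long intervalMillis, LongConsumer store) {
        this.intervalMillis = intervalMillis;
        this.store = store;
    }

    /** Records that a package was produced at the given offset. */
    synchronized void onPackage(long offset) {
        latestOffset = Math.max(latestOffset, offset);
    }

    /** Writes the latest offset if the interval elapsed and there is something new. */
    synchronized boolean flushIfDue(long nowMillis) {
        boolean due = !flushedOnce || nowMillis - lastFlushMillis >= intervalMillis;
        if (!due || latestOffset == storedOffset) {
            return false; // too early, or no package since the last write
        }
        store.accept(latestOffset);
        storedOffset = latestOffset;
        lastFlushMillis = nowMillis;
        flushedOnce = true;
        return true;
    }
}
```

Passing the clock value in as a parameter keeps the sketch easy to test; a real implementation would more likely read the time itself.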

> Seed the cache from offset persisted in the source repository
> -
>
> Key: SLING-9482
> URL: https://issues.apache.org/jira/browse/SLING-9482
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
>
> The approach taken in SLING-9460 to avoid sending seeding messages does not 
> address some scenarios, such as publishing content without subscriber agents.
> To be sure we always have a recent seed available, we should persist seed 
> offsets in the source repository (typically author) and seed caches from it. 
> We already have the local store class, which makes it easy to write the 
> offsets to the repository. To avoid stressing the repository, we should batch 
> those writes (e.g. 1 offset update every 10 packages processed). To support a 
> cluster, the writes must be initiated only from the cluster leader instance.





[jira] [Resolved] (SLING-9482) Seed the cache from offset persisted in the source repository

2020-05-29 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-9482.
---
Resolution: Fixed

> Seed the cache from offset persisted in the source repository
> -
>
> Key: SLING-9482
> URL: https://issues.apache.org/jira/browse/SLING-9482
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.16
>
>
> The approach taken in SLING-9460 to avoid sending seeding messages does not 
> address some scenarios, such as publishing content without subscriber agents.
> To be sure we always have a recent seed available, we should persist seed 
> offsets in the source repository (typically author) and seed caches from it. 
> We already have the local store class, which makes it easy to write the 
> offsets to the repository. To avoid stressing the repository, we should batch 
> those writes (e.g. 1 offset update every 10 packages processed). To support a 
> cluster, the writes must be initiated only from the cluster leader instance.





[jira] [Commented] (SLING-9447) Add packageId property to distribution events

2020-05-26 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116967#comment-17116967
 ] 

Timothee Maret commented on SLING-9447:
---

[~cschneider] ping

> Add packageId property to distribution events
> -
>
> Key: SLING-9447
> URL: https://issues.apache.org/jira/browse/SLING-9447
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution API 0.4.0
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution API 0.4.2
>
>
> For all distribution events it is useful to track the package id.





[jira] [Resolved] (SLING-9442) Complete the semantic definition of distribution events

2020-05-26 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-9442.
---
Resolution: Fixed

Resolved as fixed. 

> Complete the semantic definition of distribution events  
> -
>
> Key: SLING-9442
> URL: https://issues.apache.org/jira/browse/SLING-9442
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution API 0.4.2
>
>
> The description of [distribution 
> events|https://github.com/apache/sling-org-apache-sling-distribution-api/blob/master/src/main/java/org/apache/sling/distribution/event/DistributionEventTopics.java]
>  does not clearly document the conditions for raising the events. We should 
> extend the documentation both in the API and in the published documentation.





[jira] [Commented] (SLING-9742) CLONE - DistributionPublisher does not validate queue names

2020-09-20 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199087#comment-17199087
 ] 

Timothee Maret commented on SLING-9742:
---

[~dsuess] could you describe how to reproduce this? In particular, which 
unexpected structure is being mixed in?

> CLONE - DistributionPublisher does not validate queue names
> ---
>
> Key: SLING-9742
> URL: https://issues.apache.org/jira/browse/SLING-9742
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Core 0.4.2
>Reporter: Dominik Süß
>Assignee: Timothee Maret
>Priority: Major
>
> ExtendedDistributionServiceResourceProvider currently fails to render JSON 
> when unexpected structures are mixed in (e.g. to provide resources to render 
> a corresponding UI).
> This causes exceptions like this:
> {code}
> java.lang.IllegalArgumentException: Unsupported entryId format jcr:content
>   at org.apache.sling.distribution.journal.impl.queue.impl.EntryUtil.entryOffset(EntryUtil.java:32)
>   at org.apache.sling.distribution.journal.impl.queue.impl.PubQueue.getEntry(PubQueue.java:126)
>   at org.apache.sling.distribution.resources.impl.ExtendedDistributionServiceResourceProvider.getQueueProperties(ExtendedDistributionServiceResourceProvider.java:180)
>   at org.apache.sling.distribution.resources.impl.ExtendedDistributionServiceResourceProvider.getChildResourceProperties(ExtendedDistributionServiceResourceProvider.java:81)
>   at org.apache.sling.distribution.resources.impl.DistributionServiceResourceProvider.getInternalResourceProperties(DistributionServiceResourceProvider.java:64)
>   at org.apache.sling.distribution.resources.impl.common.AbstractReadableResourceProvider.getResourceProperties(AbstractReadableResourceProvider.java:175)
>   at org.apache.sling.distribution.resources.impl.common.AbstractReadableResourceProvider.getResource(AbstractReadableResourceProvider.java:79)
>   at org.apache.sling.distribution.resources.impl.DistributionServiceResourceProviderFactory.getResource(DistributionServiceResourceProviderFactory.java:99)
> {code}
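
One way to avoid the exception above would be to check the entry ID format before parsing it, so that unrelated child names such as jcr:content are skipped rather than rejected. The sketch below is hypothetical: it assumes entry IDs end in "@" followed by a non-negative decimal offset, which is an assumption made for illustration and is not confirmed by the stack trace, and EntryIds.tryParseOffset is not an existing method.

```java
import java.util.OptionalLong;

/** Hypothetical helper; the "<prefix>@<offset>" format is an assumption for illustration. */
class EntryIds {

    /** Returns the offset encoded in the entry ID, or empty if the ID does not match. */
    static OptionalLong tryParseOffset(String entryId) {
        if (entryId == null) {
            return OptionalLong.empty();
        }
        int at = entryId.lastIndexOf('@');
        if (at < 0 || at == entryId.length() - 1) {
            return OptionalLong.empty(); // no separator, or nothing after it
        }
        try {
            long offset = Long.parseLong(entryId.substring(at + 1));
            return offset >= 0 ? OptionalLong.of(offset) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty(); // suffix is not a number
        }
    }
}
```

A caller could then treat an empty result as "no such queue entry" instead of propagating an IllegalArgumentException to the resource provider.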





[jira] [Commented] (SLING-9628) Send log messages from subscriber to publisher

2020-08-04 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170907#comment-17170907
 ] 

Timothee Maret commented on SLING-9628:
---

Do we really need a new message type? How about adding the log message (error) 
to the existing DiscoveryMessage? The DiscoveryMessage would then carry the 
error message in addition to the number of retries, which it already carries.

> Send log messages from subscriber to publisher
> --
>
> Key: SLING-9628
> URL: https://issues.apache.org/jira/browse/SLING-9628
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18, Content 
> Distribution Journal Kafka 0.1.6, Content Distribution Journal Messages 0.1.10
>
>
> In the content distribution UI we can display a distribution log per agent.
> Currently this log only shows that package messages are sent out. 
> This issue is about also showing successfully imported packages as well as 
> errors during import on the subscriber side.
> The idea is to send a new message type LogMessage on the discovery topic. 
> These messages are received by the DiscoveryService and added to the 
> DistributionLog.





[jira] [Updated] (SLING-9569) Emit error code when error event received if available

2020-07-07 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-9569:
--
Fix Version/s: Content Distribution Journal Messages 0.1.10
   Content Distribution Journal Core 0.1.18

> Emit error code when error event received if available
> --
>
> Key: SLING-9569
> URL: https://issues.apache.org/jira/browse/SLING-9569
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18, Content 
> Distribution Journal Messages 0.1.10
>
>
> Allow sending the error code, if available from the exception, with the error 
> event, and emit an error code metric counter.
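
A per-error-code counter of the kind described above could look roughly like this. It is a sketch only: ErrorCodeCounters is a hypothetical class, not the metrics service used by the journal, and counting events without a code under "unknown" is an assumption about how "if available" might be handled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of a thread-safe counter keyed by error code. */
class ErrorCodeCounters {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Counts one error event; events without a code are grouped under "unknown" (assumption). */
    void onError(String errorCode) {
        String key = (errorCode == null || errorCode.isEmpty()) ? "unknown" : errorCode;
        counters.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    /** Returns how many events were counted for the given key. */
    long count(String key) {
        LongAdder adder = counters.get(key);
        return adder == null ? 0L : adder.sum();
    }
}
```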





[jira] [Assigned] (SLING-9569) Emit error code when error event received if available

2020-07-07 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret reassigned SLING-9569:
-

Assignee: Timothee Maret

> Emit error code when error event received if available
> --
>
> Key: SLING-9569
> URL: https://issues.apache.org/jira/browse/SLING-9569
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Assignee: Timothee Maret
>Priority: Major
>
> Allow sending the error code, if available from the exception, with the error 
> event, and emit an error code metric counter.





[jira] [Commented] (SLING-9569) Emit error code when error event received if available

2020-07-07 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152791#comment-17152791
 ] 

Timothee Maret commented on SLING-9569:
---

Thanks [~amitj], I merged the PRs.

> Emit error code when error event received if available
> --
>
> Key: SLING-9569
> URL: https://issues.apache.org/jira/browse/SLING-9569
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Assignee: Timothee Maret
>Priority: Major
>
> Allow sending the error code, if available from the exception, with the error 
> event, and emit an error code metric counter.





[jira] [Commented] (SLING-9569) Emit error code when error event received if available

2020-07-07 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152847#comment-17152847
 ] 

Timothee Maret commented on SLING-9569:
---

Unit tests and IT tests pass. Resolving.

> Emit error code when error event received if available
> --
>
> Key: SLING-9569
> URL: https://issues.apache.org/jira/browse/SLING-9569
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18, Content 
> Distribution Journal Messages 0.1.10
>
>
> Allow sending the error code, if available from the exception, with the error 
> event, and emit an error code metric counter.





[jira] [Resolved] (SLING-9569) Emit error code when error event received if available

2020-07-07 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-9569.
---
Resolution: Fixed

> Emit error code when error event received if available
> --
>
> Key: SLING-9569
> URL: https://issues.apache.org/jira/browse/SLING-9569
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18, Content 
> Distribution Journal Messages 0.1.10
>
>
> Allow sending the error code, if available from the exception, with the error 
> event, and emit an error code metric counter.





[jira] [Commented] (SLING-9577) Switch back to seeding thread

2020-07-10 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155314#comment-17155314
 ] 

Timothee Maret commented on SLING-9577:
---

Explaining is only possible with people who listen. The reason is in the 
thread. Also, I added a way forward, which is to avoid the seed in the first 
place. This is a refactoring effort; you could also refactor that part in 11 
days.

The threaded approach has caused us many CSOs even though it was thoroughly 
tested ;-) 

> Switch back to seeding thread
> -
>
> Key: SLING-9577
> URL: https://issues.apache.org/jira/browse/SLING-9577
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The current code uses a combination of seeding thread, persisting and loading 
> offsets from sling repo and sending single seeding messages.
> In sum this means we send at least one seeding message (seeding thread) on 
> first run and one seeding message on following runs.
> I propose to switch back to a pure seeding thread solution and make sure it 
> terminates correctly. This solution should in almost all cases also just send 
> 1 message and is a lot simpler.
>  





[jira] [Commented] (SLING-9577) Switch back to seeding thread

2020-07-10 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155346#comment-17155346
 ] 

Timothee Maret commented on SLING-9577:
---

You also added a bug: it can be that none of the bounded set of messages is 
consumed. I really think we should dedicate our time to looking forward: make 
sure no seed is required by changing the way we consider the cache to be 
seeded. 

When it comes to the Apache rules, by this message I allow you to override my 
veto if you consider that consensus is reached and it's really impossible to 
extract common code without changing the code semantics.

> Switch back to seeding thread
> -
>
> Key: SLING-9577
> URL: https://issues.apache.org/jira/browse/SLING-9577
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The current code uses a combination of seeding thread, persisting and loading 
> offsets from sling repo and sending single seeding messages.
> In sum this means we send at least one seeding message (seeding thread) on 
> first run and one seeding message on following runs.
> I propose to switch back to a pure seeding thread solution and make sure it 
> terminates correctly. This solution should in almost all cases also just send 
> 1 message and is a lot simpler.
>  





[jira] [Commented] (SLING-9577) Switch back to seeding thread

2020-07-10 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155397#comment-17155397
 ] 

Timothee Maret commented on SLING-9577:
---

bq. The same situation could happen with the current code when the single 
message sent out does not arrive for some reason.

No, because it's an assign request and we assign to an offset that we know 
corresponds to a message created before the new seed message.

> Switch back to seeding thread
> -
>
> Key: SLING-9577
> URL: https://issues.apache.org/jira/browse/SLING-9577
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The current code uses a combination of seeding thread, persisting and loading 
> offsets from sling repo and sending single seeding messages.
> In sum this means we send at least one seeding message (seeding thread) on 
> first run and one seeding message on following runs.
> I propose to switch back to a pure seeding thread solution and make sure it 
> terminates correctly. This solution should in almost all cases also just send 
> 1 message and is a lot simpler.
>  





[jira] [Commented] (SLING-9577) Switch back to seeding thread

2020-07-10 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155305#comment-17155305
 ] 

Timothee Maret commented on SLING-9577:
---

I am on PTO until July 21st, sorry. The -1 still holds: the current code does 
ensure a bound (1 seed max), whereas the old code did not (unbounded number of 
seeds). We could also remove the 1 seed with the current code if we considered 
the cache seeded without consuming an actual message.

> Switch back to seeding thread
> -
>
> Key: SLING-9577
> URL: https://issues.apache.org/jira/browse/SLING-9577
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The current code uses a combination of seeding thread, persisting and loading 
> offsets from sling repo and sending single seeding messages.
> In sum this means we send at least one seeding message (seeding thread) on 
> first run and one seeding message on following runs.
> I propose to switch back to a pure seeding thread solution and make sure it 
> terminates correctly. This solution should in almost all cases also just send 
> 1 message and is a lot simpler.
>  





[jira] [Commented] (SLING-9577) Switch back to seeding thread

2020-07-10 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155283#comment-17155283
 ] 

Timothee Maret commented on SLING-9577:
---

-1 to going back to this. This threaded mechanism caused a stream of 
escalations in practice and does not ensure a bound to the number of seed 
messages.

I suggest investigating the purpose of the seeding thread. Why do we need at 
least one seed? Would it change if we could figure out the end offset on a 
topic? Couldn't we just avoid the seed by assuming the cache is ready without 
consuming an actual message?

> Switch back to seeding thread
> -
>
> Key: SLING-9577
> URL: https://issues.apache.org/jira/browse/SLING-9577
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The current code uses a combination of seeding thread, persisting and loading 
> offsets from sling repo and sending single seeding messages.
> In sum this means we send at least one seeding message (seeding thread) on 
> first run and one seeding message on following runs.
> I propose to switch back to a pure seeding thread solution and make sure it 
> terminates correctly. This solution should in almost all cases also just send 
> 1 message and is a lot simpler.
>  





[jira] [Commented] (SLING-9577) Switch back to seeding thread

2020-07-10 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155443#comment-17155443
 ] 

Timothee Maret commented on SLING-9577:
---

An unbounded seeding with exponential backoff would be correct and better than 
the previous threaded mechanism.

I won't be able to reply further. To not block you, I remove my veto and will 
review before the Sling release is cut (ideally in 11+ days).
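
For reference, the exponential backoff mentioned in this comment could be computed as in the sketch below. It is an illustration under assumptions, not code from the journal: SeedBackoff, the base delay and the cap are hypothetical, and the comment itself does not specify any of these values.

```java
/** Hypothetical sketch: delay before the next seeding attempt, doubling up to a cap. */
class SeedBackoff {

    /**
     * Returns baseMillis * 2^attempt, capped at maxMillis.
     * The attempt counter starts at 0 and may grow without bound.
     */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0 || baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        int shift = Math.min(attempt, 62);   // keep the shift within the range of a long
        long factor = 1L << shift;
        if (factor > maxMillis / baseMillis) {
            return maxMillis;                // multiplying would exceed the cap
        }
        return Math.min(maxMillis, baseMillis * factor);
    }
}
```

The number of attempts stays unbounded, but because the delay doubles until it reaches the cap, the rate of seeding messages drops quickly when nothing is consuming them.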

> Switch back to seeding thread
> -
>
> Key: SLING-9577
> URL: https://issues.apache.org/jira/browse/SLING-9577
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The current code uses a combination of seeding thread, persisting and loading 
> offsets from sling repo and sending single seeding messages.
> In sum this means we send at least one seeding message (seeding thread) on 
> first run and one seeding message on following runs.
> I propose to switch back to a pure seeding thread solution and make sure it 
> terminates correctly. This solution should in almost all cases also just send 
> 1 message and is a lot simpler.
>  





[jira] [Resolved] (SLING-9537) Emit metric for error accessing queues

2020-06-24 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-9537.
---
Resolution: Fixed

The test regression does not seem related to the change. Filed SLING-9546.

> Emit metric for error accessing queues
> --
>
> Key: SLING-9537
> URL: https://issues.apache.org/jira/browse/SLING-9537
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Currently the errors (exceptions) when getting the queues are not measured. 
> This error measure corresponds to an API entry point (e.g. SCD 
> [getQueue|https://github.com/apache/sling-org-apache-sling-distribution-journal/blob/master/src/main/java/org/apache/sling/distribution/journal/impl/publisher/DistributionPublisher.java#L235-L236]
>  for DistributionAgent API). It will capture errors such as failures to seed 
> caches and more generally any failure getting the queues.
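
Measuring errors at an API entry point, as described above, could be sketched as a small wrapper around the call. This is a hypothetical illustration: QueueAccessMetrics and measured are invented names, and the actual change may use a different metrics facility.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical sketch: counts failures of a wrapped call and rethrows them unchanged. */
class QueueAccessMetrics {
    private final LongAdder errors = new LongAdder();

    /** Runs the call; on a runtime exception, increments the error counter and rethrows. */
    <T> T measured(Supplier<T> call) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            errors.increment();
            throw e;
        }
    }

    /** Returns the number of failed calls seen so far. */
    long errorCount() {
        return errors.sum();
    }
}
```

Wrapping the entry point rather than each internal step means any failure on the way to the queue, including a cache that could not be seeded, is counted once.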





[jira] [Updated] (SLING-9546) SubscriberTest.testReceiveDelete fails

2020-06-24 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-9546:
--
Description: 
The test started to fail in Jenkins but is not reproducible locally.
 
{code}
Error Message
Lambda expression in 
org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that uses 
org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
expected  but was  within 10 seconds.

Stacktrace
org.awaitility.core.ConditionTimeoutException: Lambda expression in 
org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that uses 
org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
expected  but was  within 10 seconds.
at org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.waitSubscriber(SubscriberTest.java:345)
at org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.testReceiveDelete(SubscriberTest.java:252)

Standard Output
2020-06-24 09:15:39,935 INFO [Queue Processor for Subscriber agent sub1agent] 
o.a.s.d.j.i.s.DistributionSubscriber [DistributionSubscriber.java : 282] 
Started Queue processor -  
2020-06-24 09:15:39,938 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
[DistributionSubscriber.java : 202] Started Subscriber agent sub1agent at 
offset 0, subscribed to agent names [pub1agent] with package builder journal 
editable false maxRetries -1 errorQueueEnabled false -  
2020-06-24 09:15:39,943 INFO [Queue Processor for Subscriber agent sub1agent] 
o.a.s.d.j.i.s.BookKeeper [BookKeeper.java : 144] Importing distribution package 
myid of type DELETE at offset 0 -  
2020-06-24 09:15:39,945 INFO [Queue Processor for Subscriber agent sub1agent] 
o.a.s.d.j.i.s.PackageHandler [PackageHandler.java : 107] Deleting paths [/test] 
- retries=0, paths=/test, sub-sling-id=sub1sling, module=distribution, 
sub-agent-name=sub1agent, pub-sling-id=pub1sling, 
distribution-message-type=DELETE, package-id=myid, pub-agent-name=pub1agent 
2020-06-24 09:15:50,007 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
[DistributionSubscriber.java : 227] Stopped Subscriber agent sub1agent, 
subscribed to Publisher agent names [pub1agent] with package builder journal -  
{code}

  was:
The test started to fail in Jenkins but is not reproducible locally.
 
{code}
Error Message
Lambda expression in 
org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that uses 
org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
expected  but was  within 10 seconds.
Stacktrace
org.awaitility.core.ConditionTimeoutException: Lambda expression in 
org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that uses 
org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
expected  but was  within 10 seconds.
at 
org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.waitSubscriber(SubscriberTest.java:345)
at 
org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.testReceiveDelete(SubscriberTest.java:252)
Standard Output
2020-06-24 09:15:39,935 INFO [Queue Processor for Subscriber agent sub1agent] 
o.a.s.d.j.i.s.DistributionSubscriber [DistributionSubscriber.java : 282] 
Started Queue processor -  
2020-06-24 09:15:39,938 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
[DistributionSubscriber.java : 202] Started Subscriber agent sub1agent at 
offset 0, subscribed to agent names [pub1agent] with package builder journal 
editable false maxRetries -1 errorQueueEnabled false -  
2020-06-24 09:15:39,943 INFO [Queue Processor for Subscriber agent sub1agent] 
o.a.s.d.j.i.s.BookKeeper [BookKeeper.java : 144] Importing distribution package 
myid of type DELETE at offset 0 -  
2020-06-24 09:15:39,945 INFO [Queue Processor for Subscriber agent sub1agent] 
o.a.s.d.j.i.s.PackageHandler [PackageHandler.java : 107] Deleting paths [/test] 
- retries=0, paths=/test, sub-sling-id=sub1sling, module=distribution, 
sub-agent-name=sub1agent, pub-sling-id=pub1sling, 
distribution-message-type=DELETE, package-id=myid, pub-agent-name=pub1agent 
2020-06-24 09:15:50,007 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
[DistributionSubscriber.java : 227] Stopped Subscriber agent sub1agent, 
subscribed to Publisher agent names [pub1agent] with package builder journal -  
{code}


> SubscriberTest.testReceiveDelete fails
> --
>
> Key: SLING-9546
> URL: https://issues.apache.org/jira/browse/SLING-9546
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The test started to fail in Jenkins but is not reproducible locally.
>  
> {code}
> Error Message
> Lambda expression in 
> 

[jira] [Assigned] (SLING-9537) Emit metric for error accessing queues

2020-06-24 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret reassigned SLING-9537:
-

Assignee: Timothee Maret

> Emit metric for error accessing queues
> --
>
> Key: SLING-9537
> URL: https://issues.apache.org/jira/browse/SLING-9537
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Currently the errors (exceptions) when getting the queues are not measured. 
> This error measure corresponds to an API entry point (e.g. SCD 
> [getQueue|https://github.com/apache/sling-org-apache-sling-distribution-journal/blob/master/src/main/java/org/apache/sling/distribution/journal/impl/publisher/DistributionPublisher.java#L235-L236]
>  for DistributionAgent API). It will capture errors such as failures to seed 
> caches and more generally any failure getting the queues.
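
One way to picture the measurement described above is a counter that is incremented whenever the queue lookup throws. The following is a minimal sketch under stated assumptions, not the code from the merged PR: `QueueErrorMeter`, its `lookup` function and `errorCount()` are hypothetical names, and a plain `AtomicLong` stands in for whatever metrics service the bundle actually uses.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Hypothetical wrapper that counts failures of a queue lookup. */
class QueueErrorMeter<Q> {
    // Stand-in for a real metrics counter (assumption, not the Sling metrics API).
    private final AtomicLong queueAccessErrors = new AtomicLong();
    private final Function<String, Q> lookup;

    QueueErrorMeter(Function<String, Q> lookup) {
        this.lookup = lookup;
    }

    /** Delegates to the lookup; counts and rethrows any runtime failure. */
    Q getQueue(String queueName) {
        try {
            return lookup.apply(queueName);
        } catch (RuntimeException e) {
            queueAccessErrors.incrementAndGet();
            throw e;
        }
    }

    long errorCount() {
        return queueAccessErrors.get();
    }

    public static void main(String[] args) {
        QueueErrorMeter<String> meter = new QueueErrorMeter<>(name -> {
            if (name.isEmpty()) {
                throw new IllegalStateException("cache not seeded");
            }
            return "queue:" + name;
        });
        System.out.println(meter.getQueue("sub1agent"));
        try {
            meter.getQueue("");
        } catch (IllegalStateException expected) {
            // The caller still sees the failure; it has also been counted.
        }
        System.out.println("errors=" + meter.errorCount());
    }
}
```

Counting at the entry point rather than inside each cache means any failure on the way to the queues, including a failed cache seeding, would show up in the same measure.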



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SLING-9537) Emit metric for error accessing queues

2020-06-24 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-9537:
--
Fix Version/s: Content Distribution Journal 0.1.

> Emit metric for error accessing queues
> --
>
> Key: SLING-9537
> URL: https://issues.apache.org/jira/browse/SLING-9537
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Priority: Major
> Fix For: Content Distribution Journal 0.1.
>
>
> Currently the errors (exceptions) when getting the queues are not measured. 
> This error measure corresponds to an API entry point (e.g. SCD 
> [getQueue|https://github.com/apache/sling-org-apache-sling-distribution-journal/blob/master/src/main/java/org/apache/sling/distribution/journal/impl/publisher/DistributionPublisher.java#L235-L236]
>  for DistributionAgent API). It will capture errors such as failures to seed 
> caches and more generally any failure getting the queues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (SLING-9537) Emit metric for error accessing queues

2020-06-24 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-9537.
---
Resolution: Fixed

> Emit metric for error accessing queues
> --
>
> Key: SLING-9537
> URL: https://issues.apache.org/jira/browse/SLING-9537
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Currently the errors (exceptions) when getting the queues are not measured. 
> This error measure corresponds to an API entry point (e.g. SCD 
> [getQueue|https://github.com/apache/sling-org-apache-sling-distribution-journal/blob/master/src/main/java/org/apache/sling/distribution/journal/impl/publisher/DistributionPublisher.java#L235-L236]
>  for DistributionAgent API). It will capture errors such as failures to seed 
> caches and more generally any failure getting the queues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SLING-9537) Emit metric for error accessing queues

2020-06-24 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143664#comment-17143664
 ] 

Timothee Maret commented on SLING-9537:
---

Thanks [~amitj]! Merged your PR.

> Emit metric for error accessing queues
> --
>
> Key: SLING-9537
> URL: https://issues.apache.org/jira/browse/SLING-9537
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Priority: Major
>
> Currently the errors (exceptions) when getting the queues are not measured. 
> This error measure corresponds to an API entry point (e.g. SCD 
> [getQueue|https://github.com/apache/sling-org-apache-sling-distribution-journal/blob/master/src/main/java/org/apache/sling/distribution/journal/impl/publisher/DistributionPublisher.java#L235-L236]
>  for DistributionAgent API). It will capture errors such as failures to seed 
> caches and more generally any failure getting the queues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (SLING-9546) SubscriberTest.testReceiveDelete fails

2020-06-24 Thread Timothee Maret (Jira)
Timothee Maret created SLING-9546:
-

 Summary: SubscriberTest.testReceiveDelete fails
 Key: SLING-9546
 URL: https://issues.apache.org/jira/browse/SLING-9546
 Project: Sling
  Issue Type: Bug
  Components: Content Distribution
Reporter: Timothee Maret
 Fix For: Content Distribution Journal Core 0.1.18


The test started to fail in Jenkins but is not reproducible locally.
 
{code}
Error Message
Lambda expression in 
org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that uses 
org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
expected  but was  within 10 seconds.
Stacktrace
org.awaitility.core.ConditionTimeoutException: Lambda expression in 
org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that uses 
org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
expected  but was  within 10 seconds.
at 
org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.waitSubscriber(SubscriberTest.java:345)
at 
org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.testReceiveDelete(SubscriberTest.java:252)
Standard Output
2020-06-24 09:15:39,935 INFO [Queue Processor for Subscriber agent sub1agent] 
o.a.s.d.j.i.s.DistributionSubscriber [DistributionSubscriber.java : 282] 
Started Queue processor -  
2020-06-24 09:15:39,938 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
[DistributionSubscriber.java : 202] Started Subscriber agent sub1agent at 
offset 0, subscribed to agent names [pub1agent] with package builder journal 
editable false maxRetries -1 errorQueueEnabled false -  
2020-06-24 09:15:39,943 INFO [Queue Processor for Subscriber agent sub1agent] 
o.a.s.d.j.i.s.BookKeeper [BookKeeper.java : 144] Importing distribution package 
myid of type DELETE at offset 0 -  
2020-06-24 09:15:39,945 INFO [Queue Processor for Subscriber agent sub1agent] 
o.a.s.d.j.i.s.PackageHandler [PackageHandler.java : 107] Deleting paths [/test] 
- retries=0, paths=/test, sub-sling-id=sub1sling, module=distribution, 
sub-agent-name=sub1agent, pub-sling-id=pub1sling, 
distribution-message-type=DELETE, package-id=myid, pub-agent-name=pub1agent 
2020-06-24 09:15:50,007 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
[DistributionSubscriber.java : 227] Stopped Subscriber agent sub1agent, 
subscribed to Publisher agent names [pub1agent] with package builder journal -  
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (SLING-9546) SubscriberTest.testReceiveDelete is prone to timing issues

2020-06-24 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-9546.
---
Resolution: Fixed

> SubscriberTest.testReceiveDelete is prone to timing issues
> --
>
> Key: SLING-9546
> URL: https://issues.apache.org/jira/browse/SLING-9546
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The test started to fail in Jenkins but is not reproducible locally.
>  
> {code}
> Error Message
> Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
> Stacktrace
> org.awaitility.core.ConditionTimeoutException: Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.waitSubscriber(SubscriberTest.java:345)
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.testReceiveDelete(SubscriberTest.java:252)
> Standard Output
> 2020-06-24 09:15:39,935 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.DistributionSubscriber [DistributionSubscriber.java : 282] 
> Started Queue processor -  
> 2020-06-24 09:15:39,938 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 202] Started Subscriber agent sub1agent at 
> offset 0, subscribed to agent names [pub1agent] with package builder journal 
> editable false maxRetries -1 errorQueueEnabled false -  
> 2020-06-24 09:15:39,943 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.BookKeeper [BookKeeper.java : 144] Importing distribution 
> package myid of type DELETE at offset 0 -  
> 2020-06-24 09:15:39,945 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.PackageHandler [PackageHandler.java : 107] Deleting paths 
> [/test] - retries=0, paths=/test, sub-sling-id=sub1sling, 
> module=distribution, sub-agent-name=sub1agent, pub-sling-id=pub1sling, 
> distribution-message-type=DELETE, package-id=myid, pub-agent-name=pub1agent 
> 2020-06-24 09:15:50,007 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 227] Stopped Subscriber agent sub1agent, 
> subscribed to Publisher agent names [pub1agent] with package builder journal 
> -  
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (SLING-9537) Emit metric for error accessing queues

2020-06-24 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret reopened SLING-9537:
---

Reopening: the tests seem to fail in Jenkins (not locally).

> Emit metric for error accessing queues
> --
>
> Key: SLING-9537
> URL: https://issues.apache.org/jira/browse/SLING-9537
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Currently the errors (exceptions) when getting the queues are not measured. 
> This error measure corresponds to an API entry point (e.g. SCD 
> [getQueue|https://github.com/apache/sling-org-apache-sling-distribution-journal/blob/master/src/main/java/org/apache/sling/distribution/journal/impl/publisher/DistributionPublisher.java#L235-L236]
>  for DistributionAgent API). It will capture errors such as failures to seed 
> caches and more generally any failure getting the queues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SLING-9546) SubscriberTest#testReceiveDelete is prone to timing issues

2020-06-24 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-9546:
--
Summary: SubscriberTest#testReceiveDelete is prone to timing issues  (was: 
SubscriberTest.testReceiveDelete is prone to timing issues)

> SubscriberTest#testReceiveDelete is prone to timing issues
> --
>
> Key: SLING-9546
> URL: https://issues.apache.org/jira/browse/SLING-9546
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The test started to fail in Jenkins but is not reproducible locally.
>  
> {code}
> Error Message
> Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
> Stacktrace
> org.awaitility.core.ConditionTimeoutException: Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.waitSubscriber(SubscriberTest.java:345)
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.testReceiveDelete(SubscriberTest.java:252)
> Standard Output
> 2020-06-24 09:15:39,935 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.DistributionSubscriber [DistributionSubscriber.java : 282] 
> Started Queue processor -  
> 2020-06-24 09:15:39,938 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 202] Started Subscriber agent sub1agent at 
> offset 0, subscribed to agent names [pub1agent] with package builder journal 
> editable false maxRetries -1 errorQueueEnabled false -  
> 2020-06-24 09:15:39,943 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.BookKeeper [BookKeeper.java : 144] Importing distribution 
> package myid of type DELETE at offset 0 -  
> 2020-06-24 09:15:39,945 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.PackageHandler [PackageHandler.java : 107] Deleting paths 
> [/test] - retries=0, paths=/test, sub-sling-id=sub1sling, 
> module=distribution, sub-agent-name=sub1agent, pub-sling-id=pub1sling, 
> distribution-message-type=DELETE, package-id=myid, pub-agent-name=pub1agent 
> 2020-06-24 09:15:50,007 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 227] Stopped Subscriber agent sub1agent, 
> subscribed to Publisher agent names [pub1agent] with package builder journal 
> -  
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SLING-9546) SubscriberTest.testReceiveDelete is prone to timing issues

2020-06-24 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143718#comment-17143718
 ] 

Timothee Maret commented on SLING-9546:
---

Merged PR 
[#44|https://github.com/apache/sling-org-apache-sling-distribution-journal/pull/44]
 which fixes the test.

> SubscriberTest.testReceiveDelete is prone to timing issues
> --
>
> Key: SLING-9546
> URL: https://issues.apache.org/jira/browse/SLING-9546
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The test started to fail in Jenkins but is not reproducible locally.
>  
> {code}
> Error Message
> Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
> Stacktrace
> org.awaitility.core.ConditionTimeoutException: Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.waitSubscriber(SubscriberTest.java:345)
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.testReceiveDelete(SubscriberTest.java:252)
> Standard Output
> 2020-06-24 09:15:39,935 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.DistributionSubscriber [DistributionSubscriber.java : 282] 
> Started Queue processor -  
> 2020-06-24 09:15:39,938 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 202] Started Subscriber agent sub1agent at 
> offset 0, subscribed to agent names [pub1agent] with package builder journal 
> editable false maxRetries -1 errorQueueEnabled false -  
> 2020-06-24 09:15:39,943 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.BookKeeper [BookKeeper.java : 144] Importing distribution 
> package myid of type DELETE at offset 0 -  
> 2020-06-24 09:15:39,945 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.PackageHandler [PackageHandler.java : 107] Deleting paths 
> [/test] - retries=0, paths=/test, sub-sling-id=sub1sling, 
> module=distribution, sub-agent-name=sub1agent, pub-sling-id=pub1sling, 
> distribution-message-type=DELETE, package-id=myid, pub-agent-name=pub1agent 
> 2020-06-24 09:15:50,007 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 227] Stopped Subscriber agent sub1agent, 
> subscribed to Publisher agent names [pub1agent] with package builder journal 
> -  
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SLING-9546) SubscriberTest.testReceiveDelete fails

2020-06-24 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143700#comment-17143700
 ] 

Timothee Maret commented on SLING-9546:
---

The test expects the subscriber to go from RUNNING to IDLE when processing an 
item, 
[here|https://github.com/apache/sling-org-apache-sling-distribution-journal/blob/f4af344894622eec9cf61aedbfe0efb07a8f3530/src/test/java/org/apache/sling/distribution/journal/impl/subscriber/SubscriberTest.java#L252-L253].
 That assumption is invalid and prone to timing issues.
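
One way such an expectation can go wrong is sketched below: a poller that samples the state at intervals can miss a short-lived RUNNING phase entirely, while a condition that never reverts (here a processed counter) stays observable. This is an illustration under stated assumptions, not the change made for this issue: `TransientStateDemo`, `processOne` and `waitFor` are hypothetical names, and the hand-written polling loop only stands in for the Awaitility call used by the real test.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

/** Hypothetical subscriber whose RUNNING state lasts only a moment. */
class TransientStateDemo {
    enum State { IDLE, RUNNING }

    final AtomicReference<State> state = new AtomicReference<>(State.IDLE);
    final AtomicInteger processed = new AtomicInteger();

    /** Simulates importing one package: RUNNING is entered and left immediately. */
    void processOne() {
        state.set(State.RUNNING);
        processed.incrementAndGet();
        state.set(State.IDLE);
    }

    /** Polls every 10 ms until the condition holds or the timeout elapses. */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() - deadline < 0) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return condition.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        TransientStateDemo subscriber = new TransientStateDemo();
        Thread worker = new Thread(subscriber::processOne);
        worker.start();
        worker.join(); // processing has already finished before the test looks

        // Racy: RUNNING did occur, but it is gone by the time the poller samples.
        boolean sawRunning = waitFor(() -> subscriber.state.get() == State.RUNNING, 200);
        // Stable: the processed counter never goes back once incremented.
        boolean sawProcessed = waitFor(() -> subscriber.processed.get() == 1, 200);
        System.out.println("sawRunning=" + sawRunning + " sawProcessed=" + sawProcessed);
    }
}
```

On a fast machine the transient state is usually caught; on a loaded build server the sampling can fall on either side of it, which would be consistent with a failure seen in Jenkins but not locally.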

> SubscriberTest.testReceiveDelete fails
> --
>
> Key: SLING-9546
> URL: https://issues.apache.org/jira/browse/SLING-9546
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The test started to fail in Jenkins but is not reproducible locally.
>  
> {code}
> Error Message
> Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
> Stacktrace
> org.awaitility.core.ConditionTimeoutException: Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.waitSubscriber(SubscriberTest.java:345)
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.testReceiveDelete(SubscriberTest.java:252)
> Standard Output
> 2020-06-24 09:15:39,935 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.DistributionSubscriber [DistributionSubscriber.java : 282] 
> Started Queue processor -  
> 2020-06-24 09:15:39,938 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 202] Started Subscriber agent sub1agent at 
> offset 0, subscribed to agent names [pub1agent] with package builder journal 
> editable false maxRetries -1 errorQueueEnabled false -  
> 2020-06-24 09:15:39,943 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.BookKeeper [BookKeeper.java : 144] Importing distribution 
> package myid of type DELETE at offset 0 -  
> 2020-06-24 09:15:39,945 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.PackageHandler [PackageHandler.java : 107] Deleting paths 
> [/test] - retries=0, paths=/test, sub-sling-id=sub1sling, 
> module=distribution, sub-agent-name=sub1agent, pub-sling-id=pub1sling, 
> distribution-message-type=DELETE, package-id=myid, pub-agent-name=pub1agent 
> 2020-06-24 09:15:50,007 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 227] Stopped Subscriber agent sub1agent, 
> subscribed to Publisher agent names [pub1agent] with package builder journal 
> -  
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SLING-9546) SubscriberTest.testReceiveDelete is prone to timing issues

2020-06-24 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-9546:
--
Summary: SubscriberTest.testReceiveDelete is prone to timing issues  (was: 
SubscriberTest.testReceiveDelete fails)

> SubscriberTest.testReceiveDelete is prone to timing issues
> --
>
> Key: SLING-9546
> URL: https://issues.apache.org/jira/browse/SLING-9546
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The test started to fail in Jenkins but is not reproducible locally.
>  
> {code}
> Error Message
> Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
> Stacktrace
> org.awaitility.core.ConditionTimeoutException: Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.waitSubscriber(SubscriberTest.java:345)
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.testReceiveDelete(SubscriberTest.java:252)
> Standard Output
> 2020-06-24 09:15:39,935 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.DistributionSubscriber [DistributionSubscriber.java : 282] 
> Started Queue processor -  
> 2020-06-24 09:15:39,938 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 202] Started Subscriber agent sub1agent at 
> offset 0, subscribed to agent names [pub1agent] with package builder journal 
> editable false maxRetries -1 errorQueueEnabled false -  
> 2020-06-24 09:15:39,943 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.BookKeeper [BookKeeper.java : 144] Importing distribution 
> package myid of type DELETE at offset 0 -  
> 2020-06-24 09:15:39,945 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.PackageHandler [PackageHandler.java : 107] Deleting paths 
> [/test] - retries=0, paths=/test, sub-sling-id=sub1sling, 
> module=distribution, sub-agent-name=sub1agent, pub-sling-id=pub1sling, 
> distribution-message-type=DELETE, package-id=myid, pub-agent-name=pub1agent 
> 2020-06-24 09:15:50,007 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 227] Stopped Subscriber agent sub1agent, 
> subscribed to Publisher agent names [pub1agent] with package builder journal 
> -  
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (SLING-9546) SubscriberTest.testReceiveDelete is prone to timing issues

2020-06-24 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret reassigned SLING-9546:
-

Assignee: Timothee Maret

> SubscriberTest.testReceiveDelete is prone to timing issues
> --
>
> Key: SLING-9546
> URL: https://issues.apache.org/jira/browse/SLING-9546
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> The test started to fail in Jenkins but is not reproducible locally.
>  
> {code}
> Error Message
> Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
> Stacktrace
> org.awaitility.core.ConditionTimeoutException: Lambda expression in 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest that 
> uses 
> org.apache.sling.distribution.journal.impl.subscriber.DistributionSubscriber: 
> expected  but was  within 10 seconds.
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.waitSubscriber(SubscriberTest.java:345)
>   at 
> org.apache.sling.distribution.journal.impl.subscriber.SubscriberTest.testReceiveDelete(SubscriberTest.java:252)
> Standard Output
> 2020-06-24 09:15:39,935 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.DistributionSubscriber [DistributionSubscriber.java : 282] 
> Started Queue processor -  
> 2020-06-24 09:15:39,938 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 202] Started Subscriber agent sub1agent at 
> offset 0, subscribed to agent names [pub1agent] with package builder journal 
> editable false maxRetries -1 errorQueueEnabled false -  
> 2020-06-24 09:15:39,943 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.BookKeeper [BookKeeper.java : 144] Importing distribution 
> package myid of type DELETE at offset 0 -  
> 2020-06-24 09:15:39,945 INFO [Queue Processor for Subscriber agent sub1agent] 
> o.a.s.d.j.i.s.PackageHandler [PackageHandler.java : 107] Deleting paths 
> [/test] - retries=0, paths=/test, sub-sling-id=sub1sling, 
> module=distribution, sub-agent-name=sub1agent, pub-sling-id=pub1sling, 
> distribution-message-type=DELETE, package-id=myid, pub-agent-name=pub1agent 
> 2020-06-24 09:15:50,007 INFO [main] o.a.s.d.j.i.s.DistributionSubscriber 
> [DistributionSubscriber.java : 227] Stopped Subscriber agent sub1agent, 
> subscribed to Publisher agent names [pub1agent] with package builder journal 
> -  
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SLING-9537) Emit metric for error accessing queues

2020-06-24 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-9537:
--
Fix Version/s: (was: Content Distribution Journal 0.1.)
   Content Distribution Journal Core 0.1.18

> Emit metric for error accessing queues
> --
>
> Key: SLING-9537
> URL: https://issues.apache.org/jira/browse/SLING-9537
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Amit Jain
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Currently the errors (exceptions) when getting the queues are not measured. 
> This error measure corresponds to an API entry point (e.g. SCD 
> [getQueue|https://github.com/apache/sling-org-apache-sling-distribution-journal/blob/master/src/main/java/org/apache/sling/distribution/journal/impl/publisher/DistributionPublisher.java#L235-L236]
>  for DistributionAgent API). It will capture errors such as failures to seed 
> caches and more generally any failure getting the queues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SLING-9476) Journal IT tests fail due to wrong assumptions

2020-06-16 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-9476:
--
Summary: Journal IT tests fail due to wrong assumptions  (was: Journal IT 
tests fail)

> Journal IT tests fail due to wrong assumptions
> --
>
> Key: SLING-9476
> URL: https://issues.apache.org/jira/browse/SLING-9476
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal ITs 0.1.2
>
>
> Since the change in SLING-9460, some IT tests fail
> https://builds.apache.org/job/Sling/job/sling-org-apache-sling-distribution-journal-it/
> AuthorDistributeTest.testDistribute



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SLING-9504) Switch from protobuf to json

2020-06-16 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-9504:
--
Fix Version/s: (was: Content Distribution Journal Messages 0.1.8)
   Content Distribution Journal Messages 0.1.10

> Switch from protobuf to json
> 
>
> Key: SLING-9504
> URL: https://issues.apache.org/jira/browse/SLING-9504
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Messages 0.1.2
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Messages 0.1.10
>
>
> Protobuf messages are difficult to diagnose. We would like to 
> switch all messages to a JSON payload. 
> This is an incompatible change, so data of running instances will have to be 
> migrated to the new format or old messages will be lost.
> A good way to migrate is to wait until the queues are empty. This way only 
> the history in Kafka is lost and there is no risk of inconsistencies.





[jira] [Commented] (SLING-9481) Avoid seeding messages in PackageRepo

2020-06-03 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124797#comment-17124797
 ] 

Timothee Maret commented on SLING-9481:
---

This seems to have been merged. Could it be resolved, [~cschneider]?

> Avoid seeding messages in PackageRepo
> -
>
> Key: SLING-9481
> URL: https://issues.apache.org/jira/browse/SLING-9481
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.10
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.16
>
>
> Currently we use seeding messages to determine the oldest and newest offsets on 
> the journal. These are then used to clean up the larger packages that are stored 
> in the repo.
> To avoid those seeding messages we should rely on a much simpler algorithm.
> We only store very few packages in the repository, so there is no need to 
> clean them up quickly. We only need to make sure we clean packages up after they 
> are no longer present in the journal. As the retention time is at most 7 days, 
> we can assume that 30 days is a safe time after which we can delete packages.
> So the idea is to run a cleanup at certain intervals and delete all packages 
> that are older than 30 days.
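
The age-based cleanup described above can be sketched as follows. This is only an illustration under stated assumptions: the class name PackageCleanupSketch, the method findExpired, and the map of package ids to creation times are hypothetical and do not come from the PackageRepo code. Only the 30-day threshold is taken from the description.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the actual PackageRepo implementation.
final class PackageCleanupSketch {

    // Threshold from the issue description: far above the 7-day journal retention.
    static final Duration MAX_AGE = Duration.ofDays(30);

    // Returns the ids of stored packages created more than MAX_AGE before 'now'.
    static List<String> findExpired(Map<String, Instant> createdByPackageId, Instant now) {
        Instant cutoff = now.minus(MAX_AGE);
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : createdByPackageId.entrySet()) {
            if (entry.getValue().isBefore(cutoff)) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

A periodic job could call findExpired with the current time and delete the returned packages. No journal offsets are needed for this decision, which is the point of the proposal.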





[jira] [Created] (SLING-9505) Failure to retry distributing content

2020-06-05 Thread Timothee Maret (Jira)
Timothee Maret created SLING-9505:
-

 Summary: Failure to retry distributing content
 Key: SLING-9505
 URL: https://issues.apache.org/jira/browse/SLING-9505
 Project: Sling
  Issue Type: Bug
  Components: Content Distribution
Reporter: Timothee Maret
Assignee: Timothee Maret
 Fix For: Content Distribution Core 0.4.4


{code}
05.06.2020 20:23:33.066 *ERROR* [127.0.0.1 [1591388613038] POST /libs/sling/distribution/services/agents/publish-test HTTP/1.1] org.apache.sling.distribution.agent.impl.SimpleDistributionAgent [agent][publish-test] an error happened during package import
org.apache.sling.distribution.common.DistributionException: org.apache.http.client.ClientProtocolException
    at org.apache.sling.distribution.transport.impl.SimpleHttpDistributionTransport.deliverPackage(SimpleHttpDistributionTransport.java:164) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
    at org.apache.sling.distribution.packaging.impl.importer.RemoteDistributionPackageImporter.importPackage(RemoteDistributionPackageImporter.java:69) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
    at org.apache.sling.distribution.agent.impl.ImportingDistributionPackageProcessor.process(ImportingDistributionPackageProcessor.java:78) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
    at org.apache.sling.distribution.packaging.impl.exporter.LocalDistributionPackageExporter.exportPackages(LocalDistributionPackageExporter.java:47) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
    at org.apache.sling.distribution.agent.impl.SimpleDistributionAgent.exportPackages(SimpleDistributionAgent.java:218) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
    at org.apache.sling.distribution.agent.impl.SimpleDistributionAgent.execute(SimpleDistributionAgent.java:182) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
    at org.apache.sling.distribution.servlet.DistributionAgentServlet.doPost(DistributionAgentServlet.java:62) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
    at org.apache.sling.api.servlets.SlingAllMethodsServlet.mayService(SlingAllMethodsServlet.java:146) [org.apache.sling.api:2.22.0]
    at org.apache.sling.api.servlets.SlingSafeMethodsServlet.service(SlingSafeMethodsServlet.java:342) [org.apache.sling.api:2.22.0]
    at org.apache.sling.api.servlets.SlingSafeMethodsServlet.service(SlingSafeMethodsServlet.java:374) [org.apache.sling.api:2.22.0]
    at org.apache.sling.engine.impl.request.RequestData.service(RequestData.java:552) [org.apache.sling.engine:2.7.2]
    at org.apache.sling.engine.impl.filter.SlingComponentFilterChain.render(SlingComponentFilterChain.java:44) [org.apache.sling.engine:2.7.2]
    at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:82) [org.apache.sling.engine:2.7.2]
    at com.day.cq.wcm.core.impl.WCMDebugFilter.doFilter(WCMDebugFilter.java:138) [com.day.cq.wcm.cq-wcm-core:5.13.122]
    at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72) [org.apache.sling.engine:2.7.2]
    at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:78) [org.apache.sling.engine:2.7.2]
    at com.day.cq.wcm.core.impl.WCMComponentFilter.filterRootInclude(WCMComponentFilter.java:375) [com.day.cq.wcm.cq-wcm-core:5.13.122]
    at com.day.cq.wcm.core.impl.WCMComponentFilter.doFilter(WCMComponentFilter.java:190) [com.day.cq.wcm.cq-wcm-core:5.13.122]
    at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72) [org.apache.sling.engine:2.7.2]
    at com.day.cq.wcm.core.impl.page.PageLockFilter.doFilter(PageLockFilter.java:91) [com.day.cq.wcm.cq-wcm-core:5.13.122]
    at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72) [org.apache.sling.engine:2.7.2]
    at com.day.cq.personalization.impl.TargetComponentFilter.doFilter(TargetComponentFilter.java:94) [com.day.cq.cq-personalization:5.13.14]
    at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72) [org.apache.sling.engine:2.7.2]
    at org.apache.sling.engine.impl.SlingRequestProcessorImpl.processComponent(SlingRequestProcessorImpl.java:283) [org.apache.sling.engine:2.7.2]
    at org.apache.sling.engine.impl.filter.RequestSlingFilterChain.render(RequestSlingFilterChain.java:49) [org.apache.sling.engine:2.7.2]
    at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:76) [org.apache.sling.engine:2.7.2]
at 

[jira] [Updated] (SLING-9495) Distributed Event is sending package type as distribution type

2020-06-09 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-9495:
--
Fix Version/s: Content Distribution Journal Core 0.1.16

> Distributed Event is sending package type as distribution type
> --
>
> Key: SLING-9495
> URL: https://issues.apache.org/jira/browse/SLING-9495
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.10
>Reporter: Dirk Rudolph
>Assignee: Dirk Rudolph
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.16
>
>
> When using journal based content distribution the distributed event contains 
> the package type instead of the distribution request type as distribution 
> type.
> See the following series of events on the DistributionPublisher (author)
> |distribution.type|ADD|
> |distribution.component.kind|agent|
> |event.topics|*org/apache/sling/distribution/agent/package/created*|
> |distribution.package.id|dstrpck-159806534-03e0b77c-62fb-4781-8ec0-462a8a2e7e40|
> |distribution.paths|/content/wknd/ch/de|
> |distribution.component.name|journal|
> |distribution.type|ADD|
> |distribution.component.kind|agent|
> |event.topics|*org/apache/sling/distribution/agent/package/queued*|
> |distribution.package.id|dstrpck-159806534-03e0b77c-62fb-4781-8ec0-462a8a2e7e40|
> |distribution.paths|/content/wknd/ch/de|
> |distribution.component.name|journal|
> |distribution.type|{color:red}default{color}|
> |distribution.component.kind|agent|
> |event.topics|*org/apache/sling/distribution/agent/package/distributed*|
> |distribution.package.id|dstrpck-159806534-03e0b77c-62fb-4781-8ec0-462a8a2e7e40|
> |distribution.paths|/content/wknd/ch/de|
> |distribution.component.name|journal|





[jira] [Commented] (SLING-9495) Distributed Event is sending package type as distribution type

2020-06-09 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129190#comment-17129190
 ] 

Timothee Maret commented on SLING-9495:
---

Good catch [~diru]!

> Distributed Event is sending package type as distribution type
> --
>
> Key: SLING-9495
> URL: https://issues.apache.org/jira/browse/SLING-9495
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.10
>Reporter: Dirk Rudolph
>Assignee: Dirk Rudolph
>Priority: Major
>
> When using journal based content distribution the distributed event contains 
> the package type instead of the distribution request type as distribution 
> type.
> See the following series of events on the DistributionPublisher (author)
> |distribution.type|ADD|
> |distribution.component.kind|agent|
> |event.topics|*org/apache/sling/distribution/agent/package/created*|
> |distribution.package.id|dstrpck-159806534-03e0b77c-62fb-4781-8ec0-462a8a2e7e40|
> |distribution.paths|/content/wknd/ch/de|
> |distribution.component.name|journal|
> |distribution.type|ADD|
> |distribution.component.kind|agent|
> |event.topics|*org/apache/sling/distribution/agent/package/queued*|
> |distribution.package.id|dstrpck-159806534-03e0b77c-62fb-4781-8ec0-462a8a2e7e40|
> |distribution.paths|/content/wknd/ch/de|
> |distribution.component.name|journal|
> |distribution.type|{color:red}default{color}|
> |distribution.component.kind|agent|
> |event.topics|*org/apache/sling/distribution/agent/package/distributed*|
> |distribution.package.id|dstrpck-159806534-03e0b77c-62fb-4781-8ec0-462a8a2e7e40|
> |distribution.paths|/content/wknd/ch/de|
> |distribution.component.name|journal|





[jira] [Reopened] (SLING-9482) Seed the cache from offset persisted in the source repository

2020-06-03 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret reopened SLING-9482:
---

Reopening. In case the seed offset is no longer available on the journal (typ. 
because the retention policy removed it) the cache can't be seeded.

> Seed the cache from offset persisted in the source repository
> -
>
> Key: SLING-9482
> URL: https://issues.apache.org/jira/browse/SLING-9482
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.16
>
>
> The approach taken in SLING-9460 to avoid sending seeding messages does not 
> address some scenarios, such as publishing content without subscriber agents.
> To be sure we always have a recent seed available, we should persist seed 
> offsets in the source repository (typ. author) and seed caches from them. We do 
> have the local store class that makes it easy to write the offsets to the 
> repository. To not stress the repository too much, we should batch those 
> writes (e.g. 1 offset update every 10 packages processed). To support a 
> cluster, the writes must be initiated only from the cluster leader instance.
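
The batching and leader-only rule described above can be sketched as follows. This is a minimal illustration, not the merged implementation: the class BatchedOffsetWriter, the BooleanSupplier used as a leader check and the LongConsumer standing in for the local store are all assumptions made for the example.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// Hypothetical sketch; the real local store and leader detection are not shown in the issue.
final class BatchedOffsetWriter {
    private final int batchSize;              // e.g. 10, as suggested in the description
    private final BooleanSupplier isLeader;   // assumed cluster leader check
    private final LongConsumer store;         // assumed write of the offset to the repository
    private int sinceLastWrite;

    BatchedOffsetWriter(int batchSize, BooleanSupplier isLeader, LongConsumer store) {
        this.batchSize = batchSize;
        this.isLeader = isLeader;
        this.store = store;
    }

    // Called once per processed package with that package's journal offset.
    // Persists the offset only when a full batch has accumulated and this instance is the leader.
    synchronized void onPackageProcessed(long offset) {
        sinceLastWrite++;
        if (sinceLastWrite >= batchSize && isLeader.getAsBoolean()) {
            store.accept(offset);
            sinceLastWrite = 0;
        }
    }
}
```

With a batch size of 10, a leader that processes offsets 0 to 24 writes offsets 9 and 19 only. A non-leader never writes, and an instance that becomes leader later writes on its next processed package.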





[jira] [Commented] (SLING-9482) Seed the cache from offset persisted in the source repository

2020-06-03 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125314#comment-17125314
 ] 

Timothee Maret commented on SLING-9482:
---

PR 
[#40|https://github.com/apache/sling-org-apache-sling-distribution-journal/pull/40]
 ensures at least one seed message is created when starting a cache. This 
single seed message is sent without a corresponding consumer.

> Seed the cache from offset persisted in the source repository
> -
>
> Key: SLING-9482
> URL: https://issues.apache.org/jira/browse/SLING-9482
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.16
>
>
> The approach taken in SLING-9460 to avoid sending seeding messages does not 
> address some scenarios, such as publishing content without subscriber agents.
> To be sure we always have a recent seed available, we should persist seed 
> offsets in the source repository (typ. author) and seed caches from them. We do 
> have the local store class that makes it easy to write the offsets to the 
> repository. To not stress the repository too much, we should batch those 
> writes (e.g. 1 offset update every 10 packages processed). To support a 
> cluster, the writes must be initiated only from the cluster leader instance.





[jira] [Resolved] (SLING-9482) Seed the cache from offset persisted in the source repository

2020-06-03 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-9482.
---
Resolution: Fixed

Merged.

> Seed the cache from offset persisted in the source repository
> -
>
> Key: SLING-9482
> URL: https://issues.apache.org/jira/browse/SLING-9482
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.16
>
>
> The approach taken in SLING-9460 to avoid sending seeding messages does not 
> address some scenarios, such as publishing content without subscriber agents.
> To be sure we always have a recent seed available, we should persist seed 
> offsets in the source repository (typ. author) and seed caches from them. We do 
> have the local store class that makes it easy to write the offsets to the 
> repository. To not stress the repository too much, we should batch those 
> writes (e.g. 1 offset update every 10 packages processed). To support a 
> cluster, the writes must be initiated only from the cluster leader instance.





[jira] [Commented] (SLING-9389) Distribution Event Packages should contain queue item creation time

2020-06-03 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125354#comment-17125354
 ] 

Timothee Maret commented on SLING-9389:
---

 bq. The idea is to capture a metric (using Sling Metrics) exposing the end to 
end distribution behavior (AEM to BP in this context).

[~harshchiki] I think we should do exactly what your requirement describes: add 
Sling metrics in the implementations to monitor end to end latency. I think 
that we could achieve that without extending the event API.

It's possible to dissociate metrics by agent, use case, etc. See for instance 
the 
[BookKeeper|https://github.com/apache/sling-org-apache-sling-distribution-journal/blob/4190c2befc6da8277e12ba019b8c41087cb217e1/src/main/java/org/apache/sling/distribution/journal/impl/subscriber/BookKeeper.java#L114-L115]
 which emits metrics by subscriber agent.

bq. the information would not come down to the consumers of SCD apparently.

It would come via Sling metrics rather than the SCD API. Would that be 
satisfactory for your use case?

> Distribution Event Packages should contain queue item creation time
> ---
>
> Key: SLING-9389
> URL: https://issues.apache.org/jira/browse/SLING-9389
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Harsh Chiki
>Assignee: Timothee Maret
>Priority: Major
> Attachments: image-2020-04-30-10-28-58-011.png, scdapi.patch, 
> scdcore.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently the Distribution Event package contains the following details:
>  * Distribution Component Name
>  * Distribution Component Kind
>  * Distribution Type
>  * Distribution Paths
>  
> The improvement aims at adding the queue item creation time, i.e. the time when 
> the item was created for the first time and enqueued. The 
> value does not change over retries (on failure).
>  
> The purpose of this detail is to be able to capture metrics at the 
> consumer level. Consumers could have an event handler that captures 
> the duration, computed as (NOW MINUS queue item creation time carried 
> in the distribution event package), NOW being the current time in the event 
> handler (consumer).
>  
> \cc: [~shgu...@adobe.com], [~ashishc]
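
The duration discussed above (NOW minus queue item creation time) can be sketched as below. The class and method names are hypothetical, and clamping negative values to zero is an assumption added for the example to tolerate clock skew between instances. It is not part of the request or of the attached patches.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration; not part of the SCD API.
final class DistributionLatencySketch {

    // creationTime: when the queue item was first created (unchanged across retries).
    // now: the current time where the measurement is taken.
    static Duration endToEndLatency(Instant creationTime, Instant now) {
        Duration elapsed = Duration.between(creationTime, now);
        // Clock skew between instances could yield a negative value; clamp to zero (assumption).
        return elapsed.isNegative() ? Duration.ZERO : elapsed;
    }
}
```

Whether the value is computed in a consumer event handler or inside the implementation and reported through Sling metrics, the arithmetic is the same. Only the place where the creation time is available differs.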





[jira] [Comment Edited] (SLING-9389) Distribution Event Packages should contain queue item creation time

2020-06-03 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125354#comment-17125354
 ] 

Timothee Maret edited comment on SLING-9389 at 6/3/20, 10:01 PM:
-

bq. The idea is to capture a metric (using Sling Metrics) exposing the end to 
end distribution behavior (AEM to BP in this context).

[~harshchiki] I think we should do exactly what your requirement describes: add 
Sling metrics in the implementations to monitor end to end latency. I think 
that we could achieve that without extending the event API.

It's possible to dissociate metrics by agent, use case, etc. See for instance 
the 
[BookKeeper|https://github.com/apache/sling-org-apache-sling-distribution-journal/blob/4190c2befc6da8277e12ba019b8c41087cb217e1/src/main/java/org/apache/sling/distribution/journal/impl/subscriber/BookKeeper.java#L114-L115]
 which emits metrics by subscriber agent.

bq. the information would not come down to the consumers of SCD apparently.

It would come via Sling metrics rather than the SCD API. Would that be 
satisfactory for your use case?


> Distribution Event Packages should contain queue item creation time
> ---
>
> Key: SLING-9389
> URL: https://issues.apache.org/jira/browse/SLING-9389
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Harsh Chiki
>Assignee: Timothee Maret
>Priority: Major
> Attachments: image-2020-04-30-10-28-58-011.png, scdapi.patch, 
> scdcore.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently the Distribution Event package contains the following details:
>  * Distribution Component Name
>  * Distribution Component Kind
>  * Distribution Type
>  * Distribution Paths
>  
> The improvement aims at adding the queue item creation time, i.e. the time when 
> the item was created for the first time and enqueued. The 
> value does not change over retries (on failure).
>  
> The purpose of this detail is to be able to capture metrics at the 
> consumer level. Consumers could have an event handler that captures 
> the duration, computed as (NOW MINUS queue item creation time carried 
> in the distribution event package), NOW being the current time in the event 
> handler (consumer).
>  
> \cc: [~shgu...@adobe.com], [~ashishc]





[jira] [Commented] (SLING-9492) Add deepPaths property to distribution events

2020-06-03 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125347#comment-17125347
 ] 

Timothee Maret commented on SLING-9492:
---

That makes sense, thanks for tracking [~diru]!

> Add deepPaths property to distribution events
> -
>
> Key: SLING-9492
> URL: https://issues.apache.org/jira/browse/SLING-9492
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution API 0.4.0
>Reporter: Dirk Rudolph
>Priority: Major
>
> Currently the DistributionEvent only supports simple paths using 
> distribution.paths, but distribution packages also support deepPaths (for 
> tree-distribution-like use cases). It would be useful for event consumers to 
> also consume those.





[jira] [Resolved] (SLING-9505) Failure to retry distributing content

2020-06-07 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-9505.
---
Resolution: Fixed

> Failure to retry distributing content
> -
>
> Key: SLING-9505
> URL: https://issues.apache.org/jira/browse/SLING-9505
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Core 0.4.4
>
>
> {code}
> 05.06.2020 20:23:33.066 *ERROR* [127.0.0.1 [1591388613038] POST /libs/sling/distribution/services/agents/publish-test HTTP/1.1] org.apache.sling.distribution.agent.impl.SimpleDistributionAgent [agent][publish-test] an error happened during package import
> org.apache.sling.distribution.common.DistributionException: org.apache.http.client.ClientProtocolException
>     at org.apache.sling.distribution.transport.impl.SimpleHttpDistributionTransport.deliverPackage(SimpleHttpDistributionTransport.java:164) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.packaging.impl.importer.RemoteDistributionPackageImporter.importPackage(RemoteDistributionPackageImporter.java:69) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.agent.impl.ImportingDistributionPackageProcessor.process(ImportingDistributionPackageProcessor.java:78) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.packaging.impl.exporter.LocalDistributionPackageExporter.exportPackages(LocalDistributionPackageExporter.java:47) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.agent.impl.SimpleDistributionAgent.exportPackages(SimpleDistributionAgent.java:218) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.agent.impl.SimpleDistributionAgent.execute(SimpleDistributionAgent.java:182) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.servlet.DistributionAgentServlet.doPost(DistributionAgentServlet.java:62) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.api.servlets.SlingAllMethodsServlet.mayService(SlingAllMethodsServlet.java:146) [org.apache.sling.api:2.22.0]
>     at org.apache.sling.api.servlets.SlingSafeMethodsServlet.service(SlingSafeMethodsServlet.java:342) [org.apache.sling.api:2.22.0]
>     at org.apache.sling.api.servlets.SlingSafeMethodsServlet.service(SlingSafeMethodsServlet.java:374) [org.apache.sling.api:2.22.0]
>     at org.apache.sling.engine.impl.request.RequestData.service(RequestData.java:552) [org.apache.sling.engine:2.7.2]
>     at org.apache.sling.engine.impl.filter.SlingComponentFilterChain.render(SlingComponentFilterChain.java:44) [org.apache.sling.engine:2.7.2]
>     at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:82) [org.apache.sling.engine:2.7.2]
>     at com.day.cq.wcm.core.impl.WCMDebugFilter.doFilter(WCMDebugFilter.java:138) [com.day.cq.wcm.cq-wcm-core:5.13.122]
>     at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72) [org.apache.sling.engine:2.7.2]
>     at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:78) [org.apache.sling.engine:2.7.2]
>     at com.day.cq.wcm.core.impl.WCMComponentFilter.filterRootInclude(WCMComponentFilter.java:375) [com.day.cq.wcm.cq-wcm-core:5.13.122]
>     at com.day.cq.wcm.core.impl.WCMComponentFilter.doFilter(WCMComponentFilter.java:190) [com.day.cq.wcm.cq-wcm-core:5.13.122]
>     at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72) [org.apache.sling.engine:2.7.2]
>     at com.day.cq.wcm.core.impl.page.PageLockFilter.doFilter(PageLockFilter.java:91) [com.day.cq.wcm.cq-wcm-core:5.13.122]
>     at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72) [org.apache.sling.engine:2.7.2]
>     at com.day.cq.personalization.impl.TargetComponentFilter.doFilter(TargetComponentFilter.java:94) [com.day.cq.cq-personalization:5.13.14]
>     at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72) [org.apache.sling.engine:2.7.2]
>   at 
> org.apache.sling.engine.impl.SlingRequestProcessorImpl.processComponent(SlingRequestProcessorImpl.java:283)
>  

[jira] [Commented] (SLING-9505) Failure to retry distributing content

2020-06-07 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127544#comment-17127544
 ] 

Timothee Maret commented on SLING-9505:
---

Done in 
[a527bc26ca31640d4711b1402511dd96d8d6141b|https://github.com/apache/sling-org-apache-sling-distribution-core/commit/a527bc26ca31640d4711b1402511dd96d8d6141b].

> Failure to retry distributing content
> -
>
> Key: SLING-9505
> URL: https://issues.apache.org/jira/browse/SLING-9505
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Core 0.4.4
>
>
> {code}
> 05.06.2020 20:23:33.066 *ERROR* [127.0.0.1 [1591388613038] POST /libs/sling/distribution/services/agents/publish-test HTTP/1.1] org.apache.sling.distribution.agent.impl.SimpleDistributionAgent [agent][publish-test] an error happened during package import
> org.apache.sling.distribution.common.DistributionException: org.apache.http.client.ClientProtocolException
>     at org.apache.sling.distribution.transport.impl.SimpleHttpDistributionTransport.deliverPackage(SimpleHttpDistributionTransport.java:164) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.packaging.impl.importer.RemoteDistributionPackageImporter.importPackage(RemoteDistributionPackageImporter.java:69) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.agent.impl.ImportingDistributionPackageProcessor.process(ImportingDistributionPackageProcessor.java:78) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.packaging.impl.exporter.LocalDistributionPackageExporter.exportPackages(LocalDistributionPackageExporter.java:47) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.agent.impl.SimpleDistributionAgent.exportPackages(SimpleDistributionAgent.java:218) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.agent.impl.SimpleDistributionAgent.execute(SimpleDistributionAgent.java:182) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.distribution.servlet.DistributionAgentServlet.doPost(DistributionAgentServlet.java:62) [org.apache.sling.distribution.core:0.4.3.T202004281829-70214c0]
>     at org.apache.sling.api.servlets.SlingAllMethodsServlet.mayService(SlingAllMethodsServlet.java:146) [org.apache.sling.api:2.22.0]
>     at org.apache.sling.api.servlets.SlingSafeMethodsServlet.service(SlingSafeMethodsServlet.java:342) [org.apache.sling.api:2.22.0]
>     at org.apache.sling.api.servlets.SlingSafeMethodsServlet.service(SlingSafeMethodsServlet.java:374) [org.apache.sling.api:2.22.0]
>     at org.apache.sling.engine.impl.request.RequestData.service(RequestData.java:552) [org.apache.sling.engine:2.7.2]
>     at org.apache.sling.engine.impl.filter.SlingComponentFilterChain.render(SlingComponentFilterChain.java:44) [org.apache.sling.engine:2.7.2]
>     at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:82) [org.apache.sling.engine:2.7.2]
>     at com.day.cq.wcm.core.impl.WCMDebugFilter.doFilter(WCMDebugFilter.java:138) [com.day.cq.wcm.cq-wcm-core:5.13.122]
>     at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72) [org.apache.sling.engine:2.7.2]
>     at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:78) [org.apache.sling.engine:2.7.2]
>     at com.day.cq.wcm.core.impl.WCMComponentFilter.filterRootInclude(WCMComponentFilter.java:375) [com.day.cq.wcm.cq-wcm-core:5.13.122]
>     at com.day.cq.wcm.core.impl.WCMComponentFilter.doFilter(WCMComponentFilter.java:190) [com.day.cq.wcm.cq-wcm-core:5.13.122]
>     at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72) [org.apache.sling.engine:2.7.2]
>     at com.day.cq.wcm.core.impl.page.PageLockFilter.doFilter(PageLockFilter.java:91) [com.day.cq.wcm.cq-wcm-core:5.13.122]
>     at org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72) [org.apache.sling.engine:2.7.2]
>     at com.day.cq.personalization.impl.TargetComponentFilter.doFilter(TargetComponentFilter.java:94) [com.day.cq.cq-personalization:5.13.14]
>   at 
> org.apache.sling.engine.impl.filter.AbstractSlingFilterChain.doFilter(AbstractSlingFilterChain.java:72)
>  

[jira] [Assigned] (SLING-9873) A comma in node name causes Sling Content Distribution to fail

2020-11-05 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret reassigned SLING-9873:
-

Assignee: Timothee Maret

> A comma in node name causes Sling Content Distribution to fail
> --
>
> Key: SLING-9873
> URL: https://issues.apache.org/jira/browse/SLING-9873
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Core 0.4.2
>Reporter: Rahul Bhardwaj
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Core 0.4.4
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Sling content distribution uses comma as a path delimiter [0], but comma is a 
> valid character in JCR node names and hence must not be used as a path delimiter. 
> Using a comma in a node name breaks the Delete operation of forward replication. 
> [0] - 
> https://github.com/apache/sling-org-apache-sling-distribution-core/blob/master/src/main/java/org/apache/sling/distribution/packaging/impl/SimpleDistributionPackage.java#L101





[jira] [Commented] (SLING-9873) A comma in node name causes Sling Content Distribution to fail

2020-11-05 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226567#comment-17226567
 ] 

Timothee Maret commented on SLING-9873:
---

[~bhardwajrahul20] thanks for the patch. It makes a lot of sense to use the 
{{:}} character as the separator as it's not a valid JCR character in node names. I 
have one suggestion though. The current patch does not provide a smooth migration. 
If the new code processes packages with the old separator {{,}} it will block the 
queue. I think it should be possible to extend the patch slightly such that it 
supports both the {{,}} and {{:}} separators. We could figure out the package 
separator mode by looking at the first separator between {{DSTRPCK}} and the 
operation (e.g. {{DELETE}}).
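
To illustrate the suggestion, here is a minimal sketch of detecting the separator from the character that follows {{DSTRPCK}}. It is not the actual SimpleDistributionPackage code: the class name, the method names and the id layout {{DSTRPCK<sep>OPERATION<sep>path...}} are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper; the id layout DSTRPCK<sep>OPERATION<sep>path... is assumed. */
final class PackageIdSeparator {

    static final String PREFIX = "DSTRPCK";
    static final char LEGACY_SEPARATOR = ',';
    static final char NEW_SEPARATOR = ':';

    /** Returns the separator used by this id, read from the character right after the prefix. */
    static char detect(String packageId) {
        if (packageId == null
                || !packageId.startsWith(PREFIX)
                || packageId.length() <= PREFIX.length()) {
            throw new IllegalArgumentException("Not a package id: " + packageId);
        }
        char separator = packageId.charAt(PREFIX.length());
        if (separator != LEGACY_SEPARATOR && separator != NEW_SEPARATOR) {
            throw new IllegalArgumentException("Unknown separator in: " + packageId);
        }
        return separator;
    }

    /** Splits the id with whichever separator it was created with. */
    static List<String> split(String packageId) {
        String separator = String.valueOf(detect(packageId));
        // Pattern.quote keeps the separator from being interpreted as a regex.
        return Arrays.asList(packageId.split(Pattern.quote(separator)));
    }
}
```

With this approach, packages already queued with the comma separator keep being parsed the old way, while packages using the colon separator preserve commas inside node names.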

> A comma in node name causes Sling Content Distribution to fail
> --
>
> Key: SLING-9873
> URL: https://issues.apache.org/jira/browse/SLING-9873
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Core 0.4.2
>Reporter: Rahul Bhardwaj
>Priority: Major
> Fix For: Content Distribution Core 0.4.4
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Sling content distribution uses comma as a path delimiter [0], but comma is a 
> valid character in JCR node names and hence must not be used as a path delimiter. 
> Using a comma in a node name breaks the Delete operation of forward replication. 
> [0] - 
> https://github.com/apache/sling-org-apache-sling-distribution-core/blob/master/src/main/java/org/apache/sling/distribution/packaging/impl/SimpleDistributionPackage.java#L101





[jira] [Resolved] (SLING-9873) A comma in node name causes Sling Content Distribution to fail

2020-11-05 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-9873.
---
Resolution: Fixed

> A comma in node name causes Sling Content Distribution to fail
> --
>
> Key: SLING-9873
> URL: https://issues.apache.org/jira/browse/SLING-9873
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Core 0.4.2
>Reporter: Rahul Bhardwaj
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Core 0.4.4
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Sling content distribution uses comma as a path delimiter [0], but comma is a 
> valid character in JCR node names and hence must not be used as a path delimiter. 
> Using a comma in a node name breaks the Delete operation of forward replication. 
> [0] - 
> https://github.com/apache/sling-org-apache-sling-distribution-core/blob/master/src/main/java/org/apache/sling/distribution/packaging/impl/SimpleDistributionPackage.java#L101





[jira] [Commented] (SLING-9873) A comma in node name causes Sling Content Distribution to fail

2020-11-05 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226669#comment-17226669
 ] 

Timothee Maret commented on SLING-9873:
---

Thanks [~bhardwajrahul20], merged your PR.

> A comma in node name causes Sling Content Distribution to fail
> --
>
> Key: SLING-9873
> URL: https://issues.apache.org/jira/browse/SLING-9873
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Core 0.4.2
>Reporter: Rahul Bhardwaj
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Core 0.4.4
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Sling content distribution uses comma as a path delimiter [0], but comma is a 
> valid character in JCR node names and hence must not be used as a path delimiter. 
> Using a comma in a node name breaks the Delete operation of forward replication. 
> [0] - 
> https://github.com/apache/sling-org-apache-sling-distribution-core/blob/master/src/main/java/org/apache/sling/distribution/packaging/impl/SimpleDistributionPackage.java#L101





[jira] [Commented] (SLING-9873) A comma in node name causes Sling Content Distribution to fail

2020-11-05 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226680#comment-17226680
 ] 

Timothee Maret commented on SLING-9873:
---

Indeed, the packages created with the comma separator would not have been 
handled properly when being processed by the new code that expects a colon 
separator. Deployments that are upgraded while there are packages in the queue 
would have faced this issue. 

> A comma in node name causes Sling Content Distribution to fail
> --
>
> Key: SLING-9873
> URL: https://issues.apache.org/jira/browse/SLING-9873
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Core 0.4.2
>Reporter: Rahul Bhardwaj
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Core 0.4.4
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Sling content distribution uses comma as a path delimiter [0], but comma is a 
> valid character in JCR node names and hence must not be used as a path delimiter. 
> Using a comma in a node name breaks the Delete operation of forward replication. 
> [0] - 
> https://github.com/apache/sling-org-apache-sling-distribution-core/blob/master/src/main/java/org/apache/sling/distribution/packaging/impl/SimpleDistributionPackage.java#L101





[jira] [Commented] (SLING-9873) A comma in node name causes Sling Content Distribution to fail

2020-11-04 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225941#comment-17225941
 ] 

Timothee Maret commented on SLING-9873:
---

Thanks [~bhardwajrahul20] for reporting this. Would you provide a patch?

> A comma in node name causes Sling Content Distribution to fail
> --
>
> Key: SLING-9873
> URL: https://issues.apache.org/jira/browse/SLING-9873
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Core 0.4.2
>Reporter: Rahul Bhardwaj
>Priority: Major
> Fix For: Content Distribution Core 0.4.4
>
>
> Sling content distribution uses comma as a path delimiter [0], but comma is a 
> valid character in JCR node names and hence must not be used as a path delimiter. 
> Using a comma in a node name breaks the Delete operation of forward replication. 
> [0] - 
> https://github.com/apache/sling-org-apache-sling-distribution-core/blob/master/src/main/java/org/apache/sling/distribution/packaging/impl/SimpleDistributionPackage.java#L101





[jira] [Commented] (SLING-10067) Prototype for chunked deep distribution

2021-01-17 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266880#comment-17266880
 ] 

Timothee Maret commented on SLING-10067:


Thanks [~cschneider] for starting this prototype. The idea of using Sling Jobs 
looks appropriate for this use case. Only one Sling Job will be created 
per deep distribution action, which won't stress the repository. It also makes 
sense to use the built-in support to report progress and cancel jobs.

The feature would leverage APIs widely available in the Sling / AEM world: 
Sling Jobs, Sling Content Distribution and Jackrabbit FileVault. It could serve 
as a better implementation of AEM's "tree activation" feature for on-premise and 
cloud deployments. Toward this goal, the recursive descent should only add the 
paths of hierarchical nodes (nodes of type nt:hierarchyNode such as sling:Folder 
or AEM's cq:Page, dam:Asset types) to the multi-path distribution requests.
{quote}I plan to create the prototype in the sling whiteboard.
{quote}
Nice! I would be glad to have a look.
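
As a rough illustration of the walk-then-chunk idea, the sketch below collects paths breadth first, keeps only hierarchy nodes, and splits the result into chunks. It is a self-contained example, not the prototype: the tree is a plain map of path to child paths, and the {{isHierarchyNode}} predicate stands in for a real nt:hierarchyNode check, both of which are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch; a real implementation would walk Sling resources instead of a map. */
final class ChunkedDeepDistribution {

    /** Collects the paths of hierarchy nodes below (and including) root, breadth first. */
    static List<String> collectPaths(String root,
                                     Map<String, List<String>> children,
                                     Predicate<String> isHierarchyNode) {
        List<String> paths = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String path = queue.poll();
            if (!isHierarchyNode.test(path)) {
                continue; // e.g. jcr:content is distributed with its parent, not on its own
            }
            paths.add(path);
            queue.addAll(children.getOrDefault(path, List.of()));
        }
        return paths;
    }

    /** Splits the paths into consecutive chunks of at most chunkSize entries. */
    static List<List<String>> chunks(List<String> paths, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, paths.size());
            result.add(new ArrayList<>(paths.subList(i, end)));
        }
        return result;
    }
}
```

Each chunk would then become one multi-path distribution request, so the package size is bounded by the chunk size rather than by the size of the tree.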

> Prototype for chunked deep distribution
> ---
>
> Key: SLING-10067
> URL: https://issues.apache.org/jira/browse/SLING-10067
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
>
> We have a case where a tree distribution could not be applied at the Oak level as 
> it was too large. As we cannot control how large a tree can grow, we should 
> have a solution that does not depend on the size of the tree.
> I would like to create a prototype of a chunked deep distribution. It is 
> given a path to distribute and walks through the full tree of resources to 
> collect paths (breadth first).
> Then the list of paths is split into chunks of configurable size. For each 
> chunk we create a distribution with all paths of the chunk. This makes sure 
> the package size will not grow too big.
> The call should be asynchronous and it should be possible to monitor and 
> cancel the progress of package creation. So the idea is to use a Sling job 
> with the JobExecutor interface. This way the job can report progress and react to 
> cancel requests.
> I plan to create the prototype in the sling whiteboard. 





[jira] [Assigned] (SLING-10066) cleanup logging for content distribution

2021-01-16 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret reassigned SLING-10066:
--

Assignee: (was: Timothee Maret)

> cleanup logging for content distribution
> 
>
> Key: SLING-10066
> URL: https://issues.apache.org/jira/browse/SLING-10066
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Jörg Hoh
>Priority: Major
>
> When submitting content for distribution I get these 4 log entries for a 
> single submission (using the package-id as identifier):
> {code:java}
> 15.01.2021 14:30:05.168 *INFO* [192.147.128.10 [1610721005131] POST 
> /bin/replicate.json HTTP/1.1] 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory 
> Creating package binary with id [f4fce37b-f160-40b0-9284-7b696a954630] for 
> package [dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483], length 
> [8343]
> 15.01.2021 14:30:05.243 *INFO* [Message Poller PackageMessage handled by 
> MessagingCacheCallback$$Lambda$532/0x00080108fc40] 
> org.apache.sling.distribution.journal.queue.impl.PubQueueCache Queueing 
> message 
> package-id=dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, 
> offset=1023083
> 15.01.2021 14:30:17.716 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.PackageDistributedNotifier
>  Sending distributed notifications for pub agent publish queue item 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483
> 15.01.2021 14:30:17.779 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher 
> [null] Succesfully applied package with id 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, type ADD, paths 
> [/content/something]
> {code}
> While they are technically perfectly OK, I think that 4 messages are a bit 
> too much. I would suggest changing the log level for message 3 to DEBUG, 
> because it does not offer any information that needs to be logged at INFO.
>  I would also suggest changing message 4 to DEBUG (the fact that the package 
> is successfully distributed is implicit, because otherwise there would be an 
> error message), but I am indeed interested in the path which is 
> distributed. So it would be great if we could log the path(s) in either 
> message 1 or 2.
> Overall this could reduce the noise in the logs if distribution is heavily 
> used.
> WDYT?





[jira] [Commented] (SLING-10066) cleanup logging for content distribution

2021-01-16 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266685#comment-17266685
 ] 

Timothee Maret commented on SLING-10066:


bq. I would suggest to change the loglevel for message 3 to DEBUG

+1 

bq. I would also suggest to change message 4 to DEBUG (the fact that the 
package is successfully distributed is implicit, because otherwise there would 
be an error message)

Message 4 is required to signal when distribution is done, since processing is 
asynchronous and latency can vary significantly. I'd rather set message 2 to DEBUG 
level.

[~joerghoh] would you open a PR?

> cleanup logging for content distribution
> 
>
> Key: SLING-10066
> URL: https://issues.apache.org/jira/browse/SLING-10066
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Jörg Hoh
>Priority: Major
>
> When submitting content for distribution I get these 4 log entries for a 
> single submission (using the package-id as identifier):
> {code:java}
> 15.01.2021 14:30:05.168 *INFO* [192.147.128.10 [1610721005131] POST 
> /bin/replicate.json HTTP/1.1] 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory 
> Creating package binary with id [f4fce37b-f160-40b0-9284-7b696a954630] for 
> package [dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483], length 
> [8343]
> 15.01.2021 14:30:05.243 *INFO* [Message Poller PackageMessage handled by 
> MessagingCacheCallback$$Lambda$532/0x00080108fc40] 
> org.apache.sling.distribution.journal.queue.impl.PubQueueCache Queueing 
> message 
> package-id=dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, 
> offset=1023083
> 15.01.2021 14:30:17.716 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.PackageDistributedNotifier
>  Sending distributed notifications for pub agent publish queue item 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483
> 15.01.2021 14:30:17.779 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher 
> [null] Succesfully applied package with id 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, type ADD, paths 
> [/content/something]
> {code}
> While they are technically perfectly OK, I think that 4 messages are a bit 
> too much. I would suggest changing the log level for message 3 to DEBUG, 
> because it does not offer any information that needs to be logged at INFO.
>  I would also suggest changing message 4 to DEBUG (the fact that the package 
> is successfully distributed is implicit, because otherwise there would be an 
> error message), but I am indeed interested in the path which is 
> distributed. So it would be great if we could log the path(s) in either 
> message 1 or 2.
> Overall this could reduce the noise in the logs if distribution is heavily 
> used.
> WDYT?





[jira] [Assigned] (SLING-10066) cleanup logging for content distribution

2021-01-16 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret reassigned SLING-10066:
--

Assignee: Timothee Maret

> cleanup logging for content distribution
> 
>
> Key: SLING-10066
> URL: https://issues.apache.org/jira/browse/SLING-10066
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Jörg Hoh
>Assignee: Timothee Maret
>Priority: Major
>
> When submitting content for distribution I get these 4 log entries for a 
> single submission (using the package-id as identifier):
> {code:java}
> 15.01.2021 14:30:05.168 *INFO* [192.147.128.10 [1610721005131] POST 
> /bin/replicate.json HTTP/1.1] 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory 
> Creating package binary with id [f4fce37b-f160-40b0-9284-7b696a954630] for 
> package [dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483], length 
> [8343]
> 15.01.2021 14:30:05.243 *INFO* [Message Poller PackageMessage handled by 
> MessagingCacheCallback$$Lambda$532/0x00080108fc40] 
> org.apache.sling.distribution.journal.queue.impl.PubQueueCache Queueing 
> message 
> package-id=dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, 
> offset=1023083
> 15.01.2021 14:30:17.716 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.PackageDistributedNotifier
>  Sending distributed notifications for pub agent publish queue item 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483
> 15.01.2021 14:30:17.779 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher 
> [null] Succesfully applied package with id 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, type ADD, paths 
> [/content/something]
> {code}
> While they are technically perfectly OK, I think that 4 messages are a bit 
> too much. I would suggest changing the log level for message 3 to DEBUG, 
> because it does not offer any information that needs to be logged at INFO.
>  I would also suggest changing message 4 to DEBUG (the fact that the package 
> is successfully distributed is implicit, because otherwise there would be an 
> error message), but I am indeed interested in the path which is 
> distributed. So it would be great if we could log the path(s) in either 
> message 1 or 2.
> Overall this could reduce the noise in the logs if distribution is heavily 
> used.
> WDYT?





[jira] [Resolved] (SLING-10066) cleanup logging for content distribution

2021-01-18 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-10066.

Fix Version/s: Content Distribution Journal Core 0.1.18
   Resolution: Fixed

> cleanup logging for content distribution
> 
>
> Key: SLING-10066
> URL: https://issues.apache.org/jira/browse/SLING-10066
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Jörg Hoh
>Assignee: Timothée Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> When submitting content for distribution I get these 4 log entries for a 
> single submission (using the package-id as identifier):
> {code:java}
> 15.01.2021 14:30:05.168 *INFO* [192.147.128.10 [1610721005131] POST 
> /bin/replicate.json HTTP/1.1] 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory 
> Creating package binary with id [f4fce37b-f160-40b0-9284-7b696a954630] for 
> package [dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483], length 
> [8343]
> 15.01.2021 14:30:05.243 *INFO* [Message Poller PackageMessage handled by 
> MessagingCacheCallback$$Lambda$532/0x00080108fc40] 
> org.apache.sling.distribution.journal.queue.impl.PubQueueCache Queueing 
> message 
> package-id=dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, 
> offset=1023083
> 15.01.2021 14:30:17.716 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.PackageDistributedNotifier
>  Sending distributed notifications for pub agent publish queue item 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483
> 15.01.2021 14:30:17.779 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher 
> [null] Succesfully applied package with id 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, type ADD, paths 
> [/content/something]
> {code}
> While they are technically perfectly OK, I think that 4 messages are a bit 
> too much. I would suggest changing the log level for message 3 to DEBUG, 
> because it does not offer any information that needs to be logged at INFO.
>  I would also suggest changing message 4 to DEBUG (the fact that the package 
> is successfully distributed is implicit, because otherwise there would be an 
> error message), but I am indeed interested in the path which is 
> distributed. So it would be great if we could log the path(s) in either 
> message 1 or 2.
> Overall this could reduce the noise in the logs if distribution is heavily 
> used.
> WDYT?





[jira] [Commented] (SLING-10066) cleanup logging for content distribution

2021-01-18 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267450#comment-17267450
 ] 

Timothee Maret commented on SLING-10066:


Thanks [~joerghoh], I merged your PR.

> cleanup logging for content distribution
> 
>
> Key: SLING-10066
> URL: https://issues.apache.org/jira/browse/SLING-10066
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Jörg Hoh
>Assignee: Timothée Maret
>Priority: Major
>
> When submitting content for distribution I get these 4 log entries for a 
> single submission (using the package-id as identifier):
> {code:java}
> 15.01.2021 14:30:05.168 *INFO* [192.147.128.10 [1610721005131] POST 
> /bin/replicate.json HTTP/1.1] 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory 
> Creating package binary with id [f4fce37b-f160-40b0-9284-7b696a954630] for 
> package [dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483], length 
> [8343]
> 15.01.2021 14:30:05.243 *INFO* [Message Poller PackageMessage handled by 
> MessagingCacheCallback$$Lambda$532/0x00080108fc40] 
> org.apache.sling.distribution.journal.queue.impl.PubQueueCache Queueing 
> message 
> package-id=dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, 
> offset=1023083
> 15.01.2021 14:30:17.716 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.PackageDistributedNotifier
>  Sending distributed notifications for pub agent publish queue item 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483
> 15.01.2021 14:30:17.779 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher 
> [null] Succesfully applied package with id 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, type ADD, paths 
> [/content/something]
> {code}
> While they are technically perfectly OK, I think that 4 messages are a bit 
> too much. I would suggest changing the log level for message 3 to DEBUG, 
> because it does not offer any information that needs to be logged at INFO.
>  I would also suggest changing message 4 to DEBUG (the fact that the package 
> is successfully distributed is implicit, because otherwise there would be an 
> error message), but I am indeed interested in the path which is 
> distributed. So it would be great if we could log the path(s) in either 
> message 1 or 2.
> Overall this could reduce the noise in the logs if distribution is heavily 
> used.
> WDYT?





[jira] [Assigned] (SLING-10066) cleanup logging for content distribution

2021-01-18 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret reassigned SLING-10066:
--

Assignee: Timothée Maret

> cleanup logging for content distribution
> 
>
> Key: SLING-10066
> URL: https://issues.apache.org/jira/browse/SLING-10066
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Jörg Hoh
>Assignee: Timothée Maret
>Priority: Major
>
> When submitting content for distribution I get these 4 log entries for a 
> single submission (using the package-id as identifier):
> {code:java}
> 15.01.2021 14:30:05.168 *INFO* [192.147.128.10 [1610721005131] POST 
> /bin/replicate.json HTTP/1.1] 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory 
> Creating package binary with id [f4fce37b-f160-40b0-9284-7b696a954630] for 
> package [dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483], length 
> [8343]
> 15.01.2021 14:30:05.243 *INFO* [Message Poller PackageMessage handled by 
> MessagingCacheCallback$$Lambda$532/0x00080108fc40] 
> org.apache.sling.distribution.journal.queue.impl.PubQueueCache Queueing 
> message 
> package-id=dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, 
> offset=1023083
> 15.01.2021 14:30:17.716 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.PackageDistributedNotifier
>  Sending distributed notifications for pub agent publish queue item 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483
> 15.01.2021 14:30:17.779 *INFO* [sling-default-1-Registered Service.5223] 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher 
> [null] Succesfully applied package with id 
> dstrpck-1610721005168-00bd582b-1f91-4abc-82b6-a80afa2a0483, type ADD, paths 
> [/content/something]
> {code}
> While they are technically perfectly OK, I think that 4 messages are a bit 
> too much. I would suggest changing the log level for message 3 to DEBUG, 
> because it does not offer any information that needs to be logged at INFO.
>  I would also suggest changing message 4 to DEBUG (the fact that the package 
> is successfully distributed is implicit, because otherwise there would be an 
> error message), but I am indeed interested in the path which is 
> distributed. So it would be great if we could log the path(s) in either 
> message 1 or 2.
> Overall this could reduce the noise in the logs if distribution is heavily 
> used.
> WDYT?





[jira] [Created] (SLING-10077) Mode to raise events locally only

2021-01-20 Thread Timothee Maret (Jira)
Timothee Maret created SLING-10077:
--

 Summary: Mode to raise events locally only
 Key: SLING-10077
 URL: https://issues.apache.org/jira/browse/SLING-10077
 Project: Sling
  Issue Type: Improvement
Affects Versions: Content Distribution Journal Core 0.1.18
Reporter: Timothee Maret


Currently the `o/a/s/d/a/p/queued` and `o/a/s/d/a/p/distributed` events are 
raised as local events in each instance of a cluster. Raising the events in the 
cluster makes sense for use cases that require maintaining a distributed data 
structure like a cache.

Some use cases trigger logic on the cluster (not on individual instances) and would 
benefit from raising a single event.

As an improvement, we could add a mode that controls whether all 
journal distribution events are distributed. Distribution would be enabled by default.
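
A minimal sketch of what such a mode could look like when building the event properties follows. It is illustrative only: the class, the {{distribute}} flag and the {{distribution.package.id}} key are hypothetical, and the use of an {{event.distribute}} marker property to request cluster-wide distribution is an assumption about the Sling eventing support rather than something stated in this issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; property names are assumptions, not the journal's actual code. */
final class JournalEventProperties {

    /** Assumed marker property asking the eventing layer to distribute the event. */
    static final String PROPERTY_DISTRIBUTE = "event.distribute";

    /** Hypothetical key carrying the package id. */
    static final String PROPERTY_PACKAGE_ID = "distribution.package.id";

    /**
     * Builds the properties for a queued/distributed event.
     *
     * @param distribute true (the proposed default) to mark the event for
     *                   distribution, false to raise it locally only
     */
    static Map<String, Object> build(String packageId, boolean distribute) {
        Map<String, Object> props = new HashMap<>();
        props.put(PROPERTY_PACKAGE_ID, packageId);
        if (distribute) {
            // The presence of the marker matters here, not its value.
            props.put(PROPERTY_DISTRIBUTE, "");
        }
        return props;
    }
}
```

The flag would typically come from the agent configuration, with {{true}} as the default so that existing deployments keep their current behaviour.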





[jira] [Updated] (SLING-10077) Mode to raise events only locally

2021-01-20 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-10077:
---
Summary: Mode to raise events only locally  (was: Mode to raise events 
locally only)

> Mode to raise events only locally
> -
>
> Key: SLING-10077
> URL: https://issues.apache.org/jira/browse/SLING-10077
> Project: Sling
>  Issue Type: Improvement
>Affects Versions: Content Distribution Journal Core 0.1.18
>Reporter: Timothee Maret
>Priority: Major
>
> Currently the `o/a/s/d/a/p/queued` and `o/a/s/d/a/p/distributed` events are 
> raised as local events in each instance of a cluster. Raising the events in 
> the cluster makes sense for use cases that require maintaining a distributed 
> data structure like a cache.
> Some use cases trigger logic on the cluster (not on individual instances) and 
> would benefit from raising a single event.
> As an improvement, we could add a mode that controls whether all 
> journal distribution events are distributed. Distribution would be enabled by default.





[jira] [Commented] (SLING-10108) PackageMessageFactory can allocate huge amount of memory

2021-01-31 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276108#comment-17276108
 ] 

Timothee Maret commented on SLING-10108:


The packages are binary-less. When distributing resources one by one, this is not 
a problem. It can indeed be a problem when doing deep distribution of a large 
tree. Chunked distribution in SLING-10067 will mitigate that issue, but we should 
indeed stream the package to solve it in all cases. Would you provide a patch?
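
For illustration, the difference between buffering and streaming comes down to copying through a small fixed-size buffer instead of materialising the whole package as one byte array. The sketch below is generic Java and not the PackageMessageFactory code; the class name and the buffer size are arbitrary choices for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper showing a bounded-memory copy of a package stream. */
final class PackageStreamCopy {

    /**
     * Copies the stream in blocks of bufferSize bytes, so heap usage stays
     * constant regardless of the package length.
     *
     * @return the number of bytes copied
     */
    static long copy(InputStream in, OutputStream out, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

With an 8 KiB buffer, for example, a package of several hundred MiB is transferred without ever requiring a contiguous heap block of that size.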

> PackageMessageFactory can allocate huge amount of memory
> 
>
> Key: SLING-10108
> URL: https://issues.apache.org/jira/browse/SLING-10108
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Jörg Hoh
>Priority: Major
>
> The PackageMessageFactory consumes the complete binary stream of a 
> distribution package into a single byte array [1]; and depending on the size 
> of the package this can cause severe memory issues with a potential OOM of 
> the JVM. And even in cases where it's not running into OOM, it can cause 
> major work for the GC to provide a contiguous block of heap of that size.
> In logs of existing environments I have seen values up to 390MiB:
> {noformat}
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory 
> Creating package binary with id [8702eb49-cf1b-4353-9c3d-67d59f7414f7] for 
> package [dstrpck-1611876137443-cabe281b-4399-46a1-9943-a8c6358c6da2], length 
> [393722362]
> {noformat}
> The logic should be changed, so that the package is not stored within memory, 
> but rather just streamed (if necessary at all).
>  
> [1] 
> [https://github.com/apache/sling-org-apache-sling-distribution-journal/blob/master/src/main/java/org/apache/sling/distribution/journal/impl/publisher/PackageMessageFactory.java#L96]





[jira] [Commented] (SLING-10112) Distribution Queue can not be reliably cleared due to race condition processing delete messages

2021-02-02 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276984#comment-17276984
 ] 

Timothee Maret commented on SLING-10112:


For the record, the non-heuristic way to solve this would be to interrupt the 
Session#save operation, but that is not possible. The JCR API does not cover 
interruptions and Apache Oak does not support Thread.interrupt: 
https://jackrabbit.apache.org/oak/docs/dos_and_donts.html

> Distribution Queue can not be reliably cleared due to race condition 
> processing delete messages
> ---
>
> Key: SLING-10112
> URL: https://issues.apache.org/jira/browse/SLING-10112
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
>
> Replication queue can not be reliably cleared due to a race condition with 
> processing the 'delete' messages.
> If the first message is stuck during processing, the 'clear queue' signal 
> might never have effect on the subscriber.
> A restart of the process does not help as the same situation might occur.





[jira] [Commented] (SLING-10112) Distribution Queue can not be reliably cleared due to race condition processing delete messages

2021-02-02 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276956#comment-17276956
 ] 

Timothee Maret commented on SLING-10112:


Sending commands and importing packages are required to be two independent and 
parallel processes.

The approach in PR #65 postpones the package consumer initialisation until 
the command consumer is idle. The idle check is a heuristic. When the 
heuristic fails, new attempts can be triggered by restarting the process. Using 
a heuristic here seems appropriate considering the alternative would require 
persisting yet another state in the repository. Looking at the PR, I think we 
should make sure the idle check times out eventually.
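
To make the timeout point concrete, here is a minimal sketch of an idle check 
that gives up waiting after a fixed time. It is not the code from PR #65. The 
class IdleCheck, its method names and the injected clock are assumptions made 
for illustration only.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch, not the implementation from PR #65.
final class IdleCheck {

    private final LongSupplier clockMillis; // time source, injectable for tests
    private final long idleMillis;          // quiet period that counts as "idle"
    private final long timeoutMillis;       // upper bound after which waiting stops
    private final long startedAt;
    private long lastActivityAt;

    IdleCheck(LongSupplier clockMillis, long idleMillis, long timeoutMillis) {
        this.clockMillis = clockMillis;
        this.idleMillis = idleMillis;
        this.timeoutMillis = timeoutMillis;
        this.startedAt = clockMillis.getAsLong();
        this.lastActivityAt = this.startedAt;
    }

    // To be called whenever the command consumer handles a message.
    synchronized void busy() {
        lastActivityAt = clockMillis.getAsLong();
    }

    // True once no command was seen for idleMillis, or once timeoutMillis
    // has elapsed since creation, whichever comes first.
    synchronized boolean isReady() {
        long now = clockMillis.getAsLong();
        boolean idle = now - lastActivityAt >= idleMillis;
        boolean timedOut = now - startedAt >= timeoutMillis;
        return idle || timedOut;
    }
}
```

With the timeout in place, a steady flow of commands can delay the package 
consumer but cannot block it forever.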

> Distribution Queue can not be reliably cleared due to race condition 
> processing delete messages
> ---
>
> Key: SLING-10112
> URL: https://issues.apache.org/jira/browse/SLING-10112
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
>
> Replication queue can not be reliably cleared due to a race condition with 
> processing the 'delete' messages.
> If the first message is stuck during processing, the 'clear queue' signal 
> might never have effect on the subscriber.
> A restart of the process does not help as the same situation might occur.





[jira] [Commented] (SLING-10123) Distribution agent queue processor should implement a backoff in case of retries for processing an item

2021-02-04 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278655#comment-17278655
 ] 

Timothee Maret commented on SLING-10123:


That makes sense. [~mohiaror] would you open a PR?
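
As a starting point for the discussion, a minimal sketch of the two backoff 
variants mentioned in the issue. This is not code from 
SimpleDistributionAgentQueueProcessor. The class RetryBackoff, the method 
names and the parameters are hypothetical, and attempt is assumed to be the 
number of failed attempts so far (1 for the first retry).

```java
// Hypothetical sketch, not part of Sling Content Distribution Core.
final class RetryBackoff {

    private RetryBackoff() {
    }

    // Linear backoff: baseMillis * attempt, capped at maxMillis.
    static long linearDelayMillis(int attempt, long baseMillis, long maxMillis) {
        check(attempt, baseMillis, maxMillis);
        // Compare before multiplying so the product cannot overflow.
        if (baseMillis > maxMillis / attempt) {
            return maxMillis;
        }
        return baseMillis * attempt;
    }

    // Exponential backoff: baseMillis * 2^(attempt - 1), capped at maxMillis.
    static long exponentialDelayMillis(int attempt, long baseMillis, long maxMillis) {
        check(attempt, baseMillis, maxMillis);
        long delay = baseMillis;
        for (int i = 1; i < attempt; i++) {
            // Stop doubling as soon as the next step would exceed the cap.
            if (delay > maxMillis / 2) {
                return maxMillis;
            }
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    private static void check(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 1 || baseMillis < 1 || maxMillis < baseMillis) {
            throw new IllegalArgumentException(
                "attempt and baseMillis must be >= 1 and maxMillis >= baseMillis");
        }
    }
}
```

With a base of 1 second and a cap of 60 seconds, the exponential variant 
gives 1s, 2s, 4s, 8s, 16s, 32s and then stays at 60s.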

> Distribution agent queue processor should implement a backoff in case of 
> retries for processing an item
> ---
>
> Key: SLING-10123
> URL: https://issues.apache.org/jira/browse/SLING-10123
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Core 0.4.2
>Reporter: Mohit Arora
>Priority: Major
> Fix For: Content Distribution Core 0.4.4
>
>
> In case of recoverable exceptions, distribution agent queue processor does 
> not evict the queue item from the processing queue [0]. Rather, the item is 
> retried infinitely until either the distribution of the item is successful or 
> a non-recoverable exception is thrown for the item. However, since there is 
> "something wrong" because of which an exception is thrown in the first place, 
> we should add a cool off period before trying to reattempt to distribute the 
> same item. This can be achieved through a linear or exponential backoff.
> cc - [~ashishc]
> [0] 
> https://github.com/apache/sling-org-apache-sling-distribution-core/blob/master/src/main/java/org/apache/sling/distribution/agent/impl/SimpleDistributionAgentQueueProcessor.java#L147-L150





[jira] [Resolved] (SLING-4075) Improve test coverage of SCD

2021-01-26 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-4075.
---
Resolution: Fixed

> Improve test coverage of SCD
> 
>
> Key: SLING-4075
> URL: https://issues.apache.org/jira/browse/SLING-4075
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Tommaso Teofili
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Core 0.4.4
>
>
> Currently we moved lots of testing to the IT module but it'd be good to have 
> a better test coverage via unit testing in core module, at least to test 
> basic use cases and maybe some edge cases.





[jira] [Commented] (SLING-4075) Improve test coverage of SCD

2021-01-26 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272035#comment-17272035
 ] 

Timothee Maret commented on SLING-4075:
---

This issue has been kept open for a long time and should be closed. SCD is a 
large multi-module project and test coverage has been increased as part of this 
issue and as part of other contributions. We should open new, more targeted 
issues to cover the components that remain uncovered.

> Improve test coverage of SCD
> 
>
> Key: SLING-4075
> URL: https://issues.apache.org/jira/browse/SLING-4075
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Tommaso Teofili
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Core 0.4.4
>
>
> Currently we moved lots of testing to the IT module but it'd be good to have 
> a better test coverage via unit testing in core module, at least to test 
> basic use cases and maybe some edge cases.





[jira] [Commented] (SLING-10088) PatternSyntaxException: Unclosed group near index x

2021-01-30 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275763#comment-17275763
 ] 

Timothee Maret commented on SLING-10088:


Opened a PR 
https://github.com/apache/sling-org-apache-sling-distribution-core/pull/47 to 
apply the workaround in the Sling Distribution Core bundle. [~cschneider] I 
think we indeed need to cancel the release as JCRVLT-500 may take a while.

> PatternSyntaxException: Unclosed group near index x
> ---
>
> Key: SLING-10088
> URL: https://issues.apache.org/jira/browse/SLING-10088
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Core 0.4.2
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Critical
> Fix For: Content Distribution Core 0.4.4
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> if a path contains a "(" the following exception will occur when we try to 
> distribute it.
> As far as I can tell the problem is a missing escaping of the path in 
> org.apache.sling.distribution.serialization.impl.vlt.VltUtils.createFilter.
> {code:java}
> [org.apache.sling.distribution.core:0.4.3.T20200720-c96d3fb]    at 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory.createAdd(PackageMessageFactory.java:95)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory.create(PackageMessageFactory.java:86)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher.lambda$execute$1(DistributionPublisher.java:271)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.shared.DistributionMetricsService.timed(DistributionMetricsService.java:147)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher.execute(DistributionPublisher.java:270)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher.execute(DistributionPublisher.java:259)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.impl.DefaultDistributor.distribute(DefaultDistributor.java:60)
>  [org.apache.sling.distribution.core:0.4.3.T20200720-c96d3fb]    at 
> org.apache.sling.distribution.chunked.ChunkedDistribution.distributeChunk(ChunkedDistribution.java:124)
>  [org.apache.sling.distribution.chunked:0.1.0.20210121164255060]    at 
> org.apache.sling.distribution.chunked.ChunkedDistribution.distribute(ChunkedDistribution.java:102)
>  [org.apache.sling.distribution.chunked:0.1.0.20210121164255060]    at 
> org.apache.sling.distribution.chunked.ChunkedDistribution.process(ChunkedDistribution.java:72)
>  [org.apache.sling.distribution.chunked:0.1.0.20210121164255060]    at 
> org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJob(JobQueueImpl.java:293)
>  [org.apache.sling.event:4.2.12]    at 
> org.apache.sling.event.impl.jobs.queues.JobQueueImpl.access$100(JobQueueImpl.java:60)
>  [org.apache.sling.event:4.2.12]    at 
> org.apache.sling.event.impl.jobs.queues.JobQueueImpl$1.run(JobQueueImpl.java:229)
>  [org.apache.sling.event:4.2.12]    at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> java.util.regex.PatternSyntaxException: Unclosed group near index 
> 20/nodewith(shouldwork    at 
> java.base/java.util.regex.Pattern.error(Pattern.java:2027)    at 
> java.base/java.util.regex.Pattern.accept(Pattern.java:1877)    at 
> java.base/java.util.regex.Pattern.group0(Pattern.java:3060)    at 
> java.base/java.util.regex.Pattern.sequence(Pattern.java:2123)    at 
> java.base/java.util.regex.Pattern.expr(Pattern.java:2068)    at 
> java.base/java.util.regex.Pattern.compile(Pattern.java:1782)    at 
> java.base/java.util.regex.Pattern.<init>(Pattern.java:1428)    at 
> java.base/java.util.regex.Pattern.compile(Pattern.java:1068)    at 
> org.apache.jackrabbit.vault.fs.filter.DefaultPathFilter.setPattern(DefaultPathFilter.java:68)
>  [org.apache.jackrabbit.vault:3.4.0]    at 
> org.apache.jackrabbit.vault.fs.filter.DefaultPathFilter.<init>(DefaultPathFilter.java:48)
>  [org.apache.jackrabbit.vault:3.4.0]    at 
> 

[jira] [Commented] (SLING-10088) PatternSyntaxException: Unclosed group near index x

2021-01-30 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275752#comment-17275752
 ] 

Timothee Maret commented on SLING-10088:


Opened JCRVLT-500 to track the issue in the Apache Jackrabbit FileVault project.

> PatternSyntaxException: Unclosed group near index x
> ---
>
> Key: SLING-10088
> URL: https://issues.apache.org/jira/browse/SLING-10088
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Core 0.4.2
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Critical
> Fix For: Content Distribution Core 0.4.4
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> if a path contains a "(" the following exception will occur when we try to 
> distribute it.
> As far as I can tell the problem is a missing escaping of the path in 
> org.apache.sling.distribution.serialization.impl.vlt.VltUtils.createFilter.
> {code:java}
> [org.apache.sling.distribution.core:0.4.3.T20200720-c96d3fb]    at 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory.createAdd(PackageMessageFactory.java:95)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory.create(PackageMessageFactory.java:86)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher.lambda$execute$1(DistributionPublisher.java:271)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.shared.DistributionMetricsService.timed(DistributionMetricsService.java:147)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher.execute(DistributionPublisher.java:270)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher.execute(DistributionPublisher.java:259)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.impl.DefaultDistributor.distribute(DefaultDistributor.java:60)
>  [org.apache.sling.distribution.core:0.4.3.T20200720-c96d3fb]    at 
> org.apache.sling.distribution.chunked.ChunkedDistribution.distributeChunk(ChunkedDistribution.java:124)
>  [org.apache.sling.distribution.chunked:0.1.0.20210121164255060]    at 
> org.apache.sling.distribution.chunked.ChunkedDistribution.distribute(ChunkedDistribution.java:102)
>  [org.apache.sling.distribution.chunked:0.1.0.20210121164255060]    at 
> org.apache.sling.distribution.chunked.ChunkedDistribution.process(ChunkedDistribution.java:72)
>  [org.apache.sling.distribution.chunked:0.1.0.20210121164255060]    at 
> org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJob(JobQueueImpl.java:293)
>  [org.apache.sling.event:4.2.12]    at 
> org.apache.sling.event.impl.jobs.queues.JobQueueImpl.access$100(JobQueueImpl.java:60)
>  [org.apache.sling.event:4.2.12]    at 
> org.apache.sling.event.impl.jobs.queues.JobQueueImpl$1.run(JobQueueImpl.java:229)
>  [org.apache.sling.event:4.2.12]    at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> java.util.regex.PatternSyntaxException: Unclosed group near index 
> 20/nodewith(shouldwork    at 
> java.base/java.util.regex.Pattern.error(Pattern.java:2027)    at 
> java.base/java.util.regex.Pattern.accept(Pattern.java:1877)    at 
> java.base/java.util.regex.Pattern.group0(Pattern.java:3060)    at 
> java.base/java.util.regex.Pattern.sequence(Pattern.java:2123)    at 
> java.base/java.util.regex.Pattern.expr(Pattern.java:2068)    at 
> java.base/java.util.regex.Pattern.compile(Pattern.java:1782)    at 
> java.base/java.util.regex.Pattern.<init>(Pattern.java:1428)    at 
> java.base/java.util.regex.Pattern.compile(Pattern.java:1068)    at 
> org.apache.jackrabbit.vault.fs.filter.DefaultPathFilter.setPattern(DefaultPathFilter.java:68)
>  [org.apache.jackrabbit.vault:3.4.0]    at 
> org.apache.jackrabbit.vault.fs.filter.DefaultPathFilter.<init>(DefaultPathFilter.java:48)
>  [org.apache.jackrabbit.vault:3.4.0]    at 
> org.apache.sling.distribution.serialization.impl.vlt.VltUtils.createFilter(VltUtils.java:92)
>  [org.apache.sling.distribution.core:0.4.3.T20200720-c96d3fb]    at 
> 

[jira] [Commented] (SLING-10088) PatternSyntaxException: Unclosed group near index x

2021-01-30 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275712#comment-17275712
 ] 

Timothee Maret commented on SLING-10088:


PR 
[#46|https://github.com/apache/sling-org-apache-sling-distribution-core/pull/46]
 changed the way FileVault [DefaultPathFilter#isAbsolute 
|https://github.com/apache/jackrabbit-filevault/blob/master/vault-core/src/main/java/org/apache/jackrabbit/vault/fs/filter/DefaultPathFilter.java#L99-L101]
 and 
[DefaultPathFilter#translate|https://github.com/apache/jackrabbit-filevault/blob/master/vault-core/src/main/java/org/apache/jackrabbit/vault/fs/filter/DefaultPathFilter.java#L107-L121]
 behave. Those methods decide whether a path is absolute by checking whether 
the pattern starts with a {{/}}. This check fails when given a quoted path, 
as in PR 
[#46|https://github.com/apache/sling-org-apache-sling-distribution-core/pull/46].
 This should be fixed in FileVault by using a matcher to decide whether a path 
is absolute instead of comparing the pattern string.

We could work around this issue in Sling while still quoting special characters 
(this issue) with something along the lines of

{code}
if (path.startsWith("/")) {
  return "/" + Pattern.quote(path.substring(1));
} else {
  return Pattern.quote(path);
}
{code}
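
For reference, the same idea as a self-contained sketch that can be compiled 
and tried on its own. The class PathPatterns and the method quotePath are 
hypothetical names, not the ones used in VltUtils. Only 
java.util.regex.Pattern is used.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the workaround, not the VltUtils code.
final class PathPatterns {

    private PathPatterns() {
    }

    // Quotes the path so that regex metacharacters such as "(" are matched
    // literally, while keeping a leading "/" outside the quoted section so
    // that a check of the form pattern.startsWith("/") still sees an
    // absolute path.
    static String quotePath(String path) {
        if (path.startsWith("/")) {
            return "/" + Pattern.quote(path.substring(1));
        }
        return Pattern.quote(path);
    }
}
```

For the path from the stack trace, quotePath("/nodewith(shouldwork") yields a 
pattern that starts with "/" and compiles without a PatternSyntaxException.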


> PatternSyntaxException: Unclosed group near index x
> ---
>
> Key: SLING-10088
> URL: https://issues.apache.org/jira/browse/SLING-10088
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Affects Versions: Content Distribution Core 0.4.2
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Critical
> Fix For: Content Distribution Core 0.4.4
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> if a path contains a "(" the following exception will occur when we try to 
> distribute it.
> As far as I can tell the problem is a missing escaping of the path in 
> org.apache.sling.distribution.serialization.impl.vlt.VltUtils.createFilter.
> {code:java}
> [org.apache.sling.distribution.core:0.4.3.T20200720-c96d3fb]    at 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory.createAdd(PackageMessageFactory.java:95)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.PackageMessageFactory.create(PackageMessageFactory.java:86)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher.lambda$execute$1(DistributionPublisher.java:271)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.shared.DistributionMetricsService.timed(DistributionMetricsService.java:147)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher.execute(DistributionPublisher.java:270)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.journal.impl.publisher.DistributionPublisher.execute(DistributionPublisher.java:259)
>  [org.apache.sling.distribution.journal:0.2.0.T202009251421-0284693]    
> at 
> org.apache.sling.distribution.impl.DefaultDistributor.distribute(DefaultDistributor.java:60)
>  [org.apache.sling.distribution.core:0.4.3.T20200720-c96d3fb]    at 
> org.apache.sling.distribution.chunked.ChunkedDistribution.distributeChunk(ChunkedDistribution.java:124)
>  [org.apache.sling.distribution.chunked:0.1.0.20210121164255060]    at 
> org.apache.sling.distribution.chunked.ChunkedDistribution.distribute(ChunkedDistribution.java:102)
>  [org.apache.sling.distribution.chunked:0.1.0.20210121164255060]    at 
> org.apache.sling.distribution.chunked.ChunkedDistribution.process(ChunkedDistribution.java:72)
>  [org.apache.sling.distribution.chunked:0.1.0.20210121164255060]    at 
> org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJob(JobQueueImpl.java:293)
>  [org.apache.sling.event:4.2.12]    at 
> org.apache.sling.event.impl.jobs.queues.JobQueueImpl.access$100(JobQueueImpl.java:60)
>  [org.apache.sling.event:4.2.12]    at 
> org.apache.sling.event.impl.jobs.queues.JobQueueImpl$1.run(JobQueueImpl.java:229)
>  [org.apache.sling.event:4.2.12]    at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> java.util.regex.PatternSyntaxException: Unclosed group near index 
> 20/nodewith(shouldwork    

[jira] [Commented] (SLING-10107) Refactoring of creation of ResourceResolvers

2021-01-30 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275685#comment-17275685
 ] 

Timothee Maret commented on SLING-10107:


bq. ResourceResolvers are opened for obviously tiny operations is quite high.

Is this a problem? IINM it's a best practice to use many short-lived 
sessions rather than a few long-lived ones. Sharing sessions is not easy and 
can lead to many issues due to potentially I. getting a session with existing 
transient data, II. getting a session which is not up to date with the latest 
repository state, III. ending up sharing the same session concurrently from 
different threads.

Unless we see that the current behaviour is a source of problems, I am strongly 
against reusing sessions.

We could, however, centralise the creation of new sessions and attach a metric 
to it to help get stats.
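
A minimal sketch of what such a central creation point could look like, 
assuming the creation itself is handed in as a Supplier. The class 
CountingFactory is a hypothetical name, and a plain AtomicLong stands in for 
whatever metric type the project would actually use. T stands in for a 
ResourceResolver or a JCR Session.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch, not the code from the PR.
final class CountingFactory<T> implements Supplier<T> {

    private final Supplier<T> delegate;
    private final AtomicLong created = new AtomicLong();

    CountingFactory(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    // Always creates a fresh instance through the delegate; nothing is
    // cached or shared between callers.
    @Override
    public T get() {
        T instance = delegate.get();
        created.incrementAndGet(); // count only successful creations
        return instance;
    }

    // Number of instances created so far, to be exposed as a metric.
    long createdCount() {
        return created.get();
    }
}
```

Callers still get a new short-lived instance on every call and remain 
responsible for closing it. The factory only adds the counter.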

> Refactoring of creation of ResourceResolvers
> 
>
> Key: SLING-10107
> URL: https://issues.apache.org/jira/browse/SLING-10107
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Jörg Hoh
>Priority: Major
>
> In the 2 classes {{Bookkeeper}} and {{Localstore}} the creation of 
> ResourceResolvers is spread over multiple places, and also the number of 
> methods where ResourceResolvers are opened for obviously tiny operations is 
> quite high.
> Therefore I refactored the creation of ResourceResolvers for the subservice 
> bookkeeper (which is the majority of cases) and centralized the creation of 
> it into a central supplier within the {{Bookkeeper}}. I also added a new 
> Gauge metric to observe how often a new ResourceResolver is actually created.
> My gut feeling is, that the number of creations is (too) high; but before I 
> start combining operations and reduce the number of ResourceResolver 
> creations, I would like to understand if it's a problem at all.
> See 
> https://github.com/apache/sling-org-apache-sling-distribution-journal/pull/63 
> for the PR





[jira] [Commented] (SLING-10384) Execute actions before sending "o/a/s/d/i/package/imported" event

2021-06-16 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364123#comment-17364123
 ] 

Timothee Maret commented on SLING-10384:


[~joerghoh] As we are heading towards a service API for this hook, we should 
remove the event introduced in distribution API via this commit 
https://github.com/apache/sling-org-apache-sling-distribution-api/commit/915a638ba3be7ed5ef36ddb0d3daa1d530217833

> Execute actions before sending "o/a/s/d/i/package/imported" event
> -
>
> Key: SLING-10384
> URL: https://issues.apache.org/jira/browse/SLING-10384
> Project: Sling
>  Issue Type: New Feature
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Joerg Hoh
>Assignee: Joerg Hoh
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18, Content 
> Distribution API 0.5.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> After the package has been imported and before the 
> "org/apache/sling/distribution/importer/package/imported" event is sent, 
> custom functionality should be executed synchronously. It should have access 
> to the same type of information which is sent with the event.
> This allows executing actions before any handlers of the "imported" 
> event run.
> On the implementation side we will do a {{eventadmin.sendEvent()}} and add 
> the same payload as with the "package/imported" event, but under the topic 
> {{o/a/s/d/i/package/committed}} . This will allow the synchronous executions 
> of event handlers.





[jira] [Resolved] (SLING-10527) Log imported instead of importing package

2021-06-22 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-10527.

Resolution: Fixed

> Log imported instead of importing package
> -
>
> Key: SLING-10527
> URL: https://issues.apache.org/jira/browse/SLING-10527
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Currently the package import is logged at INFO level before processing the 
> package
> {code}
> log.info("Importing distribution package {} at offset={}", pkgMsg, offset);
> {code}
> This makes it difficult to figure out if there was an error with the import 
> because one has to seek further logs to see if the package import failed or 
> (worse) find no log at all if the package was imported successfully. We 
> should ensure that a single log statement is logged upon import, whether the 
> import succeeded or failed.





[jira] [Commented] (SLING-10527) Log imported instead of importing package

2021-06-22 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367635#comment-17367635
 ] 

Timothee Maret commented on SLING-10527:


Merged PR #73
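
For readers without the PR at hand, a minimal sketch of the "one log 
statement per import" idea. It is not the code merged in PR #73. The class 
ImportLogging, the Importer interface and the message wording are assumptions 
for illustration, and a Consumer of String stands in for the logger.

```java
import java.util.function.Consumer;

// Hypothetical sketch, not the code merged in PR #73.
final class ImportLogging {

    // Stand-in for the actual package import, which may fail.
    interface Importer {
        void run() throws Exception;
    }

    private ImportLogging() {
    }

    // Runs the import and emits exactly one log line describing the outcome.
    // Returns true if the import succeeded.
    static boolean importAndLog(String pkgId, long offset, Importer importer, Consumer<String> log) {
        try {
            importer.run();
            log.accept("Imported distribution package " + pkgId + " at offset=" + offset);
            return true;
        } catch (Exception e) {
            log.accept("Failed to import distribution package " + pkgId
                + " at offset=" + offset + ": " + e.getMessage());
            return false;
        }
    }
}
```

Either way, a single line tells whether the package at a given offset was 
imported, so no further log lines need to be correlated.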

> Log imported instead of importing package
> -
>
> Key: SLING-10527
> URL: https://issues.apache.org/jira/browse/SLING-10527
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Currently the package import is logged at INFO level before processing the 
> package
> {code}
> log.info("Importing distribution package {} at offset={}", pkgMsg, offset);
> {code}
> This makes it difficult to figure out if there was an error with the import 
> because one has to seek further logs to see if the package import failed or 
> (worse) find no log at all if the package was imported successfully. We 
> should ensure that a single log statement is logged upon import, whether the 
> import succeeded or failed.





[jira] [Resolved] (SLING-10529) Remove MDC constructs

2021-06-22 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-10529.

Resolution: Fixed

Merged PR #74

> Remove MDC constructs
> -
>
> Key: SLING-10529
> URL: https://issues.apache.org/jira/browse/SLING-10529
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Minor
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> BookKeeper sets up MDC which are not used. We should remove them and rely on 
> consistent logging instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SLING-10528) Reject large packages

2021-06-22 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367661#comment-17367661
 ] 

Timothee Maret commented on SLING-10528:


Opened 
[PR#75|https://github.com/apache/sling-org-apache-sling-distribution-journal/pull/75]
 that limits packages to 5MB. In my test environments, it takes around 1 minute 
to import a 5MB binary-less package worth of assets. 

Packages larger than 5MB will fail upon import and will be handled via the 
standard procedures, either blocking the queue or being moved to a quarantine 
queue.
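
A minimal sketch of the kind of length check described above. It is not the 
code from PR#75. The class PackageSizeLimit and its methods are hypothetical, 
and reading "5MB" as 5 * 1024 * 1024 bytes is an assumption.

```java
// Hypothetical sketch, not the code from PR#75.
final class PackageSizeLimit {

    // Assumption: 5MB is taken as 5 * 1024 * 1024 bytes.
    static final long DEFAULT_MAX_BYTES = 5L * 1024 * 1024;

    private final long maxBytes;

    PackageSizeLimit(long maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must not be negative");
        }
        this.maxBytes = maxBytes;
    }

    // True if a package of the given length is within the limit.
    boolean accepts(long length) {
        return length <= maxBytes;
    }

    // Fails fast for packages above the limit, before any import work starts.
    void check(String pkgId, long length) {
        if (!accepts(length)) {
            throw new IllegalArgumentException("Package " + pkgId + " has length "
                + length + " which exceeds the limit of " + maxBytes + " bytes");
        }
    }
}
```

The 393722362 byte package reported in SLING-10108 would be rejected by such 
a check with the default limit.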

> Reject large packages
> -
>
> Key: SLING-10528
> URL: https://issues.apache.org/jira/browse/SLING-10528
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Journal distribution serialises content packages in binary-less mode only. 
> When doing "deep" distribution, the package can still become so large that 
> the importing side can't ingest it. When this happens, Apache Oak does a 
> session save that never seems to return and can easily take 8 hours to 
> terminate.
> I suggest detecting those large packages based on the package size and 
> simply rejecting packages above a configurable size limit. The limit should 
> take into consideration the mean import throughput for a single Oak session 
> on cloud segment tar and should keep the save operation below, say, 15 
> minutes.
> With this approach, we can ensure that Oak always returns when importing a 
> package and we can fail fast for larger packages. Journal distribution will 
> also handle the errors nicely and allow removing the blocking item with the 
> usual set of tools. 




[jira] [Commented] (SLING-10528) Reject large packages

2021-06-23 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367993#comment-17367993
 ] 

Timothee Maret commented on SLING-10528:


[~cschneider] good point. I thought of handling the case in the subscriber as 
it would also catch packages already produced. Given that the journal removes 
messages after 7 days, we can realistically ignore that case.

> Reject large packages
> -
>
> Key: SLING-10528
> URL: https://issues.apache.org/jira/browse/SLING-10528
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Journal distribution serialises content packages in binary-less mode only. 
> When doing "deep" distribution, the package can still become so large that 
> the importing side can't ingest it. When this happens, Apache Oak does a 
> session save that never seems to return and can easily take 8 hours to 
> terminate.
> I suggest detecting those large packages based on the package size and 
> simply rejecting packages above a configurable size limit. The limit should 
> take into consideration the mean import throughput for a single Oak session 
> on cloud segment tar and should keep the save operation below, say, 15 
> minutes.
> With this approach, we can ensure that Oak always returns when importing a 
> package and we can fail fast for larger packages. Journal distribution will 
> also handle the errors nicely and allow removing the blocking item with the 
> usual set of tools. 





[jira] [Commented] (SLING-10528) Reject large packages

2021-06-23 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367994#comment-17367994
 ] 

Timothee Maret commented on SLING-10528:


Updated the PR with package length assertions on the distribution publisher 
side.

> Reject large packages
> -
>
> Key: SLING-10528
> URL: https://issues.apache.org/jira/browse/SLING-10528
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Journal distribution serialises content packages in binary-less mode only.
> When doing "deep" distribution, the package can still become so large that
> the importing side can't ingest it. When this happens, Apache Oak does a
> session save that never seems to return and can easily take 8 hours to
> terminate.
> I suggest detecting those large packages based on the package size and
> simply rejecting packages above a configurable size limit. The limit should
> take into consideration the mean import throughput for a single Oak session
> on cloud segment tar and should keep the save operation below, say, 15
> minutes.
> With this approach, we can ensure that Oak always returns when importing a
> package and we can fail fast for larger packages. Journal distribution will
> also handle the errors nicely and allow removing the blocking item using the
> usual set of tools.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SLING-10112) Distribution Queue can not be reliably cleared due to race condition processing delete messages

2021-06-23 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368001#comment-17368001
 ] 

Timothee Maret commented on SLING-10112:


This approach has a flaw with cloud deployments that treat Sling instances as
cattle. In those deployments, each instance starts with a new slingId.
Commands are sent to a specific subscriber agent identified by subSlingId and
subAgentName. As a result, the command is not picked up when the instance
is replaced.

We should revert this change given that SLING-10528 avoids the condition that 
required this change.
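
To illustrate the flaw, here is a minimal sketch of how such a command is matched on the subscriber side. Only the subSlingId and subAgentName field names come from the discussion above; the ClearCommand class and the isAddressedTo method are hypothetical and do not reflect the actual journal code.

```java
/** Hypothetical model of a clear command addressed to one subscriber agent. */
class ClearCommand {
    final String subSlingId;
    final String subAgentName;

    ClearCommand(String subSlingId, String subAgentName) {
        this.subSlingId = subSlingId;
        this.subAgentName = subAgentName;
    }

    /** A subscriber only accepts commands addressed to its own slingId and agent name. */
    boolean isAddressedTo(String slingId, String agentName) {
        return subSlingId.equals(slingId) && subAgentName.equals(agentName);
    }
}
```

A command created for the instance with slingId "sling-id-old" is ignored by a replacement instance that starts with "sling-id-new", even though the agent name is unchanged, so the clear request is never applied.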

> Distribution Queue can not be reliably cleared due to race condition 
> processing delete messages
> ---
>
> Key: SLING-10112
> URL: https://issues.apache.org/jira/browse/SLING-10112
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Replication queue cannot be reliably cleared due to a race condition with
> processing the 'delete' messages.
> If the first message is stuck during processing, the 'clear queue' signal
> might never take effect on the subscriber.
> A restart of the process does not help as the same situation might occur.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SLING-10528) Reject large packages

2021-06-23 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368000#comment-17368000
 ] 

Timothee Maret commented on SLING-10528:


The mechanism of rejecting packages by size supersedes SLING-10112. 

> Reject large packages
> -
>
> Key: SLING-10528
> URL: https://issues.apache.org/jira/browse/SLING-10528
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Journal distribution serialises content packages in binary-less mode only.
> When doing "deep" distribution, the package can still become so large that
> the importing side can't ingest it. When this happens, Apache Oak does a
> session save that never seems to return and can easily take 8 hours to
> terminate.
> I suggest detecting those large packages based on the package size and
> simply rejecting packages above a configurable size limit. The limit should
> take into consideration the mean import throughput for a single Oak session
> on cloud segment tar and should keep the save operation below, say, 15
> minutes.
> With this approach, we can ensure that Oak always returns when importing a
> package and we can fail fast for larger packages. Journal distribution will
> also handle the errors nicely and allow removing the blocking item using the
> usual set of tools.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (SLING-10112) Distribution Queue can not be reliably cleared due to race condition processing delete messages

2021-06-23 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret reopened SLING-10112:


> Distribution Queue can not be reliably cleared due to race condition 
> processing delete messages
> ---
>
> Key: SLING-10112
> URL: https://issues.apache.org/jira/browse/SLING-10112
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Replication queue cannot be reliably cleared due to a race condition with
> processing the 'delete' messages.
> If the first message is stuck during processing, the 'clear queue' signal
> might never take effect on the subscriber.
> A restart of the process does not help as the same situation might occur.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (SLING-10528) Reject large packages

2021-06-23 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret resolved SLING-10528.

Resolution: Fixed

Squashed and merged PR #75

> Reject large packages
> -
>
> Key: SLING-10528
> URL: https://issues.apache.org/jira/browse/SLING-10528
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Journal distribution serialises content packages in binary-less mode only.
> When doing "deep" distribution, the package can still become so large that
> the importing side can't ingest it. When this happens, Apache Oak does a
> session save that never seems to return and can easily take 8 hours to
> terminate.
> I suggest detecting those large packages based on the package size and
> simply rejecting packages above a configurable size limit. The limit should
> take into consideration the mean import throughput for a single Oak session
> on cloud segment tar and should keep the save operation below, say, 15
> minutes.
> With this approach, we can ensure that Oak always returns when importing a
> package and we can fail fast for larger packages. Journal distribution will
> also handle the errors nicely and allow removing the blocking item using the
> usual set of tools.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (SLING-10527) Log imported instead of importing package

2021-06-22 Thread Timothee Maret (Jira)
Timothee Maret created SLING-10527:
--

 Summary: Log imported instead of importing package
 Key: SLING-10527
 URL: https://issues.apache.org/jira/browse/SLING-10527
 Project: Sling
  Issue Type: Bug
  Components: Content Distribution
Reporter: Timothee Maret
 Fix For: Content Distribution Journal Core 0.1.18


Currently the package import is logged at INFO level before processing the 
package
{code}
log.info("Importing distribution package {} at offset={}", pkgMsg, offset);
{code}

This makes it difficult to figure out whether an import failed, because one
has to search further logs to see if the package import failed or (worse)
finds no log at all if the package was imported successfully. We should ensure
that a single log statement is emitted per import, whether the import
succeeded or failed.
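
A minimal sketch of the intended behaviour, assuming a hypothetical Importer callback and a plain Consumer<String> standing in for the logger (neither is part of the journal code base): exactly one line is emitted per import attempt, after the outcome is known.

```java
import java.util.function.Consumer;

/** Hypothetical sketch: emits exactly one log line per import attempt. */
class ImportLogger {

    /** Stand-in for the real import step; not part of the journal API. */
    interface Importer {
        void importPackage(String pkgId) throws Exception;
    }

    /** Runs the import, then logs once: "Imported ..." on success, "Failed ..." on error. */
    static boolean importAndLog(Importer importer, String pkgId, long offset, Consumer<String> log) {
        try {
            importer.importPackage(pkgId);
            log.accept(String.format(
                "Imported distribution package %s at offset=%d", pkgId, offset));
            return true;
        } catch (Exception e) {
            log.accept(String.format(
                "Failed to import distribution package %s at offset=%d: %s",
                pkgId, offset, e.getMessage()));
            return false;
        }
    }
}
```

With this shape, a reader of the logs sees either the "Imported" line or the "Failed" line for a given offset, never a dangling "Importing" line with no follow-up.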



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (SLING-10527) Log imported instead of importing package

2021-06-22 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-10527:
---
Comment: was deleted

(was: 
https://github.com/apache/sling-org-apache-sling-distribution-journal/pull/73)

> Log imported instead of importing package
> -
>
> Key: SLING-10527
> URL: https://issues.apache.org/jira/browse/SLING-10527
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Currently the package import is logged at INFO level before processing the 
> package
> {code}
> log.info("Importing distribution package {} at offset={}", pkgMsg, offset);
> {code}
> This makes it difficult to figure out whether an import failed, because one
> has to search further logs to see if the package import failed or (worse)
> finds no log at all if the package was imported successfully. We should
> ensure that a single log statement is emitted per import, whether the import
> succeeded or failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SLING-10527) Log imported instead of importing package

2021-06-22 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367278#comment-17367278
 ] 

Timothee Maret commented on SLING-10527:


https://github.com/apache/sling-org-apache-sling-distribution-journal/pull/73

> Log imported instead of importing package
> -
>
> Key: SLING-10527
> URL: https://issues.apache.org/jira/browse/SLING-10527
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Currently the package import is logged at INFO level before processing the 
> package
> {code}
> log.info("Importing distribution package {} at offset={}", pkgMsg, offset);
> {code}
> This makes it difficult to figure out whether an import failed, because one
> has to search further logs to see if the package import failed or (worse)
> finds no log at all if the package was imported successfully. We should
> ensure that a single log statement is emitted per import, whether the import
> succeeded or failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (SLING-10528) Reject large packages

2021-06-22 Thread Timothee Maret (Jira)
Timothee Maret created SLING-10528:
--

 Summary: Reject large packages
 Key: SLING-10528
 URL: https://issues.apache.org/jira/browse/SLING-10528
 Project: Sling
  Issue Type: Bug
  Components: Content Distribution
Reporter: Timothee Maret
Assignee: Timothee Maret
 Fix For: Content Distribution Journal Core 0.1.18


Currently the package import is logged at INFO level before processing the 
package
{code}
log.info("Importing distribution package {} at offset={}", pkgMsg, offset);
{code}

This makes it difficult to figure out whether an import failed, because one
has to search further logs to see if the package import failed or (worse)
finds no log at all if the package was imported successfully. We should ensure
that a single log statement is emitted per import, whether the import
succeeded or failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (SLING-10527) Log imported instead of importing package

2021-06-22 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret reassigned SLING-10527:
--

Assignee: Timothee Maret

> Log imported instead of importing package
> -
>
> Key: SLING-10527
> URL: https://issues.apache.org/jira/browse/SLING-10527
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Currently the package import is logged at INFO level before processing the 
> package
> {code}
> log.info("Importing distribution package {} at offset={}", pkgMsg, offset);
> {code}
> This makes it difficult to figure out whether an import failed, because one
> has to search further logs to see if the package import failed or (worse)
> finds no log at all if the package was imported successfully. We should
> ensure that a single log statement is emitted per import, whether the import
> succeeded or failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SLING-10528) Reject large packages

2021-06-22 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-10528:
---
Description: 
Journal distribution serialises content packages in binary-less mode only. When
doing "deep" distribution, the package can still become so large that the
importing side can't ingest it. When this happens, Apache Oak does a session
save that never seems to return and can easily take 8 hours to terminate.

I suggest detecting those large packages based on the package size and simply
rejecting packages above a configurable size limit. The limit should take into
consideration the mean import throughput for a single Oak session on cloud
segment tar and should keep the save operation below, say, 15 minutes.

  was:
Currently the package import is logged at INFO level before processing the 
package
{code}
log.info("Importing distribution package {} at offset={}", pkgMsg, offset);
{code}

This makes it difficult to figure out whether an import failed, because one
has to search further logs to see if the package import failed or (worse)
finds no log at all if the package was imported successfully. We should ensure
that a single log statement is emitted per import, whether the import
succeeded or failed.


> Reject large packages
> -
>
> Key: SLING-10528
> URL: https://issues.apache.org/jira/browse/SLING-10528
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Journal distribution serialises content packages in binary-less mode only.
> When doing "deep" distribution, the package can still become so large that
> the importing side can't ingest it. When this happens, Apache Oak does a
> session save that never seems to return and can easily take 8 hours to
> terminate.
> I suggest detecting those large packages based on the package size and
> simply rejecting packages above a configurable size limit. The limit should
> take into consideration the mean import throughput for a single Oak session
> on cloud segment tar and should keep the save operation below, say, 15
> minutes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SLING-10528) Reject large packages

2021-06-22 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367442#comment-17367442
 ] 

Timothee Maret commented on SLING-10528:


bq. if we can simply put a useful limit into the code or if we need a 
configuration.

We can hardcode the threshold and add the configuration later. The threshold is
not expected to need much tuning because we are serialising in binary-less
mode. It would have been different with full binaries.

> Reject large packages
> -
>
> Key: SLING-10528
> URL: https://issues.apache.org/jira/browse/SLING-10528
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Journal distribution serialises content packages in binary-less mode only.
> When doing "deep" distribution, the package can still become so large that
> the importing side can't ingest it. When this happens, Apache Oak does a
> session save that never seems to return and can easily take 8 hours to
> terminate.
> I suggest detecting those large packages based on the package size and
> simply rejecting packages above a configurable size limit. The limit should
> take into consideration the mean import throughput for a single Oak session
> on cloud segment tar and should keep the save operation below, say, 15
> minutes.
> With this approach, we can ensure that Oak always returns when importing a
> package and we can fail fast for larger packages. Journal distribution will
> also handle the errors nicely and allow removing the blocking item using the
> usual set of tools.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SLING-10488) Also use path for LocalStore node name

2021-06-22 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367443#comment-17367443
 ] 

Timothee Maret commented on SLING-10488:


[~cschneider] is this fixed?

> Also use path for LocalStore node name
> --
>
> Key: SLING-10488
> URL: https://issues.apache.org/jira/browse/SLING-10488
> Project: Sling
>  Issue Type: Improvement
>  Components: Content Distribution
>Affects Versions: Content Distribution Journal Core 0.1.16
>Reporter: Christian Schneider
>Assignee: Christian Schneider
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Sometimes we use the same server hostname on different clusters. We can use 
> the path of the serverUri to distinguish these.
> This change adds the path to the LocalStore node name so offsets are stored 
> uniquely.
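
A minimal sketch of the idea, assuming a hypothetical nodeName helper; the actual naming scheme used by LocalStore may differ. The host and path of the serverUri are combined and characters that are awkward in node names are replaced with underscores.

```java
import java.net.URI;

/** Hypothetical helper: derives a store node name from the host and path of the serverUri. */
class LocalStoreNames {

    /**
     * Combines host and path so that two clusters sharing a hostname map to different names.
     * Assumes the URI has a host component.
     */
    static String nodeName(URI serverUri) {
        String host = serverUri.getHost();
        String path = serverUri.getPath();
        String raw = (path == null || path.isEmpty()) ? host : host + path;
        // Replace anything outside a conservative character set, including '/'.
        return raw.replaceAll("[^A-Za-z0-9._-]", "_");
    }
}
```

Two URIs that share a hostname but differ in path, such as https://journal.example.com/clusterA and https://journal.example.com/clusterB (example values only), then map to different node names, so their offsets do not collide.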



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SLING-10528) Reject large packages

2021-06-22 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-10528:
---
Description: 
Journal distribution serialises content packages in binary-less mode only. When
doing "deep" distribution, the package can still become so large that the
importing side can't ingest it. When this happens, Apache Oak does a session
save that never seems to return and can easily take 8 hours to terminate.

I suggest detecting those large packages based on the package size and simply
rejecting packages above a configurable size limit. The limit should take into
consideration the mean import throughput for a single Oak session on cloud
segment tar and should keep the save operation below, say, 15 minutes.

With this approach, we can ensure that Oak always returns when importing a
package and we can fail fast for larger packages. Journal distribution will
also handle the errors nicely and allow removing the blocking item using the
usual set of tools.

  was:
Journal distribution serialises content packages in binary-less mode only. When
doing "deep" distribution, the package can still become so large that the
importing side can't ingest it. When this happens, Apache Oak does a session
save that never seems to return and can easily take 8 hours to terminate.

I suggest detecting those large packages based on the package size and simply
rejecting packages above a configurable size limit. The limit should take into
consideration the mean import throughput for a single Oak session on cloud
segment tar and should keep the save operation below, say, 15 minutes.


> Reject large packages
> -
>
> Key: SLING-10528
> URL: https://issues.apache.org/jira/browse/SLING-10528
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Journal distribution serialises content packages in binary-less mode only.
> When doing "deep" distribution, the package can still become so large that
> the importing side can't ingest it. When this happens, Apache Oak does a
> session save that never seems to return and can easily take 8 hours to
> terminate.
> I suggest detecting those large packages based on the package size and
> simply rejecting packages above a configurable size limit. The limit should
> take into consideration the mean import throughput for a single Oak session
> on cloud segment tar and should keep the save operation below, say, 15
> minutes.
> With this approach, we can ensure that Oak always returns when importing a
> package and we can fail fast for larger packages. Journal distribution will
> also handle the errors nicely and allow removing the blocking item using the
> usual set of tools.
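
The proposal can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: the PackageSizeGuard class, its method names and the idea of deriving the limit as throughput multiplied by the target save duration are all hypothetical, and any throughput figure would have to be measured.

```java
/** Hypothetical sketch of a size guard; names and numbers are illustrative only. */
class PackageSizeGuard {
    private final long maxPackageSizeBytes;

    PackageSizeGuard(long maxPackageSizeBytes) {
        this.maxPackageSizeBytes = maxPackageSizeBytes;
    }

    /** Derives a limit from an assumed mean import throughput and a target save duration. */
    static long limitFor(long throughputBytesPerSecond, long maxSaveSeconds) {
        return Math.multiplyExact(throughputBytesPerSecond, maxSaveSeconds);
    }

    /** Fails fast when the package is larger than the configured limit. */
    void check(String pkgId, long pkgSizeBytes) {
        if (pkgSizeBytes > maxPackageSizeBytes) {
            throw new IllegalArgumentException(String.format(
                "Package %s rejected: size %d bytes exceeds limit %d bytes",
                pkgId, pkgSizeBytes, maxPackageSizeBytes));
        }
    }
}
```

For example, with an assumed throughput of 1000 bytes per second and a 15 minute (900 second) budget, limitFor returns 900000 bytes, and check rejects anything larger before the import is attempted.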



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SLING-10529) Remove MDC constructs

2021-06-22 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-10529:
---
Description: BookKeeper sets up MDC entries that are not used. We should remove 
them and rely on consistent logging instead.  (was: Journal distribution 
serialises content packages in binary-less mode only. When doing "deep" 
distribution, the package can still become so large that the importing side 
can't ingest it. When this happens, Apache Oak does a session save that never 
seems to return and can easily take 8 hours to terminate.

I suggest detecting those large packages based on the package size and simply
rejecting packages above a configurable size limit. The limit should take into
consideration the mean import throughput for a single Oak session on cloud
segment tar and should keep the save operation below, say, 15 minutes.

With this approach, we can ensure that Oak always returns when importing a
package and we can fail fast for larger packages. Journal distribution will
also handle the errors nicely and allow removing the blocking item using the
usual set of tools.)

> Remove MDC constructs
> -
>
> Key: SLING-10529
> URL: https://issues.apache.org/jira/browse/SLING-10529
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> BookKeeper sets up MDC entries that are not used. We should remove them and
> rely on consistent logging instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SLING-10529) Remove MDC constructs

2021-06-22 Thread Timothee Maret (Jira)


 [ 
https://issues.apache.org/jira/browse/SLING-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothee Maret updated SLING-10529:
---
Priority: Minor  (was: Major)

> Remove MDC constructs
> -
>
> Key: SLING-10529
> URL: https://issues.apache.org/jira/browse/SLING-10529
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Minor
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> BookKeeper sets up MDC entries that are not used. We should remove them and
> rely on consistent logging instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SLING-10528) Reject large packages

2021-06-22 Thread Timothee Maret (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367352#comment-17367352
 ] 

Timothee Maret commented on SLING-10528:


Thanks [~kwin]! Indeed, we could leverage the auto-save threshold but, as you
pointed out, we'd lose atomic commits. In addition, we have seen use cases
where the threshold was not enforced due to missing dependencies. This latter
issue may be fixed eventually.

> Reject large packages
> -
>
> Key: SLING-10528
> URL: https://issues.apache.org/jira/browse/SLING-10528
> Project: Sling
>  Issue Type: Bug
>  Components: Content Distribution
>Reporter: Timothee Maret
>Assignee: Timothee Maret
>Priority: Major
> Fix For: Content Distribution Journal Core 0.1.18
>
>
> Journal distribution serialises content packages in binary-less mode only.
> When doing "deep" distribution, the package can still become so large that
> the importing side can't ingest it. When this happens, Apache Oak does a
> session save that never seems to return and can easily take 8 hours to
> terminate.
> I suggest detecting those large packages based on the package size and
> simply rejecting packages above a configurable size limit. The limit should
> take into consideration the mean import throughput for a single Oak session
> on cloud segment tar and should keep the save operation below, say, 15
> minutes.
> With this approach, we can ensure that Oak always returns when importing a
> package and we can fail fast for larger packages. Journal distribution will
> also handle the errors nicely and allow removing the blocking item using the
> usual set of tools.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (SLING-10529) Remove MDC constructs

2021-06-22 Thread Timothee Maret (Jira)
Timothee Maret created SLING-10529:
--

 Summary: Remove MDC constructs
 Key: SLING-10529
 URL: https://issues.apache.org/jira/browse/SLING-10529
 Project: Sling
  Issue Type: Bug
  Components: Content Distribution
Reporter: Timothee Maret
Assignee: Timothee Maret
 Fix For: Content Distribution Journal Core 0.1.18


Journal distribution serialises content packages in binary-less mode only. When
doing "deep" distribution, the package can still become so large that the
importing side can't ingest it. When this happens, Apache Oak does a session
save that never seems to return and can easily take 8 hours to terminate.

I suggest detecting those large packages based on the package size and simply
rejecting packages above a configurable size limit. The limit should take into
consideration the mean import throughput for a single Oak session on cloud
segment tar and should keep the save operation below, say, 15 minutes.

With this approach, we can ensure that Oak always returns when importing a
package and we can fail fast for larger packages. Journal distribution will
also handle the errors nicely and allow removing the blocking item using the
usual set of tools.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

