[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-14 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839451#comment-16839451
 ] 

Thomas Mueller commented on SLING-8407:
---

I think the proposal of [~marett] would work fine for many cases. It wouldn't 
be a complete solution due to possible race conditions, but it would likely be 
a good addition.

For existing installations, I understand the reluctance to change behavior.

However, for new installations, I think "option(traversal fail)" would be best. 
Also, I think it would be good if an invalid query exception would be re-thrown 
(instead of instead of just logged and swallowed like now).

So what about using the config option for throwing an exception as well, and 
not enable the config option by default?

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-14 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839144#comment-16839144
 ] 

Thomas Mueller commented on SLING-8407:
---

> Oak could expose an marker API (e.g. IndexReady) 

One issue not mentioned so far is race conditions: Assuming the code (A) checks 
if "index is ready", if yes (B) runs a query. What if the index becomes 
unavailable between A and B? I would rather go for the simpler, and saver 
option to mark the query itself.

>  requiring to pass option(traversal fail) to some queries couples Queries and 
> indexes

Yes, up to some point: it requires that an index can be used for the query, 
meaning there is a guarantee that the query returns within a reasonable time. 
It doesn't say which index exactly. (Actually it doesn't mean that the query 
may not traverse, if traversing is faster than using an index - it just means 
an index needs to be available.) I think it makes sense to set this as the 
default way to run queries. That means, if you run a query and forgot to add an 
index, then the query will fail. It's just that AEM as a whole isn't ready for 
this yet (too many things will break right now). And of course customer code 
will also break (backward compatibility). To enforce the query is run, one 
might use "option(traversal ok)", and my fear is that this would be added to 
too many queries as a short term solution.

Longer term, we could bind each query to a specific index tag using 
"option(index tag ...)", which ensures the right kind of index is used, but 
also not specifying which index exactly. [~rma61...@adobe.com] proposed this 
idea, and I think it would make a lot of sense. Unfortunately, it would also 
require many changes in AEM.



> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-13 Thread Julian Sedding (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838520#comment-16838520
 ] 

Julian Sedding commented on SLING-8407:
---

Imagine the following code in e.g. {{JobManager}} to ensure an index for its 
query is available:

{code:java}
@Reference(target = "(query=SELECT * FROM [nt:base] WHERE prop = ?)")
IndexReady indexReady;
{code}

Of course it's out of the question a service could be registered for each 
conceivable query. However, the OSGi core spec describes [55.3.2 Providing a 
Service on 
Demand|https://osgi.org/specification/osgi.core/7.0.0/framework.servicehooks.html#d0e45721]
 using a [55.6 Listener 
Hook|https://osgi.org/specification/osgi.core/7.0.0/framework.servicehooks.html#d0e45955].

Oak could register a listener hook and test if an index is available to satisfy 
the query from the filter expression. If that's the case, a matching 
{{IndexReady}} instance could be registered to signal this fact.

Such a mechanism could handle both XPath and JCR-SQL2 queries and would make it 
extremely simple to solve the same issue in other places as well. Drawback: it 
would require Jackrabbit/Oak API and would not work with JCR implementations 
that don't have this signalling mechanism.

I don't know what would be required inside Oak to make a meaningful test, but I 
trust [~tmueller] or [~catholicon] could tell without too much research.







> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Vikas Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837424#comment-16837424
 ] 

Vikas Saurabh commented on SLING-8407:
--

[~marett], option traversal fail is a loose hint to pak to fail fast if the 
best available option is to traverse. It's akin to Index hints in usual 
parlance (well oak supports real index hint using option index tag... )

Going towards this new behavior using a configuration is already something that 
[~egli] proposed and [~tmueller] updated his patch accordingly.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Timothee Maret (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837360#comment-16837360
 ] 

Timothee Maret commented on SLING-8407:
---

{quote}Queries and indexes are de-coupled (not just in oak ... in a lot of DBs 
too)
{quote}
Good point. It seems to me that requiring to pass {{option(traversal fail)}} to 
some queries couples Queries and indexes. Maybe the marker service option may 
only work in its coarse implementation. Out of curiosity, is there a reason why 
this new behaviour, failing upon traversal, is not enabled per deployment as a 
big switch ?

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Vikas Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837320#comment-16837320
 ] 

Vikas Saurabh commented on SLING-8407:
--

[~marett],
bq.   to wait on a specific marker service and thus a specific index.
actually that won't work as well. Queries and indexes are de-coupled (not just 
in oak ... in a lot of DBs too). So, making a client know "which" index it 
should use for a query is troublesome in long run (I think). The intention is 
probably that the query should be backed by an index which is what {{option 
(traversal fail)}} indicates. But I don't know much osgi to say "try running 
this query with option traversal fail whenever a new index gets ready... once 
this is done then go on to call activate".

Also, that said, I don't completely agree to the idea of not allowing job 
manager impl at all just because find jobs functionality needs something which 
isn't ready. As [~tmueller] mentioned earlier - first setup is one case but 
imagine a case where the index which "should" answer query via {{findJobs}} is 
corrupt for some reason. Should that result in inability to addJobs as well?

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Timothee Maret (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837316#comment-16837316
 ] 

Timothee Maret commented on SLING-8407:
---

[~catholicon] I agree one marker may be too coarse. What Oak may do is to 
subscribe more fine grained marker services, maybe one by index. Oak would set 
a property (e.g. {{name}})  in the marker service that would allow consumers to 
wait on a specific marker service and thus a specific index.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Vikas Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837309#comment-16837309
 ] 

Vikas Saurabh commented on SLING-8407:
--

bq. register a service implementing this API only when the index is ready
"*the* index" - which index would that be. I guess you meant all indexes. To me 
that sounds a bit of an overkill.

bq. The JobManagerImpl would be active only when the index is ready
afaiu, it's the {{findJobs}} method that requires a query. Doing it this way 
means the system won't allow adding jobs as well. I wonder if that's really 
reasonable (to me add jobs should keep on working... status consoles, 
dashboards, healthchecks unavailability is probably more tolerable that 
inability to even add a job.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Carsten Ziegeler (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837267#comment-16837267
 ] 

Carsten Ziegeler commented on SLING-8407:
-

[~marett] I think that's a way better option, +1

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837262#comment-16837262
 ] 

Thomas Mueller commented on SLING-8407:
---

Pull request: https://github.com/apache/sling-org-apache-sling-event/pull/5

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Jason E Bailey (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837247#comment-16837247
 ] 

Jason E Bailey commented on SLING-8407:
---

[~tmueller] Allow me to provide a thought exercise.
Which is faster? To run a query to find the children of a specific node, or to 
call node.getChildren()?  From all of my experience and testing it's the 
getChildren() function. 

There is no doubt that if you don't know where the data is, that running a 
properly tuned indexed search is the fast and efficient way to go. However if 
you all the nodes are in one place or in one specific tree structure. Then 
you're not really gaining anything. For jobs, at least the implementation I 
have, everything is under /var/eventing/jobs.

> I assume with "oak traversal" you mean "tree traversal within the query 
> engine", and with "tree traversal" you mean traversing a tree using the JCR 
> API (getNodes,...).

I'll admit I haven't had the time to dive into oak as much as I'd like. I've 
seen a lot of strange things come out of oak as a result. I've had my whole 
system go down once because it was returning a result set, but every time I 
asked for the next item in the result set it was actually performing the query 
again. So I'll take your word on this.

In a real world scenario, I use 
[https://github.com/apache/sling-org-apache-sling-resource-filter] ,which I 
wrote, in multiple production environments and the teams I've worked with have 
found that using this has had no negative impact on performance.

Which brings me back to this bundle, if there is an issue where the index may 
not be available and you can do it efficiently without an index. Don't use an 
index.

 

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Timothee Maret (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837252#comment-16837252
 ] 

Timothee Maret commented on SLING-8407:
---

There may be a way between blocking everything and changing the expectations by 
throwing exceptions. Oak could expose an marker API (e.g. IndexReady) and 
register a service implementing this API only when the index is ready. 
JobManagerImpl would add a static reference for an IndexReady service. The 
JobManagerImpl would be active only when the index is ready and consequently 
the services depending on the JobManager. Services that "only" require the 
repository could still be activated meanwhile. Oak may subscribe named markers 
corresponding to each configured index assuming this level of granularity is 
required.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837230#comment-16837230
 ] 

Thomas Mueller commented on SLING-8407:
---

About adding a new API: I tried changing 
org/apache/sling/distribution/queue/spi/DistributionQueue.java in 
sling-org-apache-sling-distribution-core, add a method "boolean 
isReadyForQueries()". That would require a major version increase for 
org.apache.sling.distribution.queue.spi, would that be OK? We could use an 
existing API, for example DistributionQueue.getStatus() which returns a 
DistributionQueueStatus which returns a DistributionQueueState. So we could set 
the state to paused if the index isn't available. That would require running a 
query, so would also change behavior there. The health check could then check 
if it's paused, to avoid calling JobManager getJobs internally. A similar 
change for the replication queue would be needed, where also a major version 
increase is needed, or the existing method isPaused can be used (which is also 
a small behavior change, but less so than for DistributionQueue).

In my view the proposed solution is quite a bit simpler.

As for the bug [~egli] found, I will create a new issue.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837160#comment-16837160
 ] 

Thomas Mueller commented on SLING-8407:
---

> I prefer catching bugs at test time instead of at runtime

Well, it is hard to get good test coverage. One way to find escaping problems 
is using random testing / fuzz testing, that is using randomly generated data. 
But letting the code throw an exception would actually simplify testing as well 
I think.

> but I think discussing this here doesn't lead us anywhere.

Well, it is kind of the remaining issue...

> I also have a totally different opinion on the whole subject

Well, I would also prefer a simpler solution, I just don't see one that is 
robust. If you have an idea please let me know. I do see your point on "wait 
until the system is ready", but it is tricky to do: We can't just use the logic 
"if indexing is in progress, then it is not ready". There are many reasons why 
indexing can be in progress. Also, using a heuristic like "wait 1 minute" won't 
work reliably: it's exactly for this reason why we didn't notice the problem 
earlier.

I wanted to address a point [~egli] raised: we don't want to change the 
findJobs method for this. So I'm trying to add a new API for the health checks 
to verify if the replication queue / distribution queue is ready or not. 
However, I'm afraid that would result in a lot of changes, and a much more 
complex solution. However, seeing that findJobs anyway needs to be changed kind 
of makes this point less important in my view: there are bugs to be fixed there 
as well. And arguably, and that's the remaining issue I think, we need to 
decide whether a runtime exception should be thrown or not in case of query 
execution problems (due to syntax error or index not being ready).

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Carsten Ziegeler (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836977#comment-16836977
 ] 

Carsten Ziegeler commented on SLING-8407:
-

[~tmueller] I think you're misinterpreting my statements; I prefer catching 
bugs at test time instead of at runtime through runtime exceptions; but I think 
discussing this here doesn't lead us anywhere.
I also have a totally different opinion on the whole subject and the suggested 
solution is spreading code pieces across a lot of modules; it's also based on 
looking at a single usage pattern.
Anyways, I've said what I wanted to say, you still want to continue your way. 
That's fine, I'm not blocking things here

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836974#comment-16836974
 ] 

Thomas Mueller commented on SLING-8407:
---

There is at least one other place where the parameter is not encoded: topic.
If I'm not mistaken, ISO9075.encode is used unnecessarily in many places to 
encode constant property names, while some property names are not encoded, e.g.

{noformat}
 buf.append(propName);
{noformat}

getJobById doesn't escape the id and also swallows exceptions.


> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836946#comment-16836946
 ] 

Thomas Mueller commented on SLING-8407:
---

[~jebailey] I'm afraid I'm not familiar with the specific requirements of the 
job manager. But looking at the code, it seems it makes a lot of sense to run a 
query. There are potentially many filter conditions, and running a query allows 
to use an index. Assuming the index definition is correct, using an index 
reduces the reads from O\(n\) (where n is the number of jobs in the repository) 
to roughly O\(1\), assuming the consumer only iterates over a constant number 
of entries of the result (which I hope is the case). So in fact you do 
sacrifice a lot of speed and efficiency if you traverse and filter yourself.

> which utilizes a tree traversal, which is not the same as an oak traversal

I assume with "oak traversal" you mean "tree traversal within the query 
engine", and with "tree traversal" you mean traversing a tree using the JCR API 
(getNodes,...). Speed-wise, there is no difference. It's just that running a 
query logs a warning in case you traverse a lot, while using the JCR API does 
not. (Unfortunately we can't easily implement a warning in this case - the JCR 
API is very fine-grained, making this hard.)

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-10 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836935#comment-16836935
 ] 

Thomas Mueller commented on SLING-8407:
---

> therefore there is no point in throwing an exception if the query is not 
> correct back to he caller. 

This is a very surprising view. Is that approach used a lot in the code you 
wrote? 

So you are saying, if the implementation is buggy, then it's fine to swallow 
the exception, and return an empty result (in this case)? This is very 
problematic: it hides problem. People don't typically read the log file, so 
logging a warning in case of coding errors results in nobody noticing the bug 
for a long, long time (as we have seen here).

The right approach, in my view, is to throw a runtime exception. This ensures 
coding errors are detected early on, not hidden, and don't simply result in 
weird, buggy behavior.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Jason E Bailey (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836568#comment-16836568
 ] 

Jason E Bailey commented on SLING-8407:
---

If I remember correctly, jobs are stored in a specific location. I don't think 
this is a case where a query is even needed. We have utilities to filter and 
select child nodes from a path which utilizes a tree traversal, which is not 
the same as an oak traversal. Implementing a filter like that would prevent 
this issue and also make it independent of the type of repository that 
underlies the data store. Without sacrificing speed or efficiency.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Carsten Ziegeler (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836445#comment-16836445
 ] 

Carsten Ziegeler commented on SLING-8407:
-

The query is not provided by the caller of the API, it is constructed in the 
impl, therefore there is no point in throwing an exception if the query is not 
correct back to he caller. It's not the fault of the caller. Therefore it is 
ignored. A comment in the code sounds like a good idea

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836442#comment-16836442
 ] 

Thomas Mueller commented on SLING-8407:
---

 } catch (final QuerySyntaxException qse) {
String message = "Query " + buf.toString() + " failed: " + 
qse.getMessage();
if (configuration.isFailTraversal() {
// no index is available, or (unlikely), query syntax
logger.debug(message, qse);
logger.info(message);
} else {
// query syntax
logger.warn(message, qse);
}
// re-throw in any case
throw new IllegalStateException(message);
 } finally {

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836441#comment-16836441
 ] 

Thomas Mueller commented on SLING-8407:
---

[~egli] yes, or we could use a different configuration option for this case? I 
think throwing an exception (versus just logging) is much better if the query 
syntax is wrong. Actually I think it was an oversight to swallow the exception 
(meaning, not re-throw). In case it was intentional, I would expect there is a 
comment in the source code (similar to "fall through" comments in switch / 
case),... [~cziegeler] what do you think? I assume the code is very old so you 
will probably not remember this exact case, but given that you likely wrote the 
code, was that also your way of thinking?

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Stefan Egli (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836416#comment-16836416
 ] 

Stefan Egli commented on SLING-8407:


> {{if (configuration.isFailTraversal()) {}}

yes, that's how I envisioned it.

However, I'd argue that throwing the RuntimeException could also depend on that 
same condition, so something like
{noformat}
 } catch (final QuerySyntaxException qse) {
if (configuration.isFailTraversal() {
// no index is available
String message = "Query " + buf.toString() + " failed: " + 
qse.getMessage();
logger.debug(message, qse);
logger.info(message);
throw new IllegalStateException();
} else {
logger.warn("Query syntax wrong " + buf.toString(), qse);
}
 } finally {
{noformat}

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836410#comment-16836410
 ] 

Thomas Mueller commented on SLING-8407:
---

Regarding the configuration option would this patch work? The option is enabled 
by default (given that the same option is used in SLING-7544, and there are no 
tests for older versions of Oak, I think the point of supporting older Oak 
version is moot.)

{noformat}
--- 
a/src/main/java/org/apache/sling/event/impl/jobs/config/JobManagerConfiguration.java
+++ 
b/src/main/java/org/apache/sling/event/impl/jobs/config/JobManagerConfiguration.java
@@ -117,6 +117,9 @@ public class JobManagerConfiguration {
 /** Configuration property for the scheduled jobs path. */
 public static final String PROPERTY_SCHEDULED_JOBS_PATH = 
"job.scheduled.jobs.path";
 
+/** Configuration property to fail traversal queries. */
+public static final String PROPERTY_FAIL_TRAVERSAL = "job.failTraversal";
+
 /** The jobs base path with a slash. */
 private String jobsBasePathWithSlash;
 
@@ -178,6 +181,8 @@ public class JobManagerConfiguration {
 /** The topology capabilities. */
 private volatile TopologyCapabilities topologyCapabilities;
 
+private boolean failTraversal;
+
 /**
  * Activate this component.
  * @param props Configuration properties
@@ -207,6 +212,8 @@ public class JobManagerConfiguration {
 DEFAULT_SCHEDULED_JOBS_PATH);
 this.scheduledJobsPathWithSlash = this.scheduledJobsPath + "/";
 
+this.failTraversal = 
PropertiesUtil.toBoolean(props.get(PROPERTY_FAIL_TRAVERSAL), true);
+
 this.historyCleanUpRemovedJobs = config.cleanup_period();
 
 // create initial resources
@@ -460,6 +467,15 @@ public class JobManagerConfiguration {
 return (slash ? this.scheduledJobsPathWithSlash : 
this.scheduledJobsPath);
 }
 
+/**
+ * Whether to fail traversal when querying for jobs.
+ *
+ * @return true if enabled
+ */
+public boolean isFailTraversal() {
+return failTraversal;
+}
+
 /**
  * Stop processing
  */

--- a/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java
+++ b/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java
@@ -520,6 +520,9 @@ public class JobManagerImpl
 buf.append(Job.PROPERTY_JOB_CREATED);
 buf.append(" ascending");
 }
+if (configuration.isFailTraversal()) {
+buf.append(" option(traversal fail)");
+}
 final Iterator iter = 
resolver.findResources(buf.toString(), "xpath");
 long count = 0;
 
@@ -535,7 +538,11 @@ public class JobManagerImpl
 }
  }
 } catch (final QuerySyntaxException qse) {
-logger.warn("Query syntax wrong " + buf.toString(), qse);
+// no index is available
+String message = "Query " + buf.toString() + " failed: " + 
qse.getMessage();
+logger.debug(message, qse);
+logger.info(message);
+throw new IllegalStateException();
 } finally {
 resolver.close();
 }
{noformat}

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836389#comment-16836389
 ] 

Thomas Mueller commented on SLING-8407:
---

[~egli] how should the feature flag be implemented, assuming we do want to only 
use "buf.append(" option(traversal fail)");" if enabled? Would it be similar to 
how it is implemented in SLING-7544?:

{noformat}
if (this.factory.isForceNoAliasTraversal()) { queryString += " 
option(traversal fail);  }
{noformat}

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836383#comment-16836383
 ] 

Thomas Mueller commented on SLING-8407:
---

> why not delay the health checks until the repo is initialized?

[~cziegeler] I don't know of a way :-)

Even if we do delay (assuming you know of a way), then there is still the 
problem of corrupt / unavailable / deleted / disabled indexes, which right now 
are not properly handled. With the patch, they are.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Carsten Ziegeler (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836379#comment-16836379
 ] 

Carsten Ziegeler commented on SLING-8407:
-

so we are talking about repository initialization - why not delay the health 
checks until the repo is initialized?

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836367#comment-16836367
 ] 

Thomas Mueller commented on SLING-8407:
---

> How do you know that no one else is calling that api?

Do you mean JobManager.findJobs? It is called by health checks (e.g. 
SLING-8408). Health checks are run early on, even during repository 
initialization. I'm not aware of other code that is calling this method during 
repository initialization. I would argue if there is such code, then that 
should also fail, until the repository is initialized.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836363#comment-16836363
 ] 

Thomas Mueller commented on SLING-8407:
---

[~egli] looks like you found a bug in the JobManagerImpl, here:

{noformat}
buf.append(" '");
buf.append(current.getValue());
buf.append("'");
{noformat}

The second line is not escaping the value. Single quotes in the value need to 
be repeated, or a bind variable needs to be used.

Right now, the problem is (almost) silently ignored here:

{noformat}
} catch (final QuerySyntaxException qse) {
logger.warn("Query syntax wrong " + buf.toString(), qse);
} finally {
{noformat}

My change would throw an IllegalStateException for this case, which is arguably 
much better than ignoring the problem. The proper solution is of course to fix 
the code above.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Carsten Ziegeler (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836361#comment-16836361
 ] 

Carsten Ziegeler commented on SLING-8407:
-

How do you know that no one else is calling that api?

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836358#comment-16836358
 ] 

Thomas Mueller commented on SLING-8407:
---

> It seems Oak is not ready without the index as queries can't be executed.

Let's put it that way: queries _should_ not be executed if no index is 
available. Right now, they are executed, it's just that in this case a warning 
is logged by Oak. There are multiple reasons an index is not available:

* (A) during the very first startup, the index isn't ready yet
* (B) if the index is corrupt or is not available (for remote indexes)
* (C) if the index was removed or disabled (usually due to user error)

> So we should probably block everything until the index is there.

Block everything, in my view, is the wrong solution. That would mean no part of 
the system would work (or could start up) just because one index is not 
available / available yet.

The solution I propose is to fail exactly that part of the system that needs 
the index. In this case, it affects the health check.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Carsten Ziegeler (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836349#comment-16836349
 ] 

Carsten Ziegeler commented on SLING-8407:
-

I don't agree. It seems Oak is not ready without the index as queries can't be 
executed. So we should probably block everything until the index is there. If 
we fix it here, we have to fix every potential upstream user of this API.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836354#comment-16836354
 ] 

Thomas Mueller commented on SLING-8407:
---

> adding option(traversal fail) by default but only if a new config option is 
> set (false by default)? 

[~egli] yes that would work. 

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Stefan Egli (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836348#comment-16836348
 ] 

Stefan Egli commented on SLING-8407:


I think we should not throw the RuntimeException due to backwards 
compatibility. We already have many cases where WARNs are logged which would 
now result in a RuntimeException (something to look at separately perhaps).

Here's an example of the exception:
{noformat}
09.05.2019 04:47:34.462 *WARN* | xxx | | [10.43.32.53 [1557370054384] POST 
/libs/wcm/core/content/reference.json HTTP/1.1] 
org.apache.sling.event.impl.jobs.JobManagerImpl : Query syntax wrong 
/jcr:root/var/eventing/jobs//element(*,slingevent:Job)[@event.job.topic = 'xxx' 
and not(@slingevent:finishedState) and ((@cq:path = '/a/b/c/d'f/my.jpg'))] 
order by @slingevent:created ascending 
org.apache.sling.api.resource.QuerySyntaxException: java.text.ParseException: 
Query: /jcr:root/var/eventing/jobs//element(*,slingevent:Job)[@event.job.topic 
= 'xxx' and not(@slingevent:finishedState) and ((@cq:path = 
'/a/b/c/d'f/my.jpg(*)'))] order by @slingevent:created ascending
{noformat}

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836333#comment-16836333
 ] 

Thomas Mueller commented on SLING-8407:
---

[~cziegeler] longer term, we want to enable "traversal fail" in general (for 
all queries) in Oak. That is, require queries to have an index.

However, the current behavior of the Sling JobManager is problematic: it 
catches this exception (which is converted to a QuerySyntaxException, even 
thought the query syntax is just fine), and basically ignores it (only logs it 
as a warning). This behavior is problematic in any case. Instead, the 
JobManager should throw such exceptions.

My patch fixes that.

> makes it impossible to use this implementation in older versions.

Right now, Sling runs tests against Oak 1.5.7. If you want to support older 
versions of Oak, then there should be tests for older versions as well.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Stefan Egli (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836332#comment-16836332
 ] 

Stefan Egli commented on SLING-8407:


Could we keep the backwards compatibility by not adding {{option(traversal 
fail)}} by default but only if a new config option is set (false by default)? 
We can then enable this in deployments where we think we need this.
/cc [~tmueller], [~cziegeler]

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836322#comment-16836322
 ] 

Thomas Mueller commented on SLING-8407:
---

[~cziegeler]

> I'm not sure if we really should go this way. It adds a dependency to some 
> Oak version which makes it impossible to use this implementation in older 
> versions.

Yes, the "option(traversal fail)" is not yet supported in older Oak version 
(1.0, 1.2, 1.4). I'm fine to add support for this there: this syntax would be 
supported, but ignored (which is fine).

> But more important it sounds like a workaround to me, It seems because the 
> health checks start too early, 

Well, there is no other API I can think of to verify at what point exactly 
indexing is done. So there is no way to ensure health checks don't start too 
early. If you have a better idea, then please tell me.

> you want to add a workaround here and there. Why not fix the root problem?

In my view, the root cause of the problem is in fact that the queries, that 
require an index, don't currently ensure that the index is available. So in 
fact we are fixing the root cause here. See also the related Oak issue 
[OAK-5889|https://issues.apache.org/jira/browse/OAK-5889].

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836304#comment-16836304
 ] 

Thomas Mueller commented on SLING-8407:
---

[~egli] could you review please, or let me know who should? (I'm not a Sling 
committer.)

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836296#comment-16836296
 ] 

Thomas Mueller commented on SLING-8407:
---

Patch for sling-org-apache-sling-event:

{noformat}
diff --git a/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java 
b/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java
index 279d5ca..7faf515 100644
--- a/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java
+++ b/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java
@@ -520,6 +520,7 @@ public class JobManagerImpl
 buf.append(Job.PROPERTY_JOB_CREATED);
 buf.append(" ascending");
 }
+buf.append(" option(traversal fail)");
 final Iterator iter = 
resolver.findResources(buf.toString(), "xpath");
 long count = 0;
 
@@ -535,7 +536,11 @@ public class JobManagerImpl
 }
  }
 } catch (final QuerySyntaxException qse) {
-logger.warn("Query syntax wrong " + buf.toString(), qse);
+// no index is available
+String message = "Query " + buf.toString() + " failed: " + 
qse.getMessage();
+logger.debug(message, qse);
+logger.info(message);
+throw new IllegalStateException();
 } finally {
 resolver.close();
 }
diff --git 
a/src/test/java/org/apache/sling/event/it/AbstractJobHandlingTest.java 
b/src/test/java/org/apache/sling/event/it/AbstractJobHandlingTest.java
index c51ecd9..42af6e8 100644
--- a/src/test/java/org/apache/sling/event/it/AbstractJobHandlingTest.java
+++ b/src/test/java/org/apache/sling/event/it/AbstractJobHandlingTest.java
@@ -98,8 +98,8 @@ public abstract class AbstractJobHandlingTest {
 
 String localRepo = System.getProperty("maven.repo.local", "");
 
-final String jackrabbitVersion = "2.13.1";
-final String oakVersion = "1.5.7";
+final String jackrabbitVersion = "2.14.6";
+final String oakVersion = "1.6.0";
 
 final String slingHome = new File(buildDir + File.separatorChar + 
"sling_" + System.currentTimeMillis()).getAbsolutePath();
 
@@ -230,6 +230,7 @@ public abstract class AbstractJobHandlingTest {
 
 mavenBundle("com.google.guava", "guava", "15.0"),
 mavenBundle("org.apache.jackrabbit", "jackrabbit-api", 
jackrabbitVersion),
+mavenBundle("org.apache.jackrabbit", "jackrabbit-data", 
jackrabbitVersion),
 mavenBundle("org.apache.jackrabbit", "jackrabbit-jcr-commons", 
jackrabbitVersion),
 mavenBundle("org.apache.jackrabbit", "jackrabbit-spi", 
jackrabbitVersion),
 mavenBundle("org.apache.jackrabbit", "jackrabbit-spi-commons", 
jackrabbitVersion),
{noformat}

* Added "option(traversal fail)".
* If there is no index, then QuerySyntaxException is thrown (Sling converts all 
kinds of query exceptions to QuerySyntaxException, even if the syntax is fine - 
this is slightly confusing, but fixing that would be a backward compatibility 
problem).
* This exception is logged at info level, and then converted to 
IllegalStateException (a RuntimeException). So if there is no index, the 
error.log will show what's going on - just that this is no longer a warning, 
just an information.
* The AbstractJobHandlingTest now uses a stable Oak version instead of an 
unstable one as before. This is needed because "option(traversal fail)" was 
implemented slightly after Oak 1.5.7.
* The Jackrabbit version is updated as well, as that's the version needed by 
the newer Oak version
* Adding "jackrabbit-data" is needed as that was added slightly after Oak 1.5.7 
(part of modularization).

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't 

[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836160#comment-16836160
 ] 

Thomas Mueller commented on SLING-8407:
---

The Oak version change (and subsequent Jackrabbit version change, plus addition 
of jackrabbit-data) will update Oak to a stable version.

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836157#comment-16836157
 ] 

Thomas Mueller commented on SLING-8407:
---

Here a first patch that passes the tests. I will try to throw a special kind of 
RuntimeException instead of returning an empty list / null:

{noformat}
--- a/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java
+++ b/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java
@@ -520,6 +520,7 @@ public class JobManagerImpl
 buf.append(Job.PROPERTY_JOB_CREATED);
 buf.append(" ascending");
 }
+buf.append(" option(traversal fail)");
 final Iterator iter = 
resolver.findResources(buf.toString(), "xpath");
 long count = 0;
 
@@ -535,7 +536,10 @@ public class JobManagerImpl
 }
  }
 } catch (final QuerySyntaxException qse) {
-logger.warn("Query syntax wrong " + buf.toString(), qse);
+// no index is available
+String message = "Query " + buf.toString() + " failed: " + 
qse.getMessage();
+logger.debug(message, qse);
+logger.info(message, qse);
 } finally {
 resolver.close();
 }
diff --git 
a/src/test/java/org/apache/sling/event/it/AbstractJobHandlingTest.java 
b/src/test/java/org/apache/sling/event/it/AbstractJobHandlingTest.java
index c51ecd9..42af6e8 100644
--- a/src/test/java/org/apache/sling/event/it/AbstractJobHandlingTest.java
+++ b/src/test/java/org/apache/sling/event/it/AbstractJobHandlingTest.java
@@ -98,8 +98,8 @@ public abstract class AbstractJobHandlingTest {
 
 String localRepo = System.getProperty("maven.repo.local", "");
 
-final String jackrabbitVersion = "2.13.1";
-final String oakVersion = "1.5.7";
+final String jackrabbitVersion = "2.14.6";
+final String oakVersion = "1.6.0";
 
 final String slingHome = new File(buildDir + File.separatorChar + 
"sling_" + System.currentTimeMillis()).getAbsolutePath();
 
@@ -230,6 +230,7 @@ public abstract class AbstractJobHandlingTest {
 
 mavenBundle("com.google.guava", "guava", "15.0"),
 mavenBundle("org.apache.jackrabbit", "jackrabbit-api", 
jackrabbitVersion),
+mavenBundle("org.apache.jackrabbit", "jackrabbit-data", 
jackrabbitVersion),
 mavenBundle("org.apache.jackrabbit", "jackrabbit-jcr-commons", 
jackrabbitVersion),
 mavenBundle("org.apache.jackrabbit", "jackrabbit-spi", 
jackrabbitVersion),
 mavenBundle("org.apache.jackrabbit", "jackrabbit-spi-commons", 
jackrabbitVersion),
{noformat}

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-08 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835649#comment-16835649
 ] 

Thomas Mueller commented on SLING-8407:
---

Looks like an ancient Oak version is used.

Full exception stack trace:
{noformat}
Caused by: java.text.ParseException: Query:
/jcr:root/var/eventing/jobs//element(*,slingevent:Job)[@event.job.topic = 
'sling/test/history' and @slingevent:finishedState] order by 
@slingevent:finishedDate descending option(*)(traversal fail); expected: 
at 
org.apache.jackrabbit.oak.query.xpath.XPathToSQL2Converter.getSyntaxError(XPathToSQL2Converter.java:1048)
at 
org.apache.jackrabbit.oak.query.xpath.XPathToSQL2Converter.convertToStatement(XPathToSQL2Converter.java:328)
at 
org.apache.jackrabbit.oak.query.xpath.XPathToSQL2Converter.convert(XPathToSQL2Converter.java:84)
at 
org.apache.jackrabbit.oak.query.QueryEngineImpl.parseQuery(QueryEngineImpl.java:179)
at 
org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:253)
at 
org.apache.jackrabbit.oak.jcr.query.QueryManagerImpl.executeQuery(QueryManagerImpl.java:136)
... 23 common frames omitted
{noformat}

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-08 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835650#comment-16835650
 ] 

Thomas Mueller commented on SLING-8407:
---

Looks like this is 
{noformat}
[INFO] Running org.apache.sling.event.it.HistoryTest
[main] INFO org.ops4j.pax.exam.junit.impl.ProbeRunner - creating PaxExam runner 
for class org.apache.sling.event.it.HistoryTest
[main] INFO org.ops4j.pax.exam.junit.impl.ProbeRunner - running test class 
org.apache.sling.event.it.HistoryTest
{noformat}


> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-08 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835633#comment-16835633
 ] 

Thomas Mueller commented on SLING-8407:
---

The following change in Sling didn't work as expected.

{noformat}
sling-org-apache-sling-event
git diff src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java
diff --git a/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java 
b/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java
index 279d5ca..0663aaa 100644
--- a/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java
+++ b/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java
@@ -520,6 +520,7 @@ public class JobManagerImpl
 buf.append(Job.PROPERTY_JOB_CREATED);
 buf.append(" ascending");
 }
+buf.append(" option(traversal fail)");
 final Iterator iter = 
resolver.findResources(buf.toString(), "xpath");
 long count = 0;
 
@@ -535,7 +536,10 @@ public class JobManagerImpl
 }
  }
 } catch (final QuerySyntaxException qse) {
-logger.warn("Query syntax wrong " + buf.toString(), qse);
+// no index is available
+String message = "Query " + buf.toString() + " failed: " + 
qse.getMessage();
+logger.debug(message, qse);
+logger.info(message);
 } finally {
 resolver.close();
 }
{noformat}

It results in a test failure, as follows. Actually not a test failure, but 
looks like an endless loop in a test:
{noformat}
08.05.2019 16:13:39.309 *INFO* [Time-limited test] 
org.apache.sling.event.impl.jobs.JobManagerImpl Query 
/jcr:root/var/eventing/jobs//element(*,slingevent:Job)[@event.job.topic = 
'sling/test/history' and @slingevent:finishedState] order by 
@slingevent:finishedDate descending option(traversal fail) failed: 
java.text.ParseException: Query:
/jcr:root/var/eventing/jobs//element(*,slingevent:Job)[@event.job.topic = 
'sling/test/history' and @slingevent:finishedState] order by 
@slingevent:finishedDate descending option(*)(traversal fail); expected: 
{noformat}

I couldn't find out which test this is, and which version of Oak is used: mvn 
dependency:tree doesn't say. And
{noformat}
mvn test dependency:tree -DskipTests=true | grep "jackrabbit"
says:
[INFO] skip non existing resourceDirectory 
/Users/mueller/jackrabbit/sling-org-apache-sling-event/src/test/resources
[INFO] +- org.apache.jackrabbit:jackrabbit-jcr-commons:jar:2.11.2:provided
[INFO] |  |  \- 
org.apache.jackrabbit.vault:org.apache.jackrabbit.vault:jar:3.1.18:test
[INFO] |  +- org.apache.jackrabbit:jackrabbit-api:jar:2.11.3:test
{noformat}

Looks like some kind of JCR mock implementation is used?
{noformat}
org.apache.sling:org.apache.sling.testing.jcr-mock:jar
{noformat}

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-08 Thread Carsten Ziegeler (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835626#comment-16835626
 ] 

Carsten Ziegeler commented on SLING-8407:
-

I'm not sure if we really should go this way. It adds a dependency to some Oak 
version which makes it impossible to use this implementation in older versions.
But more important it sounds like a workaround to me, It seems because the 
health checks start too early, you want to add a workaround here and there. Why 
not fix the root problem?

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-8407) JobManagerImpl.findJobs should prevent traversal

2019-05-08 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835620#comment-16835620
 ] 

Thomas Mueller commented on SLING-8407:
---

"option(traversal fail)" requires Oak 1.5.13 (see 
[OAK-4888|https://issues.apache.org/jira/browse/OAK-4888]).

> JobManagerImpl.findJobs should prevent traversal
> 
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
>  Issue Type: Improvement
>  Components: Event
>Reporter: Thomas Mueller
>Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)