[ 
https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837160#comment-16837160
 ] 

Thomas Mueller commented on SLING-8407:
---------------------------------------

> I prefer catching bugs at test time instead of at runtime

Well, it is hard to get good test coverage. One way to find escaping problems 
is using random testing / fuzz testing, that is using randomly generated data. 
But letting the code throw an exception would actually simplify testing as well 
I think.

> but I think discussing this here doesn't lead us anywhere.

Well, it is kind of the remaining issue...

> I also have a totally different opinion on the whole subject

Well, I would also prefer a simpler solution, I just don't see one that is 
robust. If you have an idea please let me know. I do see your point on "wait 
until the system is ready", but it is tricky to do: We can't just use the logic 
"if indexing is in progress, then it is not ready". There are many reasons why 
indexing can be in progress. Also, using a heuristic like "wait 1 minute" won't 
work reliably: it's exactly for this reason why we didn't notice the problem 
earlier.

I wanted to address a point [~egli] raised: we don't want to change the 
findJobs method for this. So I'm trying to add a new API for the health checks 
to verify if the replication queue / distribution queue is ready or not. 
However, I'm afraid that would result in a lot of changes, and a much more 
complex solution. However, seeing that findJobs anyway needs to be changed kind 
of makes this point less important in my view: there are bugs to be fixed there 
as well. And arguably, and that's the remaining issue I think, we need to 
decide whether a runtime exception should be thrown or not in case of query 
execution problems (due to syntax error or index not being ready).

> JobManagerImpl.findJobs should prevent traversal
> ------------------------------------------------
>
>                 Key: SLING-8407
>                 URL: https://issues.apache.org/jira/browse/SLING-8407
>             Project: Sling
>          Issue Type: Improvement
>          Components: Event
>            Reporter: Thomas Mueller
>            Priority: Major
>
> The method 
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
>  runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't 
> initialized yet, meaning while the index isn't available yet. What is 
> happening in this case is that the query is traversing all nodes below that 
> path, triggering a warning that the query doesn't use an index. It is 
> sometimes happening when a health check is running before the repository is 
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an 
> index. If the index isn't available yet, it should fail. Therefore, the query 
> should use "option(traversal fail)". That would result in an exception that 
> can be caught.  I will log a related issue to change the health checks to 
> process this exception and return HEALTH_CHECK_ERROR for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to