[ https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837160#comment-16837160 ]
Thomas Mueller commented on SLING-8407: --------------------------------------- > I prefer catching bugs at test time instead of at runtime Well, it is hard to get good test coverage. One way to find escaping problems is using random testing / fuzz testing, that is using randomly generated data. But letting the code throw an exception would actually simplify testing as well I think. > but I think discussing this here doesn't lead us anywhere. Well, it is kind of the remaining issue... > I also have a totally different opinion on the whole subject Well, I would also prefer a simpler solution, I just don't see one that is robust. If you have an idea please let me know. I do see your point on "wait until the system is ready", but it is tricky to do: We can't just use the logic "if indexing is in progress, then it is not ready". There are many reasons why indexing can be in progress. Also, using a heuristic like "wait 1 minute" won't work reliably: it's exactly for this reason why we didn't notice the problem earlier. I wanted to address a point [~egli] raised: we don't want to change the findJobs method for this. So I'm trying to add a new API for the health checks to verify if the replication queue / distribution queue is ready or not. However, I'm afraid that would result in a lot of changes, and a much more complex solution. However, seeing that findJobs anyway needs to be changed kind of makes this point less important in my view: there are bugs to be fixed there as well. And arguably, and that's the remaining issue I think, we need to decide whether a runtime exception should be thrown or not in case of query execution problems (due to syntax error or index not being ready). > JobManagerImpl.findJobs should prevent traversal > ------------------------------------------------ > > Key: SLING-8407 > URL: https://issues.apache.org/jira/browse/SLING-8407 > Project: Sling > Issue Type: Improvement > Components: Event > Reporter: Thomas Mueller > Priority: Major > > The method > [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373] > runs a JCR query to find all jobs for a topic. > It is possible that such a query is running while the repository isn't > initialized yet, meaning while the index isn't available yet. What is > happening in this case is that the query is traversing all nodes below that > path, triggering a warning that the query doesn't use an index. It is > sometimes happening when a health check is running before the repository is > initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck). > It doesn't make sense that the query traverses the nodes. It should use an > index. If the index isn't available yet, it should fail. Therefore, the query > should use "option(traversal fail)". That would result in an exception that > can be caught. I will log a related issue to change the health checks to > process this exception and return HEALTH_CHECK_ERROR for this case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)