[ https://issues.apache.org/jira/browse/SLING-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837160#comment-16837160 ]
Thomas Mueller commented on SLING-8407:
---------------------------------------
> I prefer catching bugs at test time instead of at runtime
Well, it is hard to get good test coverage. One way to find escaping problems
is random testing / fuzz testing, that is, testing with randomly generated
data. But I think letting the code throw an exception would actually simplify
testing as well.
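To illustrate what I mean by random testing for escaping problems, here is a
minimal sketch. It assumes that string literals in the query are single-quoted
and that a quote inside the value is escaped by doubling it. All class and
method names are hypothetical (nothing here is taken from the Sling or Oak
code). It only checks that random values survive an escape / parse round trip.

```java
import java.util.Random;

// Sketch only: all names here are hypothetical, not taken from Sling or Oak.
class EscapeFuzzSketch {

    // Escapes a value for use inside a single-quoted string literal
    // by doubling each single quote (assumed escaping rule).
    static String escapeLiteral(String value) {
        return value.replace("'", "''");
    }

    // Reads a single-quoted literal back. Returns null if it is malformed
    // (no opening quote, no closing quote, or text after the closing quote).
    static String parseLiteral(String literal) {
        if (literal.isEmpty() || literal.charAt(0) != '\'') {
            return null;
        }
        StringBuilder out = new StringBuilder();
        int i = 1;
        while (i < literal.length()) {
            char c = literal.charAt(i);
            if (c != '\'') {
                out.append(c);
                i++;
            } else if (i + 1 < literal.length() && literal.charAt(i + 1) == '\'') {
                out.append('\''); // a doubled quote stands for one quote
                i += 2;
            } else {
                // closing quote: valid only as the very last character
                return i == literal.length() - 1 ? out.toString() : null;
            }
        }
        return null; // ran off the end without a closing quote
    }

    // Builds a short random string from characters that tend to break queries.
    static String randomValue(Random random) {
        String alphabet = "ab'\"\\ []/%";
        int length = random.nextInt(12);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append(alphabet.charAt(random.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    // Returns how many random values did not survive the round trip.
    static int countFailures(long seed, int iterations) {
        Random random = new Random(seed);
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            String value = randomValue(random);
            String literal = "'" + escapeLiteral(value) + "'";
            if (!value.equals(parseLiteral(literal))) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + countFailures(42L, 10_000));
    }
}
```

A fixed seed keeps a failing run reproducible. In a real test the parser side
would of course be the actual query engine rather than this toy parser.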
> but I think discussing this here doesn't lead us anywhere.
Well, it is kind of the remaining issue...
> I also have a totally different opinion on the whole subject
Well, I would also prefer a simpler solution; I just don't see one that is
robust. If you have an idea, please let me know. I do see your point on "wait
until the system is ready", but it is tricky to do: we can't just use the logic
"if indexing is in progress, then it is not ready", because there are many
reasons why indexing can be in progress. Also, a heuristic like "wait 1 minute"
won't work reliably: that is exactly why we didn't notice the problem earlier.
I wanted to address a point [~egli] raised: we don't want to change the
findJobs method for this. So I'm trying to add a new API for the health checks
to verify whether the replication queue / distribution queue is ready.
However, I'm afraid that would result in a lot of changes and a much more
complex solution. Also, findJobs needs to be changed anyway (there are bugs to
be fixed there as well), which makes this point less important in my view.
Arguably the remaining issue is that we need to decide whether a runtime
exception should be thrown in case of query execution problems (due to a
syntax error or the index not being ready).
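As a rough sketch of the "throw a runtime exception" variant: the query gets
"option(traversal fail)" appended, and the caller (for example a health check)
maps a failure to HEALTH_CHECK_ERROR. The class, enum, and method names below
are hypothetical, and the Supplier stands in for whatever call ends up running
the query; this is not the actual Sling or health check API.

```java
import java.util.function.Supplier;

// Sketch only: all names here are hypothetical, not the Sling or health check API.
class TraversalFailSketch {

    enum Result { OK, HEALTH_CHECK_ERROR }

    // Appends the option so the query fails instead of traversing
    // when no index is available.
    static String withTraversalFail(String statement) {
        return statement + " option(traversal fail)";
    }

    // jobCounter stands in for the call that runs the query. If it throws a
    // runtime exception (syntax error, index not ready), report an error
    // instead of letting the exception escape.
    static Result checkQueue(Supplier<Integer> jobCounter) {
        try {
            jobCounter.get();
            return Result.OK;
        } catch (RuntimeException e) {
            return Result.HEALTH_CHECK_ERROR;
        }
    }

    public static void main(String[] args) {
        System.out.println(withTraversalFail("select * from [nt:base]"));
        System.out.println(checkQueue(() -> 3));
        System.out.println(checkQueue(() -> {
            throw new IllegalStateException("index not ready");
        }));
    }
}
```

With this approach the "not ready" case is explicit for the caller, which no
longer has to guess from timing or from the indexing state.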
> JobManagerImpl.findJobs should prevent traversal
> ------------------------------------------------
>
> Key: SLING-8407
> URL: https://issues.apache.org/jira/browse/SLING-8407
> Project: Sling
> Issue Type: Improvement
> Components: Event
> Reporter: Thomas Mueller
> Priority: Major
>
> The method
> [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373]
> runs a JCR query to find all jobs for a topic.
> It is possible that such a query is running while the repository isn't
> initialized yet, meaning while the index isn't available yet. What is
> happening in this case is that the query is traversing all nodes below that
> path, triggering a warning that the query doesn't use an index. It is
> sometimes happening when a health check is running before the repository is
> initialized (ReplicationQueueHealthCheck and DistributionQueueHealthCheck).
> It doesn't make sense that the query traverses the nodes. It should use an
> index. If the index isn't available yet, it should fail. Therefore, the query
> should use "option(traversal fail)". That would result in an exception that
> can be caught. I will log a related issue to change the health checks to
> process this exception and return HEALTH_CHECK_ERROR for this case.