Hi,

I partly asked the question because I have at times been asked "Why doesn't Sling support regexes like X does?", and those questions made me wonder what it would be like to support regexes in certain areas. I am not thinking about going back to a flat list, as that didn't scale, and even if it were a single massive root map rather than a list, it would not be as efficient as what we have now. I was thinking of allowing regex mapping in portions of the tree, to make a single ResourceProvider appear in many places without having to register it in multiple places. More below, because your reaction is valid.

On 8 February 2013 16:06, Carsten Ziegeler <[email protected]> wrote:
> Hi,
>
> interesting approach. I guess this has two aspects: performance and use cases.
>
> When it comes to performance, I'm not sure if this helps in our case.
> We switched from a list based implementation to this tree based one to
> gain performance. If we go to a regexp we're back to a list. Now, if
> it would be possible to create clever regexps to make this list way
> shorter than the list of all providers, this might give us something.

I think this is the critical point. In Sling we have ResourceProviders which are generally registered at a single location in the content tree. Even in the ServletResource space a single ResourceProvider doesn't get registered all over the place. So even if a regex could be built to eliminate the depth or width of a tree, it would result in iteration over ResourceProviders, which would be slower. I think this is an argument against attempting tree compaction with regexes.

> For providers with a configuration, the regexp would be provided by
> the "person" configuring this provider. But we have the servlet
> resource provider, adding lots of entries for each servlet and this is
> done programmatically. So we would need to come up with some clever
> mechanism to create as few regexps for this as possible. It's maybe
> doable. Still we would need a prototype to prove that the new
> implementation is at least not slower than what we have right now.
>
> For the use case, I think just because other frameworks have it, is
> not sufficient for me. Sling follows a completely different approach
> with the resource tree - which framework is using a content tree and
> does resource-first request processing? :)

Agreed, the ResourceProvider is only part of the URL -> handler translation that a typical list of regexes would do, and for a URL space such as is typically served by Sling, that list of regexes would be huge.
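
To make the comparison concrete, here is a minimal sketch of the two lookup styles. It is not the Sling implementation: Node, Rule, register, lookupTree and lookupRegex are hypothetical names made up for illustration, and providers are reduced to plain strings. The tree lookup touches one map per path segment, while the regex lookup has to try every registered pattern.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class ProviderLookupSketch {

    /** Hypothetical stand-in for a tree entry: one node per path segment. */
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final List<String> providers = new ArrayList<>(); // providers registered at this node
    }

    /** Hypothetical regex registration: a pattern paired with a provider name. */
    static final class Rule {
        final Pattern pattern;
        final String provider;
        Rule(String regex, String provider) {
            this.pattern = Pattern.compile(regex);
            this.provider = provider;
        }
    }

    /** Registers a provider at a single location in the tree. */
    static void register(Node root, String path, String provider) {
        Node node = root;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                node = node.children.computeIfAbsent(segment, s -> new Node());
            }
        }
        node.providers.add(provider);
    }

    /** Tree lookup: one map access per segment, deepest providers first. */
    static List<String> lookupTree(Node root, String path) {
        List<String> found = new ArrayList<>(root.providers);
        Node node = root;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) continue;
            node = node.children.get(segment);
            if (node == null) break;          // nothing registered any deeper
            found.addAll(0, node.providers);  // deeper registrations take precedence
        }
        return found;
    }

    /** Regex lookup: every rule is tested, so the cost grows with the rule count. */
    static List<String> lookupRegex(List<Rule> rules, String path) {
        List<String> found = new ArrayList<>();
        for (Rule rule : rules) {
            if (rule.pattern.matcher(path).matches()) {
                found.add(rule.provider);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Node root = new Node();
        register(root, "/", "jcr");
        register(root, "/a/b", "custom");

        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("^/a/b(/.*)?$", "custom"));
        rules.add(new Rule("^/.*$", "jcr"));

        System.out.println(lookupTree(root, "/a/b/c/d"));
        System.out.println(lookupRegex(rules, "/a/b/c/d"));
    }
}
```

Both calls in main return the same providers for /a/b/c/d, but the regex version only keeps up if the rule list stays short, which is exactly the open question above.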
>
> And changing this implementation might be really risky; it took me
> some time to separate the implementation for the resource resolver
> from the jcr and add the new factories and all this stuff. I think we
> now have some test cases catching a lot, but i guess we're far from
> 100% and there are some edge cases.
>

OK, I should have a look at that. We currently have 39% coverage in the package itself, which is too low even if it's covered by integration tests.

> So in short :) I'm not against this in general but as long as I don't
> hear/see a clear advantage (performance, covering interesting use
> cases) I would stay away from this.

I am good with that. I think the potential performance gains are going to be marginal and might be better achieved by looking at areas of the current implementation (e.g. there is a split() function, which I think I am guilty of, that looks like C code and *might* be better for Hotspot and the JMM in some other form... but nothing without some proof). I also have a feeling that the use cases that typically require regexes are already covered by the underlying resource tree and Servlet resolution.

Thanks for taking the time to think about it.

Best Regards
Ian

>
> Carsten
>
> 2013/2/7 Ian Boston <[email protected]>:
>> Hi,
>>
>> We may have talked about this before, but I can't remember doing so.
>>
>> The current resource resolution scheme is based on locating a
>> ResourceProvider for a given path and asking that ResourceProvider for
>> the Resource. We do that by retrieving a list of ProviderHandlers (that
>> wrap ResourceProviders) from a tree of ResourceProviderEntries.
>>
>> ie:
>> given a root resource provider called rootRPE, the list of
>> ProviderHandlers for the path /a/b/c/d is retrieved by performing
>> rootRPE.get("a").get("b").get("c").get("d").values() (performed inside [1])
>> where values() is Collection<ProviderHandler> and
>> ProviderHandler.getResource(...., path) returns the Resource.
>>
>> There is some logic surrounding the cases where the list of Providers
>> generates no matches and how the search continues, but that might not
>> be used.
>>
>> My question is:
>> Would an alternative form of resolution be better?
>> Would registering ResourceProviderEntries using a list of regexes, as
>> many other frameworks do, make the tree considerably smaller and more
>> flexible? I think the last time I did a test, regex performance, once
>> constructed, was as fast as this type of nested map lookup.
>>
>> Initially the change might just require changing the implementation of
>> [1], and converting the registration from key lookups to patterns.
>>
>> WDYT?
>> Too risky and unnecessary given the request handling already present in
>> Sling ?
>> Something worth doing ?
>> Something better placed to do inside a ResourceProvider ?
>> Already present somewhere else in Sling ?
>>
>> Ian
>>
>> (BTW, I don't have a specific use case driving this other than
>> re-reading the code base in this area and having used regex based
>> registration of "providers/handlers" in other frameworks)
>>
>>
>> [1]
>> org.apache.sling.resourceresolver.impl.helper.ResourceIterator.getResourceProviders(String,
>> Set<ProviderHandler>)
>
>
>
> --
> Carsten Ziegeler
> [email protected]
