On Tue, Apr 29, 2014 at 3:51 PM, Jim Jagielski <[email protected]> wrote:
> On Apr 29, 2014, at 8:41 AM, Jan Kaluža <[email protected]> wrote:
>>
>> Because later we have to match the URL of request with some proxy_worker.
>>
>> If you configure ProxyPassMatch like this:
>>   ProxyPassMatch ^/test/(\d+)/foo.jpg http://x/$1/foo.jpg
>>
>> Then the proxy_worker name would be "http://x/$1/foo.jpg".
>>
>> If you receive request with URL "http://x/something/foo.jpg",
>> ap_proxy_get_worker() will have to find out the worker with name
>> "http://x/$1/foo.jpg". The question here is how it would do that?
>>
>> The answer used in the patch is "we change the worker name to
>> http://x/*/foo.jpg" and check if the URL ("http://x/something/foo.jpg" in
>> our case) matches that worker.
>>
>> If we store the original name with $N, we will have to find out different
>> way how to match the worker (probably emulating wildcard pattern matching)
>>
>> It would be possible to store only the original name (with "$N" variables),
>> store the flag that the proxy worker is using regex and change
>> ap_proxy_strcmp_ematch() function to treat "$N" as "*", but I don't see any
>> real advantage here.
>>
>
> In Yann's suggested patch we don't store match_name where it
> belongs; so we'd need to put it in shm, which means more
> memory.

Agreed, plus this is not balancer-manager aware.

BTW, what's the difference between alias_match() used by proxy_trans()
and ap_proxy_get_worker()? Longest match?
Can an entry matched by proxy_trans() *not* belong to the worker
obtained later from ap_proxy_get_worker()?

If not, another solution would be to backref the worker in (all) its
struct proxy_alias(es) entries. That way the worker would already be
known at proxy_trans() time (when the entry is matched), and a new
ap_proxy_get_worker_for_request(r) could do the association later.
AFAICT, we don't use ap_proxy_get_worker() at runtime without a
request_rec available.
At least that could work for the *Match workers, for which the only
relevant match of the requested URL is the one from proxy_trans(), imo.

Still another solution for these workers would be to reuse the
ap_regmatch_t vector from proxy_trans() to exact-match the worker's
name (with its zero or more $N replaced with the substrings given by
the offsets in vector[N], like ap_expr_str_exec_re() does).
That would also require a request_rec available at
ap_proxy_get_worker()'s (run)time though.

> Instead, we store as is and add a simple char flag
> which sez if the stored name is a regex. Much savings.
>
> And I have no idea why storing with $1 -> * somehow makes
> things easier or implies a "different way how to match the worker".

Do we need to provide a way to escape (application/legitimate) $N in
the worker name, or simply document the limitation?
In the latter case this is indeed much simpler.

>
> Finally, let's think about this deeper...
>
> Assume we do have
>
>    ProxyPassMatch ^/test/(\d+)/foo.jpg http://x/$1/foo.jpg
>    ProxyPassMatch ^/zippy/(\d+)/bar.jpg http://x/$1/omar/propjoe.gif
>
> is the intent/desire to have 2 workers or 1? A worker is, in
> some ways, simply a nickname for the socket related to a host and port.

For which connections can be reused, different parameters apply...
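
To make the "reuse the regmatch vector" idea above concrete, here is a
rough standalone sketch. It is not httpd code: it uses POSIX
regmatch_t as a stand-in for ap_regmatch_t, and expand_backrefs() is
a hypothetical helper name. It only shows how a stored worker name
containing $N could be expanded from the match offsets and then
compared exactly against the requested URL.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an httpd API): expand each "$N" (N = 0..9) in
 * tmpl with the substring of subject captured as group N in pmatch.
 * Returns 0 on success, -1 if the result does not fit in out. */
static int expand_backrefs(const char *tmpl, const char *subject,
                           const regmatch_t *pmatch, size_t nmatch,
                           char *out, size_t outlen)
{
    size_t o = 0;

    assert(tmpl != NULL && subject != NULL && pmatch != NULL && out != NULL);
    if (outlen == 0) {
        return -1;
    }
    for (const char *p = tmpl; *p != '\0'; ) {
        if (p[0] == '$' && p[1] >= '0' && p[1] <= '9') {
            size_t n = (size_t)(p[1] - '0');
            p += 2;
            /* A group that did not participate expands to nothing. */
            if (n < nmatch && pmatch[n].rm_so >= 0) {
                size_t len = (size_t)(pmatch[n].rm_eo - pmatch[n].rm_so);
                if (o + len >= outlen) {
                    return -1;
                }
                memcpy(out + o, subject + pmatch[n].rm_so, len);
                o += len;
            }
        }
        else {
            if (o + 1 >= outlen) {
                return -1;
            }
            out[o++] = *p++;
        }
    }
    out[o] = '\0';
    return 0;
}
```

With the match vector from ^/test/([0-9]+)/foo\.jpg against
/test/42/foo.jpg, the name http://x/$1/foo.jpg would expand to
http://x/42/foo.jpg, which can then be compared with a plain strcmp().
Note that a literal $N in a worker name is not escapable in this
sketch, which is the limitation asked about above.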

> Maybe, in the interests of efficiency and speed, since regexes
> are slow as it is, a condition could be specified (a limitation,
> as it were), that when using PPM, only everything up to
> the 1st potential substitution is considered a unique worker.

That could be (another) limitation.
But one may want to apply different parameters to these somehow
different URLs, since they may be different backends/applications too.
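
For comparison, the two name-matching options discussed in this thread
can be sketched in a few lines of standalone C. Both function names
are hypothetical and neither is the actual ap_proxy_strcmp_ematch():
match_worker_name() keeps the stored name as-is and treats each "$N"
like "*", and worker_prefix_len() returns the length of the part
before the first potential substitution (the "unique worker" key under
the proposed limitation).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: length of the worker name up to the first "$N". */
static size_t worker_prefix_len(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; name[i] != '\0'; i++) {
        if (name[i] == '$' && name[i + 1] >= '0' && name[i + 1] <= '9') {
            return i;
        }
    }
    return strlen(name);
}

/* Hypothetical: match url against a stored worker name, treating each
 * "$N" as a wildcard for any (possibly empty) run of characters.
 * Returns 1 on match, 0 otherwise. Naive recursion, fine for a sketch. */
static int match_worker_name(const char *name, const char *url)
{
    assert(name != NULL && url != NULL);
    if (*name == '\0') {
        return *url == '\0';
    }
    if (name[0] == '$' && name[1] >= '0' && name[1] <= '9') {
        /* Try every possible length for the wildcard. */
        for (const char *u = url; ; u++) {
            if (match_worker_name(name + 2, u)) {
                return 1;
            }
            if (*u == '\0') {
                return 0;
            }
        }
    }
    return *name == *url && match_worker_name(name + 1, url + 1);
}
```

With the two ProxyPassMatch lines quoted above, worker_prefix_len()
gives the same prefix, http://x/, for both targets, so they would
collapse into one worker under that limitation, while
match_worker_name() still tells them apart.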
