Wow, from a brief look at the comments, and some of the code, it looks exactly like what we need. Let's us define whatever interface we want between the cache and the monitor for stats purposes, and write our code to extract the stats, Per DS.
However, in order fully support our case, to the code that forces every DS to have a unique host regexp also needs to be changed: See https://github.com/apache/incubator-trafficcontrol/blob/ 3e3a5f41d482a065f46ef287b347e66d7d205d82/traffic_monitor/ todata/todata.go#L234 . Thanks Robert ! On Tue, Jan 16, 2018 at 6:03 PM, Robert Butts <[email protected]> wrote: > https://github.com/apache/incubator-trafficcontrol/pull/1768 > https://github.com/apache/incubator-trafficcontrol/pull/1769 > > @oren Do those solve your use case? > > > On Sun, Jan 7, 2018 at 1:36 AM, Oren Shemesh <[email protected]> wrote: > > > Thank you Robert :-) > > > > On Thu, Jan 4, 2018 at 5:49 PM, Robert Butts <[email protected]> > > wrote: > > > > > Eh, you're right, a Profile is the better place, since groups of > servers > > > will likely be configured with the same type. Much as it pains me to > put > > > more things in Parameters. > > > > > > And it can be reasonably safe, if we assume a missing Parameter implies > > the > > > existing Cache Type. (As safe as any Profile/Parameter can be; it goes > > > without saying that typos will break things at runtime, the best we can > > do > > > is log errors in the Monitor if an unknown Stats Type Parameter value > is > > > found) > > > > > > I'd suggest naming the Parameter starting with `health.` (e.g. > > > `health.polling.type`), which should make the existing Traffic Ops code > > > automatically put it in `monitoring.json`. So then Traffic Ops code > > doesn't > > > need changed, just the Monitor. > > > > > > > > > On Thu, Jan 4, 2018 at 6:01 AM, Oren Shemesh <[email protected]> wrote: > > > > > > > Robert, I see your valid point re the ability to automatically infer > > the > > > > stats "type". > > > > So yes, given that the host in the stat is the origin, a single word > > is a > > > > valid domain, hence it is not possible to auto-detect it. > > > > > > > > I believe that adding a parameter to the server profile is the way to > > do > > > > this. > > > > > > > > Thanks ! > > > > > > > > Oren. > > > > > > > > On Tue, Jan 2, 2018 at 6:42 PM, Robert Butts < > [email protected] > > > > > > > wrote: > > > > > > > > > Erm, @neuman beat me to it. > > > > > > > > > > On Tue, Jan 2, 2018 at 9:42 AM, Robert Butts < > > [email protected] > > > > > > > > > wrote: > > > > > > > > > > > >can you send a pointer to any documentation/text available > > regarding > > > > > > 'Stats over HTTP' ? > > > > > > > > > > > > https://docs.trafficserver.apache.org/en/latest/admin-guide/ > > > > > > plugins/stats_over_http.en.html#admin-plugins-stats-over-http > > > > > > > > > > > > https://github.com/apache/trafficserver/tree/c7cfa081ccc5c0b > > > > > > 875a8ee665cde3399c48abd5c/plugins/stats_over_http > > > > > > > > > > > > It's essentially the "official" version of "Astats". In a > nutshell, > > > > > Astats > > > > > > is a fork of that, and adds the "System" stats key. It's possible > > to > > > > use > > > > > > the unmodified `stats_over_http`, while extending it to add the > > > > "System" > > > > > > stats Traffic Control needs. Which we're looking into doing, > rather > > > > than > > > > > > maintaining a completely separate fork. But, as above, it may be > > > > > difficult > > > > > > to extend either to have the DS Name. Hopefully I'm wrong. > > > > > > > > > > > > > > > > > > On Tue, Jan 2, 2018 at 9:35 AM, Robert Butts < > > > [email protected] > > > > > > > > > > > wrote: > > > > > > > > > > > >> >I would like to avoid adding additional configurations to a > > system, > > > > in > > > > > >> cases where the system can auto-detect the proper configuration. > > > > > >> > > > > > >> But the system can't, in this case. Detecting a single field as > > > "must > > > > be > > > > > >> the DS Name" makes single-domain hosts impossible (such as > > > > `localhost`). > > > > > >> Single-domain names are perfectly valid within an internal > > network, > > > we > > > > > >> shouldn't make that configuration impossible. > > > > > >> > > > > > >> I agree, minimizing configuration is ideal, but in this case, it > > > isn't > > > > > >> possible without making certain things mysteriously impossible; > > and > > > > > that's > > > > > >> going to be a very confusing bug for someone to track down, if > > they > > > > > need it. > > > > > >> > > > > > >> Moreover, that configuration does exist, whether we try to > > magically > > > > > >> detect it or not. It is a property of the cache. And that's what > > > > Traffic > > > > > >> Ops is for -- configuration data about the cache should be > stored > > in > > > > TO. > > > > > >> Not having it there is going to make Operations' job more > > difficult, > > > > > >> because now they can't easily see what type of Stats a cache is > > > > serving. > > > > > >> For example, what if there's a bug and the Monitor is inferring > > the > > > > > wrong > > > > > >> type? Now it's a nightmare for Ops and Dev to track down. > > > > > >> > > > > > >> I support what you want to accomplish, supporting multiple DSes > > with > > > > the > > > > > >> same regex; and it's expensive to parse the domain anyway, it's > > > ideal > > > > to > > > > > >> have the DS name in the stat. But unfortunately, it isn't > possible > > > to > > > > > >> automatically infer without making certain configurations > > > impossible, > > > > > and > > > > > >> failing to store the Stats type configuration (which exists, > > whether > > > > we > > > > > >> store the data or not) in TO is going to lead to pain. > > > > > >> > > > > > >> >I do think it would be nice to have Traffic Monitor support > some > > > sort > > > > > of > > > > > >> canonical stats interface > > > > > >> > > > > > >> I have mixed feelings. On one hand, it'd be simpler, if there > were > > > > "One > > > > > >> Right Way" with all the things we want/need. But on the other > > hand, > > > > that > > > > > >> would make it painful to use Caches with Traffic Control which > > don't > > > > > have > > > > > >> extensible stats. It'd mean users would have to set up an > > > intermediary > > > > > >> service (for every single Cache) to transform the Stats, for > Cache > > > > > >> applications which aren't sufficiently configurable. > > > > > >> > > > > > >> Maybe it'd be better make the Monitor able to easily be > extended, > > > > > similar > > > > > >> to adding an endpoint in Golang TO, simply by adding a function > > in a > > > > new > > > > > >> file, and adding that func to a "Server Cache Type" map? That > > > solution > > > > > >> seems fairly easy, and much easier for users to add custom stats > > > than > > > > > >> having to write and deploy an intermediary service. It will also > > be > > > > > safe, > > > > > >> if we make the TO database field non-nullable and default to the > > > > current > > > > > >> type. > > > > > >> > > > > > >> > > > > > >> Also be aware @orens that raw stats are part of the > > > > /publish/CacheStats > > > > > >> API in Traffic Monitor. Changing the Stats format will > > fundamentally > > > > > change > > > > > >> that API endpoint (unless you also write the Monitor code to > > rebuild > > > > the > > > > > >> domain, as inserted into the `Result.Astats.Ats` and > > > > `ResultStatHistory` > > > > > >> objects). I'm not sure if anything currently uses that (Router? > > > > Traffic > > > > > >> Stats?), but anything that does will break. Just something to be > > > aware > > > > > of. > > > > > >> > > > > > >> > > > > > >> On Sun, Dec 31, 2017 at 1:35 AM, Oren Shemesh <[email protected]> > > > > wrote: > > > > > >> > > > > > >>> Thanks Ryan, Robert and Dave for your responses. > > > > > >>> > > > > > >>> Dave, can you send a pointer to any documentation/text > available > > > > > >>> regarding > > > > > >>> 'Stats over HTTP' ? > > > > > >>> > > > > > >>> I believe I can consider the overall response to the change I > am > > > > > >>> proposing > > > > > >>> as positive :-) > > > > > >>> > > > > > >>> Robert, from my experience I would like to avoid adding > > additional > > > > > >>> configurations to a system, in cases where the system can > > > auto-detect > > > > > the > > > > > >>> proper configuration. > > > > > >>> Any configuration added is another chance for someone making a > > > > mistake. > > > > > >>> > > > > > >>> With the proposal I have made, so configuration is needed: TM > > > > > >>> auto-detects > > > > > >>> the type of astats information (DS name / entire DS domain) > based > > > on > > > > a > > > > > >>> single 'if' statement which queries a single scalar value. > > > > > >>> This also means that, assuming my proposed code change is > valid, > > > > there > > > > > >>> can > > > > > >>> be no performance issues. > > > > > >>> > > > > > >>> Also, I do not know what 'Stats over HTTP' is, but I hope it > > keeps > > > > the > > > > > >>> job > > > > > >>> of classifying Client HTTP transactions to a DS in the cache > (ATS > > > in > > > > > our > > > > > >>> case), avoiding the need to re-classify it in TM. > > > > > >>> > > > > > >>> (Note: In my mind, the fact that TM currently does this job, > > based > > > > on a > > > > > >>> *single* hostName regex per DS, makes the entire feature of > > having > > > > > >>> multiple > > > > > >>> regexps per DS very limited in use. Which is a shame, based on > > the > > > > > effort > > > > > >>> invested in the DB model, TO API, and TR code). > > > > > >>> > > > > > >>> Thanks a lot guys ! > > > > > >>> > > > > > >>> > > > > > >>> On Sat, Dec 30, 2017 at 7:33 PM, Dave Neuman < > [email protected]> > > > > > wrote: > > > > > >>> > > > > > >>> > Hi Oren, > > > > > >>> > Sorry for the slow response, I think what you are proposing > > > sounds > > > > > >>> > reasonable, so +1. I would, however, like to make sure that > we > > > > > aren't > > > > > >>> > affecting the performance of TM too much in terms of both CPU > > > usage > > > > > >>> (with > > > > > >>> > 1000+ caches) and time to poll all caches/aggregate > statistics. > > > > > >>> > > > > > > >>> > Additionally, I don't think we should be adding new > > functionality > > > > to > > > > > >>> astats > > > > > >>> > at all. The plan is to move towards the Stats Over HTTP > plugin > > in > > > > the > > > > > >>> > future, so we should focus on making Traffic Monitor do what > we > > > > need > > > > > >>> it to > > > > > >>> > do now, and add new functionality to Stats over HTTP if it > > > doesn't > > > > > >>> > currently exist. I do think it would be nice to have Traffic > > > > Monitor > > > > > >>> > support some sort of canonical stats interface instead of > > > > > >>> > astats/stats_over_http, and I am sure there are several other > > > > > features > > > > > >>> > would be nice to have as well, but I don't think we should be > > > > > muddying > > > > > >>> the > > > > > >>> > waters with that right now. > > > > > >>> > > > > > > >>> > > > > > > >>> > On Wed, Dec 27, 2017 at 12:24 PM, Robert Butts < > > > > > >>> [email protected]> > > > > > >>> > wrote: > > > > > >>> > > > > > > >>> > > I'd be +1 if we extend Astats/stats_over_http to have the > DS > > > > name. > > > > > >>> > > > > > > > >>> > > It's easy to extend the Monitor to get the name from a > field, > > > if > > > > > it's > > > > > >>> > > available. And it should be faster than the current method. > > > But, > > > > > >>> Astats > > > > > >>> > > doesn't currently support it, and in fact, I don't believe > > ATS > > > > has > > > > > a > > > > > >>> good > > > > > >>> > > way to associate an identifier (DS name) with a Remap Rule. > > Can > > > > > >>> anyone > > > > > >>> > > confirm? > > > > > >>> > > > > > > > >>> > > That said, if we're extending `stats_over_http`, I'd really > > > like > > > > to > > > > > >>> see > > > > > >>> > it > > > > > >>> > > extended to serve CSV (ideally with the DS name in a field, > > as > > > > > >>> above), > > > > > >>> > when > > > > > >>> > > the request has an `Accept: text/csv` header. Astats only > > has a > > > > > >>> single > > > > > >>> > > level of JSON, and they're all numbers. So there's no > reason > > to > > > > use > > > > > >>> JSON > > > > > >>> > > over CSV, and CSV is drastically simpler and faster to > parse. > > > > > >>> Especially > > > > > >>> > if > > > > > >>> > > we break each `.` component of the stat into its own CSV > > field. > > > > The > > > > > >>> > Monitor > > > > > >>> > > is CPU-bound, so there's a good chance we'd see a > noticeable > > > > > >>> performance > > > > > >>> > > improvement, which translates to a tangible poll time > > > > improvement. > > > > > >>> > > > > > > > >>> > > Also @orens if we can't reasonably extend ATS Astats, we > can > > > > still > > > > > >>> > support > > > > > >>> > > your use case for your own cache, by having a separate code > > > path > > > > in > > > > > >>> the > > > > > >>> > > Monitor for your "stats", and adding a "stats type" number > or > > > > enum > > > > > >>> to the > > > > > >>> > > Server table (or ideally a new table for "cache servers", > > since > > > > it > > > > > >>> only > > > > > >>> > > applies to caches). So, the Monitor would look at that > field > > > > > >>> (possibly > > > > > >>> > > adding it to `monitoring.json` or `CRConfig.json`), to > > > determine > > > > > how > > > > > >>> to > > > > > >>> > > parse a given server's stats. > > > > > >>> > > > > > > > >>> > > We've discussed a "stat version" field anyway, for > different > > > > Astats > > > > > >>> > > versions. For example, we're looking into combining the > > System > > > > > stats > > > > > >>> into > > > > > >>> > > the Ats key, and we'd also like to be able to experiment > with > > > > > >>> different > > > > > >>> > > formats (CSV, Protobuf, etc). I think it's a good feature > for > > > the > > > > > >>> CDN in > > > > > >>> > > general, to make stat polling more flexible. > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > On Wed, Dec 27, 2017 at 12:21 PM, Durfey, Ryan < > > > > > >>> [email protected]> > > > > > >>> > > wrote: > > > > > >>> > > > > > > > >>> > > > Oren, > > > > > >>> > > > > > > > > >>> > > > This is a great conversation to have. I can’t speak to > the > > > best > > > > > >>> > solution, > > > > > >>> > > > but allowing our servers to have a single SSL cert that > > could > > > > be > > > > > >>> shared > > > > > >>> > > > across delivery services would save us a lot of time, > > money, > > > > and > > > > > >>> > > > operational resources. From a product perspective this > > would > > > be > > > > > >>> highly > > > > > >>> > > > beneficial to the platform especially since all services > > are > > > > > >>> eventually > > > > > >>> > > > moving to SSL. > > > > > >>> > > > > > > > > >>> > > > Key Benefits: > > > > > >>> > > > > > > > > >>> > > > 1. Instant Service Builds: All new services could use > > SSL > > > > > >>> > immediately, > > > > > >>> > > > no 2-3 business day delays in ordering SSL certs. > > > > > >>> > > > * This gets us a step closer to instant SSL > service > > > > builds > > > > > >>> > without > > > > > >>> > > > the need to implement something like “letsencrypt”. > > > > > >>> > > > 2. Operations Load: Reducing the management of a > > constant > > > > flow > > > > > >>> of > > > > > >>> > > > orders and renewals for SSL certs would save operations > > > > resources > > > > > >>> and > > > > > >>> > > time. > > > > > >>> > > > 3. Cost Reduction: A reduction in SSL certs equates to > > > > > >>> reduction in > > > > > >>> > > > purchase and renewal costs. > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > Ryan Durfey M | 303-524-5099 > > > > > >>> > > > CDN Support (24x7): 866-405-2993 or > > [email protected] > > > > > <mailto > > > > > >>> : > > > > > >>> > > > [email protected]> > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > From: Oren Shemesh <[email protected]> > > > > > >>> > > > Reply-To: "[email protected]" < > > > > > >>> > > > [email protected]> > > > > > >>> > > > Date: Wednesday, December 27, 2017 at 2:49 AM > > > > > >>> > > > To: "[email protected]" < > > > > > >>> > > > [email protected]> > > > > > >>> > > > Subject: Two different Delivery Services using the same > > host > > > > > >>> regexp: A > > > > > >>> > > > suggested Traffic Monitor code change > > > > > >>> > > > > > > > > >>> > > > Hi TC Dev people, > > > > > >>> > > > > > > > > >>> > > > We would like to avoid having to issue a specific SSL > > > > certificate > > > > > >>> for > > > > > >>> > > every > > > > > >>> > > > DS, as is needed today. > > > > > >>> > > > > > > > > >>> > > > (Reminder: Every DS needs a separate certificate because > > The > > > > > >>> domains > > > > > >>> > used > > > > > >>> > > > by the DS are tr.<ds-name>.<cdn-domain> and > > > > > >>> > > > <cache-name>.<ds-name>.<cdn-domain>, and wild-card certs > > > > support > > > > > >>> only > > > > > >>> > a > > > > > >>> > > > single *, so every cert is issued to > > > *.<ds-name>.<cdn-comain>, > > > > > >>> which > > > > > >>> > > makes > > > > > >>> > > > it DS-specific). > > > > > >>> > > > > > > > > >>> > > > When configuring two different DSes to use the same host > > > > regexp, > > > > > >>> and > > > > > >>> > > > configuring additional path regexps to differentiate > > between > > > > > them, > > > > > >>> it > > > > > >>> > is > > > > > >>> > > > possible to overcome this issue. > > > > > >>> > > > > > > > > >>> > > > For example: DS 'my-ds1' can have host regexp > > .*\.my-ds\..* , > > > > and > > > > > >>> path > > > > > >>> > > > regexp /ds1/.* (Both with order 0), and DS 'my-ds2' has > the > > > > same > > > > > >>> host > > > > > >>> > > > regexp and path regexp /ds2/.*. > > > > > >>> > > > > > > > > >>> > > > This makes TR redirect requests of the form > > > > > >>> > > > tr.my-ds.<cdn-domain>/ds1/<whatever> to > > > > > >>> > > > <cache-name>.my-ds.<cdn-domain>/ds1/<whatever>, do the > > same > > > > for > > > > > >>> > > > tr.my-ds.<cdn-domain>/ds2/<whatever>, and respond with > > '503 > > > > > >>> service > > > > > >>> > > > unavailable' to requests to tr.my-ds.<cdn-domain> which > do > > > not > > > > > >>> start > > > > > >>> > with > > > > > >>> > > > /ds1/ or /ds2/. > > > > > >>> > > > > > > > > >>> > > > So with TR, So far so good. > > > > > >>> > > > > > > > > >>> > > > ATS can also differentiate between the DSes based on the > > > path, > > > > so > > > > > >>> there > > > > > >>> > > is > > > > > >>> > > > no problem there as well. > > > > > >>> > > > > > > > > >>> > > > However, there is a problem with Traffic Monitor: > > > > > >>> > > > > > > > > >>> > > > Because astats sends per-DS info based on the origin > name, > > TM > > > > > >>> requires > > > > > >>> > > that > > > > > >>> > > > every DS has a unique host regexp (See > > > > > >>> > > > https://github.com/apache/incubator-trafficcontrol/blob/ > > > > > >>> > > > 3e3a5f41d482a065f46ef287b347e6 > 6d7d205d82/traffic_monitor/ > > > > > >>> > > > todata/todata.go#L234 > > > > > >>> > > > ). > > > > > >>> > > > This is so that TM can map origin names from astats back > > to a > > > > DS. > > > > > >>> > > > > > > > > >>> > > > So, I would like to make a small change to the TM code: > > > > > >>> > > > > > > > > >>> > > > If the astats info contains only a single host name, > assume > > > it > > > > is > > > > > >>> > already > > > > > >>> > > > the DS name. > > > > > >>> > > > > > > > > >>> > > > (This would be implemented in <goog_1101383350> > > > > > >>> > > > https://github.com/apache/incubator-trafficcontrol/blob/ > > > > > >>> > > > 3e3a5f41d482a065f46ef287b347e6 > 6d7d205d82/traffic_monitor/ > > > > > >>> > > > cache/cache.go#L419 > > > > > >>> > > > : > > > > > >>> > > > > > > > > >>> > > > if len(statParts) == 2 { // [0] is DS name, [1] is > statName > > > > > >>> > > > ds = statParts[0] > > > > > >>> > > > } > > > > > >>> > > > else { > > > > > >>> > > > ... existing code which uses > statsParts[0:len(statsParts)- > > 1] > > > > > >>> > > > } > > > > > >>> > > > > > > > > >>> > > > Obviously, the ds should be checked against the list of > all > > > > DSes > > > > > to > > > > > >>> > make > > > > > >>> > > > sure it is valid, I wrote this code line just to > > demonstrate > > > > the > > > > > >>> idea). > > > > > >>> > > > > > > > > >>> > > > Of course, the astats plugin for ATS might not support > this > > > > today > > > > > >>> (I am > > > > > >>> > > not > > > > > >>> > > > sure), but a different cache (Hint: Qwilt cache) has no > > > problem > > > > > >>> > reporting > > > > > >>> > > > the astats information per DS name. > > > > > >>> > > > If needed, the astats code could be improved to do that, > as > > > > well. > > > > > >>> > > > > > > > > >>> > > > I believe that this change, while being very small and > > > > > unobtrusive, > > > > > >>> > would > > > > > >>> > > > make the information passed from caches to TM 'cache > vendor > > > > > >>> agnostic', > > > > > >>> > > > while also eliminating the need to re-classify URLs into > DS > > > > > (Which > > > > > >>> is > > > > > >>> > > what > > > > > >>> > > > happens today in TM), hence it is a good step in a good > > > > > direction. > > > > > >>> > > > > > > > > >>> > > > Any comments / opinions, please ? > > > > > >>> > > > > > > > > >>> > > > -- > > > > > >>> > > > > > > > > >>> > > > *Oren Shemesh* > > > > > >>> > > > Qwilt | Work: +972-72-2221637 <+972%2072-222-1637>| > Mobile: > > > > > >>> +972-50-2281168 <+972%2050-228-1168> | > > > > > >>> > [email protected] > > > > > >>> > > > <mailto:[email protected]> > > > > > >>> > > > <[email protected]<mailto:[email protected]>> > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> -- > > > > > >>> > > > > > >>> *Oren Shemesh* > > > > > >>> Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 | > > > > > [email protected] > > > > > >>> <[email protected]> > > > > > >>> > > > > > >> > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > *Oren Shemesh* > > > > Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 | > > [email protected] > > > > > > > > > > > > > > > -- > > > > *Oren Shemesh* > > Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 | [email protected] > > > -- *Oren Shemesh* Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 | [email protected]
