I implemented a direct mode. It writes out each batch of slots as soon as it gets them. Any sort options are ignored. There will be duplicates of any slots that get updated after they are retrieved. I think the filtering stuff should still work but I didn't try it.
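A minimal sketch of the direct-mode idea described above, not the actual code. The `Slot` fields, `write_direct`, and `fake_batches` are hypothetical names made up for illustration; the real retrieval and output format will differ. It shows why memory stays flat (nothing is kept between batches) and why a slot that is updated after it was retrieved shows up twice.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple, TextIO


class Slot(NamedTuple):
    """Hypothetical stand-in for one MRU slot; real slots carry more fields."""
    addr: str
    last_seen: float
    count: int


def write_direct(batches: Iterable[List[Slot]], out: TextIO) -> int:
    """Write each batch as it arrives; nothing is kept, sorted, or deduplicated."""
    written = 0
    for batch in batches:
        for slot in batch:
            out.write(f"{slot.addr} {slot.last_seen:.3f} {slot.count}\n")
            written += 1
        out.flush()  # push this batch out before asking for the next one
    return written


def fake_batches() -> Iterator[List[Slot]]:
    """Simulated server: 192.0.2.1 is updated between the two batches."""
    yield [Slot("192.0.2.1", 100.0, 1), Slot("192.0.2.2", 101.0, 1)]
    yield [Slot("192.0.2.3", 102.0, 1), Slot("192.0.2.1", 103.0, 2)]


def run_demo() -> str:
    """Run the sketch against the simulated batches and return the text written."""
    buf = io.StringIO()
    write_direct(fake_batches(), buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(run_demo(), end="")
```

In the demo, 192.0.2.1 is written once from each batch, so whatever reads the output has to pick the newest entry per address if it wants one line per client.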
The code and UI need more work, but as a proof of concept it managed to capture everything from a busy server.

I think collecting data from a busy server will always be "interesting". I know about 2 issues.

The first is the race between collecting data and having slots get moved or recycled while you are collecting. This is obviously easier if you can run on the same system as the server so there are no network delays. If we can't go fast enough, we should be able to get some of the data and/or some estimates of how much we are missing. We can probably test that by running over a network. (That will also test the lost packet code.) We need to be sure to debug this case/mode so we will have useful tools when the next big burst of traffic hits the pool.

The other issue is memory and CPU on the system collecting the data. I don't know which limit will kick in first. It takes a lot of CPU, but that's not a problem as long as you can keep up with the server. I think that translates into a threshold for how busy a server you can grab complete data from. I think memory will be a serious issue. I saw troubles before switching to direct mode but it should work on a system with more memory or less traffic. Direct mode doesn't use much memory so this probably won't be a problem.

My reference is a pool server in the cloud. For $5 per month you get 512 megabytes. I had 150 megabytes allocated to the MRU list. That's about a million slots. I had the pool bandwidth adjusted so that covered well over a day. I was grabbing data with a script that ran once a day from a cron job.

The old c code worked before the recent burst of pool traffic. It didn't work during the burst. The new python code got tangled up with the burst so I don't know how well it would have worked before the burst. I think it would have run out of memory. I don't have a cron job working yet.

---------

Any suggestions for a UI/CLI?
Currently, the direct command sets a flag that gets passed down similar to the hostnames flag. That seems pretty ugly to me, mostly because it gets used in several places rather than only one as with hostnames. (The hostnames stuff seems ugly too, but direct is uglier.) Maybe it should be a separate program. I'm assuming that mode will mostly be used from a cron job.

I'd like to also collect statistics on the data collection process. How many retransmissions and such. Maybe that should go to stderr? Maybe a command line switch?

We could implement another stats file and have the server write stuff when slots are recycled. That might need some rate limits. This feels like the sort of problem that has a cliff rather than a slope.

-- 
These are my opinions. I hate spam.
