On Thu, Aug 06, 2009 at 10:38:26AM -0700, Darren Reed wrote:
> Jens Elkner wrote:

Hi Darren,

> >as previously written, we've seen a lot of problems / unavailability /
> >strange behavior wrt. our Internet aka more or less public related
> >services running on Solaris machines after upgrading to S10u4 or higher.
>
> Firstly, this mailing list is for OpenSolaris, NOT Solaris 10.
> The correct forum for you to express your grievances is via
> the support channel as you appear to have done but it seems
> like you didn't like the answers you received...

Sorry that you completely missed the point. It IS ALL about OpenSolaris,
Nevada and Solaris! It is about reasonable defaults, it is about
non-existent/bad documentation, it is about missing features, it is about
DoS, it is about usability, it is about user/customer satisfaction ...

Come on, do you really think that wrt. this issue it makes a difference
whether I use OpenSolaris or Nevada? Does ipfilter v1.4.9 on
Nevada/OpenSolaris work differently than ipfilter v1.4.9 on Solaris?

Not sure whether you know what Sun ambassadors and representatives are
telling people out in the wild. I hear all the time: what is not in
Nevada/OpenSolaris you'll certainly not see in Solaris. And as you can
see, opening a case is the worst way to give feedback or ask for
improvements/features ... Given that, and the fact that there is no
"Bugzilla" for Solaris, what should Solaris users do?

Well, most students I've talked to who gave OpenSolaris/Nevada a try
said that it isn't ready for the desktop and silently installed/switched
back to Linux... Maybe that's the way to go.

BTW: If this trend continues and even people posting their concerns are
discouraged from doing so, I guess that sometime in the future the
Solaris team will be able to take care of each customer personally ...

> >So the sun case engineer explained, if ipf can not insert an entry into
> >the state table, it just _continues_ evaluating the rules that follow.
> >I couldn't believe my eyes!!! What a crap!!!
>
> Well, what would you have it do?

As said, at least notify the user about the problem. Since I'm not an
ipf specialist, I'm not sure what the right thing to do next is here. My
gut says: drop and stop processing other rules. Another way could be to
lower the TTL of all entries in steps of N hours, clean up, and check
whether the insert now succeeds. This of course implies a new threshold
to be set by the OS/user and a user notification that this measure has
been taken (documentation about what that means would probably be a nice
thing, too) ...

Anyway, no matter what decision gets made, it needs to be documented
that this may happen, that the system does not inform the user about
this problem, what the developers have chosen to do in that case, how
one may get around this problem (e.g. raising fr_statemax) and what
implications this change would have on the system.

> It is not appropriate to have ipfilter automatically
> grow the tables as you suggest through "automatic
> tuning" because then your system becomes vulnerable
> to a denial of service attack from remote attackers.

Isn't it already vulnerable to DoS when the state table is too small?
Actually I've checked all of our production servers (most of them
already have fr_statemax=40129) and all show packets lost > 100, usually
> 5000, even on very, very low traffic machines and, unsurprisingly, on
the svn machines as well. So this implies that something is wrong with
ipf or at least with its default settings...

> If you're not aware of how much network traffic is
> going through your firewall and you're not paying
> attention to what your firewall is doing then you've
> got more work to do in order to manage the system
> effectively.

Well, I think I have a more or less good understanding of what's going
on on the machines. However, I'm neither a firewall specialist nor an
ipf developer, and I didn't expect a firewall to silently do things
which it shouldn't.
Actually, I can only rely on the things which are documented, and you
probably know there is a lot of room for improvement: e.g. it is
documented how one can obtain certain ipf statistics, but nowhere what
these stats are actually telling you or how to interpret them.

BTW: some Sun ambassadors/sales representatives say that Solaris is not
meant to be a desktop OS, since one needs to be a Solaris expert to
handle it. So if you are not a Solaris expert, rather go with Linux. It
seems you belong to the group suggesting this?

BTW2: Thinking more about this, if everybody at our university took this
literally, there wouldn't be any Solaris machines on campus anymore (and
yes, there is indeed a strong tendency to switch over to Linux, even for
servers, if not already done ...)

> ...
> >Wrt. a required syslog message he responded that a counter increment
> >(ipfstat: packet state*lost) costs only 2 cycles on sparc, but a syslog
> >message 2000 cycles and would cause ipf to "hang"/be unusable, and closed
> >the case.
>
> syslog message from where?

Ehmmm, I'm not a kernel developer and actually haven't cared about it
yet. What I know is that e.g. on Linux there is a klogd ...

> Generating messages from within kernel modules
> is generally frowned upon.

OK. So hiding kernel problems is a better thing? What is so hard about
incrementing a well-known value in the kernel and letting a logger in
user space poll for changes every n time units? I'm also not sure
whether something new needs to be invented for this: IIRC ipmon is
already able to log ipf related stuff (but there is no documentation on
whether this is a bad thing to use because of possible performance
degradation ???) ...

> >3) Actually I would expect, that there is some kind of global SW
> >   register (perhaps called log indicator table), where one could
> >   also add a "state table full counter", which gets incremented by
> >   ipf and I assume,
>
> Something like that exists.

So the only thing one needs for a more user-friendly system is something
that reads this stuff and emits a message like

    "ipf: state table full. kernel param fr_statemax possibly too low"

If I had found such a message in the logs, it is quite possible that by
googling for 'fr_statemax' I would have found at least
http://docs.hp.com/en/B9901-90042/apcs03.html
and thus probably wouldn't have needed almost 2 years to get a more or
less proper workaround for the problem.

> >that there is also a kernel log daemon or even user
> >   space logger, which can read this table in certain intervals and
> >   log the problem. So the 2 vs. 2000 cycle reason for making Solaris
> >   users' lives harder than necessary is IMHO a very poor one / implies
> >   a not very well thought SW design (at least in the eyes of a normal
> >   human being ;-) having not much ipf insights because of shallow
> >   documentation).
> >   Usually, an admin always looks in /var/adm/messages first, if the
> >   cause of a problem can not be determined/is not really reproducible.
>
> /var/adm/messages isn't generally considered an "interface".

Well, almost every *x admin looks into /var/{log|adm}/messages when he
encounters an unknown problem ... I'm not sure whether they care whether
it is called an 'interface', 'good practice' or 'pseudo standard' etc. ...

> Whilst we can send messages to/via it, we are not allowed to
> rely on them as being the only communication channel with the
> systems administrator.

OK. But did I say that?

> > So is this, because at some time the state table was full, or is
> > this, because ipf tried to insert a state, which is already present
> > in the table?
>
> It is because it tried to insert a state entry to a table that is full.

So all of our production machines (S10u7), even the low traffic ones and
even with fr_statemax=40129, have a problem, and 'keep state' needs to be
considered harmful :(((

> >Sure, ipf's behavior to process the rules list like
> > 'the rule is ignored' is for my taste more than a minor security
> > issue, but anyway, should one raise the fr_statemax value to make
> > it bigger and bigger 'til one finds out, that actually ipf is having
> > a problem? And what is also not clear: are the 'lost' counters also
> > snapshots (for what interval/time), or is this an accumulation
> > from when ipf got started/refreshed?
>
> It's part of something much more than that.
>
> What it allows is for you to create "keep state" rules
> that define a maximum number of states allowed for them
> and when that maximum is reached for other rules to be
> then applied to packets. The problem of when the global
> maximum is reached is a degenerative case of that.

Not sure whether I understood that correctly :(

> ...
> >BTW: Why does 'ipfstat -t' show so many entries with negative ttls? It
> > appears, that if the min? value of -59:-59 is reached, ttl gets
> > reset to 0:00 and restarts decrementing it ... - strange
>
> That's a known bug... fixed in the current opensolaris
> source code tree and will be fixed in the next release.
> A fix is also being considered for Solaris 10.

So is it just a kind of overflow, or is ipf holding state entries for
much longer than it needs to?

Thanx for taking the time to answer,
jel.
-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 12768
_______________________________________________
networking-discuss mailing list
networking-discuss@opensolaris.org
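
A minimal sketch of the user-space poller suggested in the message above: read the 'lost' counters at intervals and write a syslog warning when they grow. This is only an illustration under stated assumptions, not something ipf or Solaris ships. It assumes that plain `ipfstat` prints lines of the form `packet state(in): kept N lost M` (the exact layout may differ between releases) and that the counters accumulate rather than being snapshots, which the thread leaves open. The function names and the polling interval are made up for the example.

```python
import re
import subprocess
import time

# ASSUMPTION: plain `ipfstat` output contains lines such as
#   packet state(in):   kept 12  lost 3
# The layout is not verified against every ipfilter release.
LOST_RE = re.compile(r"packet state\((?:in|out)\):\s*kept\s+\d+\s+lost\s+(\d+)")


def total_lost(ipfstat_output):
    """Sum the 'lost' counters of both directions found in the text."""
    return sum(int(n) for n in LOST_RE.findall(ipfstat_output))


def warning_for(previous, current):
    """Return a log message if the counter grew since the last poll, else None.

    ASSUMPTION: the counter is cumulative since ipf was started.
    """
    if current > previous:
        return ("ipf: %d state(s) lost since last check; state table may be "
                "full, kernel param fr_statemax possibly too low"
                % (current - previous))
    return None


def poll_forever(interval_seconds=60):
    """Poll ipfstat and send a syslog warning whenever 'lost' increases."""
    import syslog  # Unix only, imported here so the helpers work anywhere

    previous = None
    while True:
        out = subprocess.run(["ipfstat"], capture_output=True, text=True).stdout
        current = total_lost(out)
        if previous is not None:
            msg = warning_for(previous, current)
            if msg:
                syslog.syslog(syslog.LOG_WARNING, msg)
        previous = current
        time.sleep(interval_seconds)
```

Run as root from a small service, `poll_forever()` would put the warning where an admin looks first, while the kernel side still only pays for the counter increment.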