On Thu, Aug 06, 2009 at 10:38:26AM -0700, Darren Reed wrote:
> Jens Elkner wrote:
Hi Darren,

> >as previously written, we've seen a lot of problems / unavailability /
> >strange behavior wrt. our Internet aka more or less public related
> >services running on Solaris machines after upgrading to S10u4 or higher.
  
> Firstly, this mailing list is for OpenSolaris, NOT Solaris 10.
> The correct forum for you to express your grievances is via
> the support channel as you appear to have done but it seems
> like you didn't like the answers you received...

Sorry, but you completely missed the point. It IS ALL about
OpenSolaris, Nevada and Solaris! It is about reasonable defaults,
it is about nonexistent/bad documentation, it is about missing features,
it is about DoS, it is about usability, it is about user/customer
satisfaction ...

Come on, do you really think that, wrt. this issue, it makes a difference
whether I use OpenSolaris or Nevada? Does ipfilter v1.4.9 on
Nevada/OpenSolaris work differently than ipfilter v1.4.9 on Solaris?

Not sure whether you know what Sun ambassadors and representatives
are telling people out there in the wild. I hear it all the time: what
is not in Nevada/OpenSolaris you'll certainly not see in Solaris.
And as you can see, opening a case is the worst way to give
feedback or ask for improvements/features ...

Given that, and the fact that there is no "Bugzilla" for Solaris,
what should Solaris users do? Well, most students I've talked to
who gave OpenSolaris/Nevada a try said that it isn't ready for the
desktop and silently installed/switched back to Linux... Maybe that's
the way to go.

BTW: If this trend continues and even people posting their concerns are
discouraged from doing so, I guess that sometime in the future the Solaris
team will be able to take care of each customer personally ...

> >So the Sun case engineer explained, if ipf cannot insert an entry into
> >the state table, it just _continues_ evaluating the rules that follow. 
> >I couldn't believe my eyes!!! What a crap!!!
> 
> Well, what would you have it do?

As said, at least notify the user about the problem. Since I'm not
an ipf specialist, I'm not sure what the right thing to do next
is here. My gut says: drop the packet and stop processing other rules.
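
To make the two options concrete, here is a toy model in Python. It is
not ipf code: the Rule and Firewall classes, the first-match evaluation
and the on_full switch are all made up for illustration. It only shows
how "continue with the following rules" and "drop" differ once the state
table is full.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    action: str                  # "pass" or "block"
    port: Optional[int] = None   # None matches any port
    keep_state: bool = False


@dataclass
class Firewall:
    rules: list
    state_max: int               # toy stand-in for a state table limit
    on_full: str = "continue"    # "continue" or "drop" (hypothetical switch)
    states: set = field(default_factory=set)
    lost: int = 0                # counts failed state inserts

    def decide(self, flow, port):
        # Packets of an already tracked flow pass without rule evaluation.
        if flow in self.states:
            return "pass"
        for rule in self.rules:
            if rule.port is not None and rule.port != port:
                continue
            if rule.action == "pass" and rule.keep_state:
                if len(self.states) >= self.state_max:
                    self.lost += 1
                    if self.on_full == "drop":
                        return "block"
                    # Behavior described in the case: treat the rule as
                    # if it had not matched and keep evaluating.
                    continue
                self.states.add(flow)
            return rule.action   # first matching rule wins (assumption)
        return "pass"            # hypothetical default policy
```

With the rules [Rule("pass", 80, keep_state=True), Rule("pass")] and
state_max=1, the second new flow to port 80 is passed by the stateless
catch-all in "continue" mode and blocked in "drop" mode. In both cases
only the lost counter records that something went wrong.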

Another way could be to lower the TTL of all entries in steps of N hours,
clean up, and check whether the insert now succeeds. This of course implies
a new threshold to be set by the OS/user and a user notification that this
measure has been taken (documentation about what that means would
probably be a nice thing, too) ...
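
A rough sketch of that idea in Python, just to show what I mean. This is
not how ipf manages its table; the StateTable class, age_step and
max_rounds are names invented for this example.

```python
class StateTable:
    """Toy bounded table: flow -> remaining TTL in seconds."""

    def __init__(self, max_entries, age_step):
        self.max_entries = max_entries
        self.age_step = age_step   # seconds cut from every TTL per round
        self.ttl = {}
        self.aging_rounds = 0      # how often the emergency measure ran

    def _age(self):
        # Lower every TTL by one step and drop entries that have expired.
        self.aging_rounds += 1
        self.ttl = {
            flow: left - self.age_step
            for flow, left in self.ttl.items()
            if left - self.age_step > 0
        }

    def insert(self, flow, ttl, max_rounds=3):
        # Refreshing a known flow never needs a new slot.
        if flow in self.ttl:
            self.ttl[flow] = ttl
            return True
        rounds = 0
        while len(self.ttl) >= self.max_entries and rounds < max_rounds:
            self._age()
            rounds += 1
        if len(self.ttl) >= self.max_entries:
            return False           # still full: caller must decide what to do
        self.ttl[flow] = ttl
        return True
```

A caller would compare aging_rounds before and after insert() to know
when to notify the admin that the measure has been taken. max_rounds is
the new threshold mentioned above.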

Anyway, no matter what decision gets made, it needs to be documented
that this may happen, that the system does not inform the user about
this problem, what the developers have chosen to do in that case, how
one may get around this problem (e.g. raising fr_statemax) and what
implications this change would have on the system.
  
> It is not appropriate to have ipfilter automatically
> grow the tables as you suggest through "automatic
> tuning" because then your system becomes vulnerable
> to a denial of service attack from remote attackers.

Isn't it already vulnerable to DoS when the state table is too small?

Actually I've checked all of our production servers (most of them
already have fr_statemax=40129) and all show packet lost > 100,
usually > 5000, even on very, very low traffic machines and, no wonder,
of course on svn machines as well. So this implies that something is wrong
with ipf or at least with its default settings...

> If you're not aware of how much network traffic is
> going through your firewall and you're not paying
> attention to what your firewall is doing then you've
> got more work to do in order to manage the system
> effectively.

Well, I think I have a more or less good understanding of what's going
on on the machines. However, I'm neither a firewall specialist nor an
ipf developer, and I didn't expect a firewall to silently do things
which it shouldn't. Actually I can only rely on the things which are
documented, and you probably know there is a lot of room for improvement:
E.g. it is documented how one can obtain certain ipf statistics, but
nowhere what these stats are actually telling you/how to interpret
them.

BTW: some Sun ambassadors/sales representatives say that Solaris is
not meant to be a desktop OS, since one needs to be a Solaris expert
to handle it.  So if you are not a Solaris expert, rather go with Linux.
It seems you belong to that group, suggesting this?
BTW2: If I think more about this, if everybody at our university took
this literally, there wouldn't be any Solaris machine on campus
anymore (and yes, there is indeed a strong tendency to switch over to
Linux - even for servers, if not already done ...)

> ...
> >Wrt. a required syslog message he responded that a counter increment
> >(ipfstat: packet state*lost) costs only 2 cycles on sparc, but a syslog
> >message 2000 cycles and would cause ipf to "hang"/be unusable, and closed
> >the case.
> 
> syslog message from where?
  
Ehmmm, I'm not a kernel developer and actually haven't cared about it yet.
What I know is that e.g. on Linux there is a klogd ...

> Generating messages from within kernel modules
> is generally frowned upon.

OK. So hiding kernel problems is a better thing? What is so hard about
incrementing a well-known value in the kernel and letting a logger in
user space poll for changes every n time units? Also not sure whether
a new thing needs to be invented for this: IIRC ipmon is already able
to log ipf related stuff (but there is no documentation on whether it is
a bad thing to use because of possible performance degradation ???) ...
  
> >3) Actually I would expect, that there is some kind of global SW
> >   register (perhaps called log indicator table), where one could
> >   also add a "state table full counter", which gets incremented by
> >   ipf and I assume,
> 
> Something like that exists.

So the only thing one needs for a more user-friendly system is
something that reads this stuff and emits a message like
"ipf: state table full. kernel param fr_statemax possibly too low"

If I had found such a message in the logs, it is quite possible
that by googling for 'fr_statemax' I would have found at least
http://docs.hp.com/en/B9901-90042/apcs03.html and thus probably wouldn't
have needed almost 2 years to get a more or less proper workaround for the
problem.
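
The kind of user space poller I have in mind could look roughly like the
Python sketch below. Assumptions: the counters show up in the ipfstat
output as lines of the form "packet state(in): kept N lost M" (please
check against your version), and the function names, the 60 second
interval and the message wording are mine, not anything that ships with
ipf.

```python
import re
import subprocess
import syslog
import time

# Assumed line format: "packet state(in):<ws>kept 12<ws>lost 3"
LOST_RE = re.compile(
    r"packet state\((?:in|out)\):\s*kept\s+\d+\s+lost\s+(\d+)"
)


def lost_total(ipfstat_text):
    """Sum the 'lost' counters for both directions."""
    return sum(int(n) for n in LOST_RE.findall(ipfstat_text))


def check(previous, ipfstat_text):
    """Return (current counter, message or None if nothing changed)."""
    current = lost_total(ipfstat_text)
    message = None
    if current > previous:
        message = (
            "ipf: state table full (%d state inserts lost since last "
            "check). kernel param fr_statemax possibly too low"
            % (current - previous)
        )
    return current, message


def run_forever(interval=60):
    """Poll ipfstat and log via syslog; not called automatically."""
    seen = 0
    while True:
        out = subprocess.run(
            ["ipfstat"], capture_output=True, text=True
        ).stdout
        seen, msg = check(seen, out)
        if msg:
            syslog.syslog(syslog.LOG_WARNING, msg)
        time.sleep(interval)
```

Note that the first poll reports everything accumulated so far, since
the sketch starts from a counter of 0. The kernel side stays at the
cheap counter increment; all the expensive logging happens in user space.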

> >that there is also a kernel log daemon or even user
> >   space logger, which can read this table in certain intervals and
> >   log the problem. So the 2 vs. 2000 cycle reason for making Solaris
> >   users' lives harder than necessary is IMHO a very poor one / implies
> >   a not very well thought out SW design (at least in the eyes of a normal
> >   human being ;-) having not much ipf insight because of shallow
> >   documentation).
> >   Usually, an admin always looks in /var/adm/messages first, if the
> >   cause of a problem can not be determined/is not really reproducible.
> 
> /var/adm/messages isn't generally considered an "interface".

Well, almost every *x admin looks into /var/{log|adm}/messages
when he encounters an unknown problem ... I'm not sure whether they
care, whether it is called an 'interface', 'good practice' or
'pseudo standard' etc. ...

> Whilst we can send messages to/via it, we are not allowed to
> rely on them as being the only communication channel with the
> systems administrator.

OK. But did I say that?
  
> >   So is this, because at some time the state table was full, or is
> >   this, because ipf tried to insert a state, which is already present
> >   in the table?
> 
> It is because it tried to insert a state entry to a table that is full.

So all of our production machines (S10u7), even the low traffic ones
with fr_statemax=40129, have a problem and 'keep state' needs to be
considered harmful :(((

> >Sure, ipf's behavior to process the rules list like
> >   'the rule is ignored' is for my taste more than a minor security
> >   issue, but anyway, should one raise the fr_statemax value to make
> >   it bigger and bigger 'til one finds out, that actually ipf is having
> >   a problem? And what is also not clear: are the 'lost' counters also
> >   snapshots (for what interval/time), or is this an accumulation
> >   from when ipf got started/refreshed?
> 
> It's part of something much more than that.
> 
> What it allows is for you to create "keep state" rules
> that define a maximum number of states allowed for them
> and when that maximum is reached for other rules to be
> then applied to packets. The problem of when the global
> maximum is reached is a degenerative case of that.

Not sure, whether I understood that correctly :(

> ...
> >BTW: Why does 'ipfstat -t' show so many entries with negative ttls? It
> >     appears, that if the min? value of -59:-59 is reached, ttl gets
> >     reset to 0:00 and restarts decrementing ... - strange
> 
> That's a known bug... fixed in the current opensolaris
> source code tree and will be fixed in the next release.
> A fix is also being considered for Solaris 10.

So is it just a kind of overflow, or is ipf holding state entries
for much longer than it needs to?

Thanx for taking the time to answer,
jel.
-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 12768
_______________________________________________
networking-discuss mailing list
networking-discuss@opensolaris.org
