Here's a solution:
The 32-bit version of the nss-mdns library needed to be installed
(that package is lib32nss-mdns on Debian).
Now, why this would fix it beats me. Is the JVM running in 32-bit mode
on this Opteron?
On another machine, a 64-bit Intel, only the 64-bit library is installed,
yet this problem doesn't occur.
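One way to check whether the JVM really is 32-bit is to print the properties it reports about itself. This is a minimal sketch, not something from the original test: the class name JvmArch is made up for illustration, and sun.arch.data.model is specific to Sun-derived JVMs, so it may be missing elsewhere. The mmap2 calls in the strace mentioned below would fit a 32-bit process, since as far as I know mmap2 only exists as a syscall on 32-bit ABIs. A 32-bit JVM would presumably load the 32-bit NSS modules, which would at least explain why the 32-bit nss-mdns matters.

```java
public class JvmArch {
    // Summarise the properties that hint at whether this JVM is 32- or 64-bit.
    static String describe() {
        // sun.arch.data.model is not a standard property; fall back to "unknown".
        String model = System.getProperty("sun.arch.data.model", "unknown");
        return "os.arch=" + System.getProperty("os.arch")
            + " data.model=" + model
            + " vm=" + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it with the same java binary that the Globus scripts use, otherwise the answer may describe a different JVM.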
And as to what recently changed to make it go haywire, I don't know that either.
Any ideas?
Cheers!
On 2.11.10, Steve White wrote:
> Hi again.
>
> Don't have the answer yet, but with the help of a professional programmer
> friend, I found the trigger. Setting the java.net property
> -Djava.net.preferIPv4Stack
> fixes it. Now, why the stack got changed, or why it wouldn't work with
> IPv6, I don't know yet.
>
> Cheers!
>
> On 1.11.10, Steve White wrote:
> > Hello, all,
> >
> > Further clues to this problem.
> >
> > It is not a Globus problem per se.
> > It is specific to Java programs on this host.
> > Other network programs work fine, but even the simplest Java Socket
> > connection fails with DNS errors.
> >
> > The traceback showed calls to the Java Socket class, so I wrote a very
> > small test program -- see attached. It fails on the problematic system,
> > but works on other systems.
> > javac nettest.java
> > java nettest your-favourite-hostname
> >
> > One difference I don't understand is that, although both machines are
> > running the same OS (SL 5.5), the strace on the machine that fails shows
> > calls to mmap2, while on the other it shows calls to mmap. Now, the latter
> > is just a wrapper around the former so it shouldn't matter much.
> > The failing machine is AMD while the other is an Intel, so maybe that
> > accounts for it.
> >
> > strace output also attached.
> >
> > Any idea what would cause Java DNS to fail like this, when it is otherwise
> > working on a system?
> >
> > Thanks!
> >
> > On 29.10.10, Steve White wrote:
> > > Further info
> > >
> > > 1) I think I have eliminated the recent package updates as the source of
> > > the problem. I reversed the installs of the perl libraries, as well
> > > as some glibc libraries from a previous update, and re-booted with the
> > > previous kernel, yet the problem persists.
> > >
> > > 2) wsrf-query -a -z none -s \
> > > https://cashmere.aip.de:8443/wsrf/services/DefaultIndexService
> > > Error: ; nested exception is:
> > > java.net.UnknownHostException: cashmere.aip.de
> > >
> > > although cashmere is quite reachable from that host otherwise.
> > >
> > > 3) The problem is in name resolution.
> > > Using IP numbers in the above call works.
> > > Using the name of the local machine in the above call works.
> > >
> > > 4) wsrf-query is just a wrapper around this java command:
> > > java -DGLOBUS_LOCATION=/usr/local/globus/gtk \
> > > -Djava.endorsed.dirs=/usr/local/globus/gtk/endorsed \
> > > -Djava.security.egd=file:///dev/urandom -classpath \
> > > /usr/local/globus/gtk/lib/bootstrap.jar:/usr/local/globus/gtk/lib/cog-url.jar:/usr/local/globus/gtk/lib/axis-url.jar \
> > > org.globus.bootstrap.Bootstrap org.globus.wsrf.client.Query -a -z none \
> > > -s https://cashmere.aip.de:8443/wsrf/services/DefaultIndexService
> > >
> > > I ran this under
> > > strace -f
> > > Not sure what should be there. I see it opening and reading /etc/hosts
> > > but nothing that looks like querying DNS... I'm not sure how it goes
> > > about that.
> > >
> > > See attachment (lines with gettimeofday were removed)
> > >
> > > What should I see?
> > >
> > > Any ideas now?
> > >
> > >
> > > On 28.10.10, Steve White wrote:
> > > > Hi,
> > > >
> > > > Today we developed a new problem with our WebMDS page:
> > > > it now shows no hosts, where several appeared before.
> > > >
> > > > In the Globus 4.0.8 container.log I see lines like
> > > >
> > > > 471 WARN impl.AggregatorUtils
> > > > [ServiceThread-5,detectLoopback:65] UnknownHostException:
> > > > cashmere.aip.de
> > > > 478 ERROR impl.QueryAggregatorSource [Timer-3,pollGet:122]
> > > > Exception Getting Resource Property from
> > > > https://cashmere.aip.de:8443/wsrf/services/DefaultIndexService: ;
> > > > nested exception is:
> > > > java.net.UnknownHostException: cashmere.aip.de
> > > >
> > > > for each downstream host that previously reported to this host, and for
> > > > each upstream host this host reported to.
> > > >
> > > > However, from the current host, all the servers are quite visible, and
> > > > Globus is properly running on them.
> > > >
> > > > The only clue is, this behaviour started after a system update
> > > > (of Scientific Linux 5.5)
> > > >
> > > > Updated: python26-2.6.5-5.el5.x86_64
> > > > Updated: python26-libs-2.6.5-5.el5.x86_64
> > > > Installed: kernel-devel-2.6.18-194.17.4.el5.x86_64
> > > > Updated: kernel-headers-2.6.18-194.17.4.el5.x86_64
> > > > Installed: kernel-2.6.18-194.17.4.el5.x86_64
> > > > Updated: perl-NetAddr-IP-4.034-1.el5.rf.x86_64
> > > > Updated: perl-DBI-1.615-1.el5.rf.x86_64
> > > > Updated: syslinux-4.03-1.el5.rf.x86_64
> > > > Updated: perl-Crypt-OpenSSL-RSA-0.26-1.el5.rf.x86_64
> > > >
> > > > And there you see NetAddr-IP...
> > > >
> > > > Any ideas?
> > > >
> > | - - - - - - - - - - - - - - - - - - - - - - - - -
>
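> > import java.io.IOException;
> > import java.net.Socket;
> >
> > // Minimal test: open a TCP connection to port 80 on the given host.
> > public class nettest
> > {
> >     public nettest( String host )
> >     {
> >         Socket si = null;
> >         try
> >         {
> >             // The constructor resolves the host name, then connects.
> >             si = new Socket( host, 80 );
> >             System.out.println( "Success connecting to " + host );
> >         }
> >         catch( Exception e )
> >         {
> >             System.err.println( e.toString() );
> >         }
> >         finally
> >         {
> >             // Close the socket so the connection is not leaked.
> >             if( si != null )
> >             {
> >                 try { si.close(); } catch( IOException ignored ) { }
> >             }
> >         }
> >     }
> >
> >     public static void main( String [] a )
> >     {
> >         if( a.length > 0 )
> >         {
> >             new nettest( a[0] );
> >         }
> >         else
> >         {
> >             System.err.println( "please provide host string" );
> >         }
> >     }
> > }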
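
Since the failure is in name resolution rather than in connecting, a variant of the test above that only resolves the name may narrow things down. This is a sketch under assumptions: the class name ResolveTest and the helper family() are made up for illustration and are not part of the attached nettest.java. It prints the value of java.net.preferIPv4Stack as the JVM sees it, then every address the lookup returns, labelled by family.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveTest {
    // Label an address by family so IPv4 and IPv6 results can be told apart.
    static String family(InetAddress addr) {
        return (addr instanceof Inet6Address) ? "IPv6" : "IPv4";
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("please provide host string");
            return;
        }
        // Prints "null" unless the property was set on the command line.
        System.out.println("java.net.preferIPv4Stack="
            + System.getProperty("java.net.preferIPv4Stack"));
        try {
            // Only resolves the name; no socket is opened.
            for (InetAddress addr : InetAddress.getAllByName(args[0])) {
                System.out.println(family(addr) + " " + addr.getHostAddress());
            }
        } catch (UnknownHostException e) {
            System.err.println("lookup failed: " + e);
        }
    }
}
```

Comparing the two runs on the failing host should show whether the lookup itself depends on the IPv4 setting:
javac ResolveTest.java
java ResolveTest your-favourite-hostname
java -Djava.net.preferIPv4Stack=true ResolveTest your-favourite-hostname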
--
| - - - - - - - - - - - - - - - - - - - - - - - - -
| Steve White +49(331)7499-202
| E-Science Zi. 27 Villa Turbulenz
| - - - - - - - - - - - - - - - - - - - - - - - - -
| Astrophysikalisches Institut Potsdam (AIP)
| An der Sternwarte 16, D-14482 Potsdam
|
| Vorstand: Prof. Dr. Matthias Steinmetz, Peter A. Stolz
|
| Stiftung privaten Rechts, Stiftungsverzeichnis Brandenburg: III/7-71-026
| - - - - - - - - - - - - - - - - - - - - - - - - -