[Beowulf] Re: OT: informatics software for linux clusters

Joe Landman Mon, 15 May 2006 13:25:25 -0700


David Mathog wrote:

> Scalable Informatics has released Scalable HMMer, an optimized version of HMMer 2.3.2 that is 1.6-2.5x faster per node on benchmark tests run on Opteron systems.
> Did you remove the memory organization changes SE put in to make
> it run better on the Altivec Macs?  Those really made life hard when I
> was trying to optimize this code to run


Hi Dave:

We didn't start from the Altivec patch. It is in a large "ifdef" in fast_algorithms.c. I didn't see memory organization changes in the non-Altivec code (though there was a line about some issue with the Intel compilers).

We started from the base P7Viterbi in fast_algorithms.c and rewrote the loops a bit.

> on our Beowulf with Athlon MP processors.  The problem was the
> P7Viterbi data structures didn't fit entirely into cache (no matter

I was worried about cache thrashing with our changes (and still am). The code isn't complex, but the particulars of the original implementation weren't terribly cache-friendly.

> how it was organized) and this resulted in toxic query lengths that ran
> several times slower.  That is, take a query sequence
> of length 1000, run hmmpfam, nip off the last character, run it again,
> etc.  It was anything but a smooth function of execution time vs. query

Ohhh... I would love a test like that. Did you find this in general with the baseline code, or only with the Altivec'ed code? This would be very good to include in our regression testing.

> length.  Working around the Altivec stuff helped some but didn't
> entirely eliminate the effect.  Probably the bigger cache on the
> Opteron would eliminate this effect for smaller sequences, but I'm
> guessing you could still run into it with a long query.

Our longest test used an 8000-letter query. If you have specific test cases that exercise bugs, please let me know what they are and I will see if we can use them.
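The truncation test Dave describes (run hmmpfam, nip off the last residue, run again) is straightforward to script for a regression suite. The following is a minimal sketch, assuming a HMMER 2 `hmmpfam` on the PATH invoked as `hmmpfam <hmm database> <sequence file>`, and treating whole-process wall-clock time as the measurement. The helper names and the spike rule (a length is flagged when it runs more than `factor` times slower than the median of its neighbours) are hypothetical choices for illustration, not anything taken from HMMer or Scalable HMMer.

```python
"""Sweep query lengths downward one residue at a time and time hmmpfam on each.

Hypothetical regression-test helpers; assumes an hmmpfam binary on PATH that
is invoked as `hmmpfam <hmm database> <sequence file>`.
"""
import os
import statistics
import subprocess
import tempfile
import time


def truncated_queries(seq, min_len):
    """Yield (length, prefix) pairs from the full sequence down to min_len,
    removing one trailing residue per step."""
    for n in range(len(seq), min_len - 1, -1):
        yield n, seq[:n]


def write_fasta(path, name, seq, width=60):
    """Write a single-sequence FASTA file with fixed-width sequence lines."""
    with open(path, "w") as fh:
        fh.write(f">{name}\n")
        for i in range(0, len(seq), width):
            fh.write(seq[i:i + width] + "\n")


def time_scan(hmm_db, seq, program="hmmpfam"):
    """Return wall-clock seconds for one search of `seq` against `hmm_db`."""
    with tempfile.TemporaryDirectory() as tmp:
        query = os.path.join(tmp, "query.fa")
        write_fasta(query, "query", seq)
        start = time.perf_counter()
        subprocess.run([program, hmm_db, query], check=True,
                       stdout=subprocess.DEVNULL)
        return time.perf_counter() - start


def sweep(hmm_db, seq, min_len, program="hmmpfam"):
    """Time every prefix length from len(seq) down to min_len."""
    return [(n, time_scan(hmm_db, s, program))
            for n, s in truncated_queries(seq, min_len)]


def flag_spikes(timings, factor=2.0, window=5):
    """Return lengths whose runtime exceeds `factor` times the median
    runtime of up to `window` neighbouring lengths on each side."""
    flagged = []
    for i, (n, t) in enumerate(timings):
        lo, hi = max(0, i - window), min(len(timings), i + window + 1)
        neighbours = [tt for j, (_, tt) in enumerate(timings[lo:hi], start=lo)
                      if j != i]
        if neighbours and t > factor * statistics.median(neighbours):
            flagged.append(n)
    return flagged
```

For example, `flag_spikes(sweep("models.hmm", seq, 900))`, with `models.hmm` a hypothetical HMM database and `seq` a query string longer than 900 residues, would list the prefix lengths that ran markedly slower than nearby lengths. Because single timings are noisy, repeating each run a few times and keeping the minimum would make the comparison more robust.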


> This has nothing to do with the parallel implementation, though; it
> was a data size vs. cache size effect.

That is an issue with this code. The Athlon has a 256 KB L2, last I remember, and a 128 KB L1 (64 KB instruction plus 64 KB data). It's rather hard to keep lots of stuff in cache.
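To put rough numbers on the data size vs. cache size point, the sketch below estimates the memory for a full Viterbi dynamic-programming matrix and for the two rows that one step of the recurrence touches (row i-1 is read, row i is written). It assumes a HMMer 2-style layout: match, insert and delete score arrays of (L+1) x (M+2) 4-byte ints plus five special states per row, where L is the query length and M the model length. That layout, the 4-byte int, and the model length M = 300 in the example are assumptions for illustration, not measurements of `P7Viterbi`.

```python
"""Rough memory estimate for a full Viterbi dynamic-programming matrix.

Assumed layout (HMMer 2 style, not checked against any particular build):
match, insert and delete score arrays of (L+1) x (M+2) ints, plus a
special-state array of (L+1) x 5 ints.
"""

INT_BYTES = 4          # assumed sizeof(int)
SPECIAL_STATES = 5     # assumed per-row special states (N, B, E, C, J)
KB = 1024
ATHLON_L1D = 64 * KB   # data half of the 128 KB Athlon L1
ATHLON_L2 = 256 * KB   # L2 size quoted in the thread


def ints_per_row(M):
    """Score cells in one matrix row for a model of length M."""
    return 3 * (M + 2) + SPECIAL_STATES


def full_matrix_bytes(L, M):
    """Bytes for all L+1 rows, as would be kept if a traceback is performed."""
    return INT_BYTES * (L + 1) * ints_per_row(M)


def two_row_bytes(M):
    """Bytes touched by one step: row i-1 is read and row i is written."""
    return INT_BYTES * 2 * ints_per_row(M)


if __name__ == "__main__":
    M = 300  # illustrative model length, not taken from any benchmark
    for L in (1000, 8000):
        full = full_matrix_bytes(L, M)
        rows = two_row_bytes(M)
        print(f"L={L}, M={M}: full matrix {full / KB:.0f} KB "
              f"({full / ATHLON_L2:.1f}x the L2), two rows {rows / KB:.1f} KB "
              f"({'fits' if rows <= ATHLON_L1D else 'exceeds'} L1 data)")
```

Under these assumptions the full matrix for a 1000-residue query is more than ten times the 256 KB L2, while the two rows in use at any step are only a few KB. The estimate does not model how those rows map onto cache lines and sets.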

Right now the big issue we are running into on another aspect of this project is the lack of a vector max/min function in SSE*. (If anyone from AMD/Intel is listening: this is a *big* issue, and I even have a rough idea how to do it "quickly" in SSE at the expense of many SSE registers.)

Joe


> Regards,
>
> David Mathog
> [EMAIL PROTECTED]
> Manager, Sequence Analysis Facility, Biology Division, Caltech


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
