Well hunted, Kira.

On 31/05/2017 16:17, Kira Mourao (Staff) wrote:
> I’ve just found the reason the file I was looking at was not loading (or
> actually is loading but extremely slowly). The bad news is it looks like
> it’s been in the code since Aug 2016 but the good news is it looks very
> fixable. :)
> The initialisation is being held up in
> Alignment::resolveAndAddDatasetSeq which is in the call stack called by
> the AlignFrame initialisation code.

This was added to avoid duplicate sequence import when opening Ensembl or
ENA CDS, if I remember correctly (though Mungo may have a better story).

> The reason seqs.contains is slow is because, despite the name,
> LinkedIdentityHashSet::contains is doing a linear search. This rather
> echoes what I was saying earlier about checking our data structures are
> appropriate.

natch.

> I’ll log a JIRA issue for this. It would be useful to know what the
> purpose of using LinkedIdentityHashSet here was though, as this is the
> only place it’s used in the code.

The use of IdentityHash was to spot duplicates based on the Object
reference (i.e. equivalence based on == rather than .equals()). However,
I'd have hoped that contains would not simply do a linear search. ISTR a
LinkedHashSet was chosen for order preservation, which made life easier for
the CDS/SplitFrame logic.

Some relevant issues: JAL-2132, which may have been the original reason for
this bit of logic back in 2016. That issue is overshadowed by the real
requirement: full normalisation (JAL-407).

I was idly googling IdentityHashMap to see if there are any workarounds. We
could simply enforce primary keys and hash on those (SequenceI.getVamsasId()
would fit that), but I also found this library,
https://bitbucket.org/trove4j/trove, via
http://java-performance.info/java-util-identityhashmap/.

..Jim.

_______________________________________________
Jalview-dev mailing list
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-dev
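For illustration, here is a minimal sketch of what the thread is asking of LinkedIdentityHashSet: insertion order preserved, duplicates detected by reference (==) rather than .equals(), and contains() done as a hash lookup instead of a linear scan. It uses only java.util (the JDK has no built-in linked identity set, so it pairs an IdentityHashMap-backed set with an ArrayList). The class name OrderedIdentitySet is hypothetical; this is not Jalview's actual class, just one way the structure could be built.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not Jalview's LinkedIdentityHashSet): an
 * insertion-ordered set that treats two elements as the same only if they
 * are the same object (==), with O(1) expected add/contains.
 */
class OrderedIdentitySet<E> implements Iterable<E> {
  // Hashed lookup using reference equality and System.identityHashCode
  private final Set<E> index =
      Collections.newSetFromMap(new IdentityHashMap<>());

  // Remembers the order in which distinct objects were first added
  private final List<E> order = new ArrayList<>();

  /** Adds e unless this exact object is already present; O(1) expected. */
  public boolean add(E e) {
    if (!index.add(e)) {
      return false; // same reference already in the set
    }
    order.add(e);
    return true;
  }

  /** Hash lookup by reference, not a scan over all elements. */
  public boolean contains(Object o) {
    return index.contains(o);
  }

  public int size() {
    return order.size();
  }

  /** Iterates in insertion order; read-only to keep index and order in sync. */
  @Override
  public Iterator<E> iterator() {
    return Collections.unmodifiableList(order).iterator();
  }
}
```

With this, checking each incoming sequence before adding it costs O(n) overall rather than O(n²). One design caveat: remove() would still be linear because of the ArrayList, so if removal matters, a LinkedHashMap keyed on a small wrapper that uses == and System.identityHashCode would be the alternative.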
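The primary-key alternative can be sketched as an order-preserving dedupe with a LinkedHashMap keyed on the sequence's ID. This assumes each sequence carries a unique String key, in the spirit of SequenceI.getVamsasId(); the KeyedSeq and SeqDeduplicator classes here are hypothetical stand-ins, not Jalview classes, and the real method's return type and uniqueness guarantees would need checking.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a sequence carrying a unique primary key. */
class KeyedSeq {
  private final String id; // assumed unique per logical sequence record
  private final String residues;

  KeyedSeq(String id, String residues) {
    this.id = id;
    this.residues = residues;
  }

  String getId() {
    return id;
  }

  String getResidues() {
    return residues;
  }
}

/** Hypothetical helper: keep the first sequence seen for each key, in input order. */
class SeqDeduplicator {
  static List<KeyedSeq> dedupe(List<KeyedSeq> input) {
    // LinkedHashMap keeps insertion order; re-putting an existing key
    // does not move it
    Map<String, KeyedSeq> byKey = new LinkedHashMap<>();
    for (KeyedSeq s : input) {
      // O(1) expected hash lookup; the first occurrence of a key wins
      byKey.putIfAbsent(s.getId(), s);
    }
    return new ArrayList<>(byKey.values());
  }
}
```

Note the semantic difference from the identity approach: keying on an ID also collapses two distinct objects that describe the same record (e.g. the same CDS imported twice), whereas == only catches the same object being added twice. That only works if the keys really are unique, which is closer to the normalisation question in JAL-407.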
