On 07/07/11 00:08, Sébastien Boisvert wrote:
>>> Color-space is not necessary I think,
>>> m_parameters->getColorSpaceMode does that already.
>> But then you can't do nifty tricks like matching colour-space to
>> base-space, which *can* be done by using a different k-mer format.
> What do you mean exactly here ?

I was thinking of a situation where you might have k-mers stored as both colour-space and base-space. Upon reflection, I realised that there's really no point in this, and everything should just be stored as colour-space. If you want a strict comparison (i.e. the same as matching in base-space), then you enforce a first base for every k-mer.

> I don't see the point of doing checksums for k-mers because the only data that
> are communicated transit with the message-passing interface. And I believe
> the underlying bit transfer layers (TCP, Infiniband, or another one) already
> verify data integrity.

Either a checksum or a 'this sequence is invalid' bit would be useful, I think. This would allow functions that return k-mers to indicate that a mis-translation has occurred (e.g. adding edges to something with an unknown first base). My main reason for using a checksum was to ferret out areas in the code that assumed a base-space format. It is also useful for finding code errors caused by writing outside the expected range, pointer problems, etc. I think I've dealt with most of those now, so perhaps the processor overhead is not necessary.

>> So in positions 60-63 when using 1 64-bit number, positions 125-128
>> when using 2, etc.? That means the location of the flags is less easy
>> to determine. I suppose you could put them always in positions 60-63
>> (i.e. at the end of the first array entry), but that's pretty much the
>> same as positions 0-3.
> The location is easy to locate -- its starting bit is basically 2*kmerLength,
> assuming kmerLength+2<=MAXMERLENGTH.

This assumption is dangerous, or at least not appropriate, because the code allows for a k-mer length different from MAXKMERLENGTH, and for different k-mer lengths for different k-mers (e.g. kMerAtPosition in common_functions.cpp doesn't check whether w matches a static width variable).

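To make the concern concrete, here is a minimal sketch of locating the flag bits when the k-mer length is passed in explicitly instead of being taken from a static width. The layout (2 bits per symbol packed from bit 0, four flag bits starting at bit 2*kmerLength) and the names FLAG_BITS, wordsNeeded, getFlag and setFlag are assumptions for illustration only, not the actual Kmer code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout: symbols use 2 bits each from bit 0 upward, and the
// flag bits follow immediately, starting at bit 2*kmerLength.
const int FLAG_BITS = 4;

// Number of 64-bit words needed for kmerLength symbols plus the flags.
int wordsNeeded(int kmerLength) {
    return (2 * kmerLength + FLAG_BITS + 63) / 64;
}

// Read one flag (0 .. FLAG_BITS-1). The k-mer length is an explicit argument,
// so nothing here depends on a compile-time maximum or a static width.
bool getFlag(const std::vector<std::uint64_t>& words, int kmerLength, int flag) {
    assert(flag >= 0 && flag < FLAG_BITS);
    int bit = 2 * kmerLength + flag;
    // Catch a k-mer length that does not fit the storage actually provided.
    assert(bit / 64 < static_cast<int>(words.size()));
    return ((words[bit / 64] >> (bit % 64)) & 1ULL) != 0;
}

// Set or clear one flag, with the same bounds checks as getFlag.
void setFlag(std::vector<std::uint64_t>& words, int kmerLength, int flag, bool value) {
    assert(flag >= 0 && flag < FLAG_BITS);
    int bit = 2 * kmerLength + flag;
    assert(bit / 64 < static_cast<int>(words.size()));
    std::uint64_t mask = 1ULL << (bit % 64);
    if (value) {
        words[bit / 64] |= mask;
    } else {
        words[bit / 64] &= ~mask;
    }
}
```

With kmerLength = 31 the four flags occupy bits 62 to 65, so they straddle the first and second 64-bit words; deriving the word index from the bit offset handles that case, and the assertions fail loudly if a caller passes a length that the storage cannot hold.
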
> I know that doing it this way would not break the code, I think you would
> just need to change the hashing functions
> to reset (set to 0) all the fields starting at 2*kmerLength in a Kmer.

Yes, that should work. There needs to be some thought about how to treat colour-space sequences with unknown first bases, though. Should they hash to the same position (possibly getting changed when/if the first base is known)? If they hash to different positions, what happens once the first base of a k-mer can be determined with high reliability?

Thanks for your help,
David

_______________________________________________
Denovoassembler-users mailing list
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
