On 07/07/11 00:08, Sébastien Boisvert wrote:
>>> Color-space is not necessary I think,
>>> m_parameters->getColorSpaceMode does that already.
>> But then you can't do nifty tricks like matching colour-space to
>> base-space, which *can* be done by using a different k-mer format.
> What do you mean exactly here ?

I was thinking of a situation where you might have k-mers stored as both 
colour-space and base-space. Upon reflection, I realised that there's 
really no point in this, and everything should just be stored as 
colour-space. If you want a strict comparison (i.e. the same as matching 
in base-space), then you require a known first base for every k-mer.
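A minimal sketch of that idea follows. It assumes a hypothetical `ColourKmer` type (not Ray's actual Kmer class) and the standard SOLiD colour code, in which each colour is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The colour-only comparison ignores the first base. The strict comparison also requires both first bases to be known and equal, which is equivalent to base-space equality.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical representation: every k-mer is stored in colour space,
// plus an optional first base (-1 means "unknown").
struct ColourKmer {
    int firstBase;                 // 0..3, or -1 if unknown
    std::vector<uint8_t> colours;  // k-1 colours, each 0..3
};

// 2-bit base codes used by the SOLiD colour code.
inline int baseCode(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Convert a base-space string (A/C/G/T only) to colour space,
// keeping the first base so the conversion is reversible.
ColourKmer toColourSpace(const std::string& bases) {
    ColourKmer k{-1, {}};
    if (bases.empty()) return k;
    k.firstBase = baseCode(bases[0]);
    for (std::size_t i = 1; i < bases.size(); ++i) {
        int a = baseCode(bases[i - 1]);
        int b = baseCode(bases[i]);
        assert(a >= 0 && b >= 0);
        k.colours.push_back(static_cast<uint8_t>(a ^ b));  // colour = XOR
    }
    return k;
}

// Loose match: compare colours only (pure colour-space comparison).
bool colourMatch(const ColourKmer& x, const ColourKmer& y) {
    return x.colours == y.colours;
}

// Strict match: same as base-space equality, so the first base must be
// known on both sides and identical.
bool strictMatch(const ColourKmer& x, const ColourKmer& y) {
    return x.firstBase >= 0 && x.firstBase == y.firstBase && colourMatch(x, y);
}
```

For example, ACGT and CATG both encode to the colours 1,3,1, so they match in colour space but not under the strict comparison.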

> I don't see the point of doing checksums for k-mers, because the only data
> that are communicated go through the message-passing interface, and I
> believe the underlying transport layers (TCP, InfiniBand, or another one)
> already verify data integrity.

Either a checksum or a 'this sequence is invalid' bit would be useful, I 
think. This would allow functions that return k-mers to indicate that a 
mis-translation has occurred (e.g. adding edges to something with an 
unknown first base). My main reason for using a checksum was for 
ferreting out areas in the code that assumed a base-space format. It is 
also useful for finding code errors caused by writing outside the 
expected range, pointer problems, etc. I think I've dealt with most of 
those now, so perhaps the processor overhead is not necessary.
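For comparison with a checksum, here is a sketch of the single "invalid" bit. It assumes a hypothetical `BaseKmer` return type and a hypothetical `decodeColours` function, and these names are illustrative rather than taken from the Ray code base. Decoding colours back to bases needs the first base, so when that base is unknown the function flags the result instead of guessing.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result type: instead of a checksum, carry one
// "this sequence is invalid" bit alongside the decoded k-mer.
struct BaseKmer {
    std::string bases;
    bool invalid;
};

// Decode colours back to base space. Without a known first base the
// translation is ambiguous, so the result is flagged invalid.
BaseKmer decodeColours(int firstBase, const std::vector<uint8_t>& colours) {
    static const char kBases[4] = {'A', 'C', 'G', 'T'};
    if (firstBase < 0 || firstBase > 3) return BaseKmer{"", true};
    BaseKmer out{std::string(1, kBases[firstBase]), false};
    int current = firstBase;
    for (uint8_t c : colours) {
        if (c > 3) return BaseKmer{"", true};  // corrupted colour value
        current ^= c;                          // next base = previous XOR colour
        out.bases.push_back(kBases[current]);
    }
    assert(out.bases.size() == colours.size() + 1);
    return out;
}
```

A caller that adds edges can then check `invalid` and refuse to proceed, which catches the mis-translation case without the processor cost of computing a checksum for every k-mer.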

>> So in positions 60-63 when using 1 64-bit number, positions 125-128
>> when using 2, etc.? That means the location of the flags is less easy
>> to determine. I suppose you could put them always in positions 60-63
>> (i.e. at the end of the first array entry), but that's pretty much the
>> same as positions 0-3.
> The location is easy to determine -- its starting bit is basically
> 2*kmerLength, assuming kmerLength+2<=MAXKMERLENGTH.

This assumption is risky, because the code allows a k-mer length 
different from MAXKMERLENGTH, and even different lengths for different 
k-mers (e.g. kMerAtPosition in common_functions.cpp doesn't check 
whether w matches a static width variable).
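Below is a sketch of reading the flags with the k-mer length passed in explicitly, instead of relying on one global width. The constants `kMaxKmerLength` (64 here, i.e. two 64-bit words) and `kFlagBits` are hypothetical stand-ins for MAXKMERLENGTH and the four flag bits discussed above. The bound check is the kmerLength+2<=MAXKMERLENGTH condition, rewritten in bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: a k-mer of length k uses 2*k bits (2 bits per
// symbol) packed into 64-bit words; 4 flag bits start at bit 2*k.
constexpr int kMaxKmerLength = 64;  // hypothetical stand-in for MAXKMERLENGTH
constexpr int kFlagBits = 4;

// Bit offset of the flags for this k-mer length, or -1 if there is
// no room for them (2*k + 4 > 2*kMaxKmerLength).
int flagOffset(int kmerLength) {
    if (kmerLength <= 0) return -1;
    int start = 2 * kmerLength;
    if (start + kFlagBits > 2 * kMaxKmerLength) return -1;
    return start;
}

// Read the flag bits one at a time, because they may straddle a
// word boundary (e.g. k=31 puts them at bits 62-65).
unsigned readFlags(const uint64_t* words, int kmerLength) {
    int off = flagOffset(kmerLength);
    assert(off >= 0);
    unsigned flags = 0;
    for (int i = 0; i < kFlagBits; ++i) {
        int pos = off + i;
        uint64_t bit = (words[pos / 64] >> (pos % 64)) & 1u;
        flags |= static_cast<unsigned>(bit) << i;
    }
    return flags;
}
```

Passing the wrong length does not fail loudly. It simply reads the wrong bits, which is why an unchecked width is dangerous here.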

> I know that doing it this way would not break the code; I think you would
> just need to change the hashing functions to reset (set to 0) all the bits
> from position 2*kmerLength onwards in a Kmer.

Yes, that should work. There needs to be some thought about how to treat 
colour-space sequences with unknown first bases, though. Should they 
hash to the same position (possibly getting changed when/if the first 
base is known)? If they hash to different positions, what happens when 
you would be able to find out with high reliability what the first base 
of a k-mer should be?
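One possible answer to the first question is sketched below under several assumptions. The hypothetical layout stores the first base in bits 0-1, the colours in bits 2 to 2*kmerLength-1, and the flags from bit 2*kmerLength upward, all in two 64-bit words. The hash clears the flag bits as suggested above and also ignores the first-base bits, so a k-mer keeps the same bucket when its first base is resolved later. The splitmix64 finalizer is only a stand-in mixing function, not Ray's hash.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWords = 2;  // hypothetical: two 64-bit words per k-mer

// Zero every bit at or above fromBit (flag bits and unused padding).
void clearFrom(uint64_t (&w)[kWords], int fromBit) {
    for (int i = 0; i < kWords; ++i) {
        int lo = i * 64;
        if (fromBit <= lo) {
            w[i] = 0;
        } else if (fromBit < lo + 64) {
            w[i] &= (uint64_t{1} << (fromBit - lo)) - 1;
        }
    }
}

// splitmix64 finalizer: a bijective 64-bit mixing function.
uint64_t mix64(uint64_t x) {
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Hash only the colour bits: flags and the first base are masked out
// on a local copy, leaving the stored k-mer untouched.
uint64_t hashKmer(const uint64_t (&kmer)[kWords], int kmerLength) {
    assert(kmerLength > 0 && 2 * kmerLength <= 64 * kWords);
    uint64_t w[kWords];
    for (int i = 0; i < kWords; ++i) w[i] = kmer[i];
    clearFrom(w, 2 * kmerLength);  // drop flags and padding
    w[0] &= ~uint64_t{3};          // drop first-base bits 0-1
    uint64_t h = 0;
    for (int i = 0; i < kWords; ++i) h = mix64(h ^ w[i]);
    return h;
}
```

The trade-off is that k-mers differing only in their first base share a bucket, so lookups still have to compare first bases (e.g. with a strict comparison) wherever that distinction matters.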

Thanks for your help,

David

_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
