Check out PackedSymbolList and the associated classes and interfaces
PackedSymbolListFactory, Packing, and PackingFactory. These do bit
packing of sequences. The nice part is that they behave exactly like
normal SymbolLists, so you don't even know you're dealing with a
compressed sequence.
From the Javadocs:
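Example Usage
    import org.biojava.bio.symbol.*;

    SymbolList symL = ...;
    // getPacking() expects a FiniteAlphabet, hence the cast;
    // true = the packing should also handle ambiguity symbols
    SymbolList packed = new PackedSymbolList(
        PackingFactory.getPacking((FiniteAlphabet) symL.getAlphabet(), true),
        symL
    );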
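For anyone who wants to see what "four nucleotides in a single byte" looks like without BioJava, here is a minimal plain-Java sketch. TwoBitPacker is a hypothetical helper, not BioJava's Packing implementation; it assumes an unambiguous a/c/g/t alphabet, stores base i at bit offset 2 * (i % 4) of byte i / 4, and expects the caller to keep the sequence length separately. It also shows reading a single base in place and a naive mismatch count between two packed sequences.

```java
/**
 * Hypothetical helper (not part of BioJava): packs an unambiguous DNA
 * string at 2 bits per base, i.e. four bases per byte.
 */
public class TwoBitPacker {

    private static final char[] DECODE = {'a', 'c', 'g', 't'};

    // Map a base to its 2-bit code: a=0, c=1, g=2, t=3.
    private static int encode(char base) {
        switch (Character.toLowerCase(base)) {
            case 'a': return 0;
            case 'c': return 1;
            case 'g': return 2;
            case 't': return 3;
            default:
                throw new IllegalArgumentException("Not an unambiguous base: " + base);
        }
    }

    /** Packs seq; base i goes into byte i / 4 at bit offset 2 * (i % 4). */
    public static byte[] pack(String seq) {
        byte[] packed = new byte[(seq.length() + 3) / 4];
        for (int i = 0; i < seq.length(); i++) {
            packed[i / 4] |= (byte) (encode(seq.charAt(i)) << (2 * (i % 4)));
        }
        return packed;
    }

    /** Reads the base at position i without unpacking the whole sequence. */
    public static char baseAt(byte[] packed, int i) {
        return DECODE[(packed[i / 4] >> (2 * (i % 4))) & 0x3];
    }

    /** Unpacks the first length bases (the length is not stored in the bytes). */
    public static String unpack(byte[] packed, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(baseAt(packed, i));
        }
        return sb.toString();
    }

    /** Counts mismatching positions between two packed sequences of equal length. */
    public static int mismatches(byte[] a, byte[] b, int length) {
        int count = 0;
        for (int i = 0; i < length; i++) {
            if (baseAt(a, i) != baseAt(b, i)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String seq = "acgtacgtgg";
        byte[] packed = pack(seq);
        System.out.println(packed.length);                                // 3
        System.out.println(unpack(packed, seq.length()));                 // acgtacgtgg
        System.out.println(mismatches(packed, pack("acgtacgtga"), 10));   // 1
    }
}
```

Ten bases fit in three bytes instead of ten characters. Supporting ambiguity codes (n, r, y, ...) needs more bits per symbol, which is the trade-off the boolean argument to getPacking() above controls.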
It is also relatively trivial to write a Huffman tree generator that can
compress SymbolLists as a binary string. You could use this as the basis
for full LZ compression. There are also much more complicated published
algorithms that look for long-range repeats, but these are also very
slow.
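A minimal sketch of the Huffman idea, working on plain Java strings rather than SymbolLists. HuffmanSketch and its methods are hypothetical names, not BioJava API. Codes are kept as '0'/'1' strings for readability; a real implementation would write them into a bit buffer and store the code table alongside the data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical, minimal Huffman coder over the characters of a sequence string. */
public class HuffmanSketch {

    // Leaves carry a symbol; internal nodes carry two children.
    private static final class Node implements Comparable<Node> {
        final char symbol;
        final int weight;
        final Node left, right;

        Node(char symbol, int weight, Node left, Node right) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }

        @Override
        public int compareTo(Node other) { return Integer.compare(weight, other.weight); }
    }

    /** Builds a prefix-free code table from the symbol frequencies in seq. */
    public static Map<Character, String> buildCodes(String seq) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : seq.toCharArray()) freq.merge(c, 1, Integer::sum);

        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        // Repeatedly merge the two lightest subtrees until one tree remains.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node('\0', a.weight + b.weight, a, b));
        }
        Map<Character, String> codes = new HashMap<>();
        assign(queue.poll(), "", codes);
        return codes;
    }

    // Walk the tree: left edges append '0', right edges append '1'.
    private static void assign(Node node, String prefix, Map<Character, String> codes) {
        if (node == null) return;
        if (node.isLeaf()) {
            // A single-symbol input still needs a one-bit code.
            codes.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }

    public static String encode(String seq, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : seq.toCharArray()) bits.append(codes.get(c));
        return bits.toString();
    }

    /** Decodes greedily; this works because no code is a prefix of another. */
    public static String decode(String bits, Map<Character, String> codes) {
        Map<String, Character> inverse = new HashMap<>();
        for (Map.Entry<Character, String> e : codes.entrySet()) inverse.put(e.getValue(), e.getKey());
        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (char b : bits.toCharArray()) {
            current.append(b);
            Character sym = inverse.get(current.toString());
            if (sym != null) {
                out.append(sym);
                current.setLength(0);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String seq = "aaaaaaaacccgt";
        Map<Character, String> codes = buildCodes(seq);
        String bits = encode(seq, codes);
        // 20 bits vs 26 bits at 2 bits/base
        System.out.println(bits.length() + " bits vs " + 2 * seq.length() + " bits at 2 bits/base");
        System.out.println(decode(bits, codes).equals(seq));
    }
}
```

With a skewed composition like the one in main, the Huffman code beats fixed 2-bit packing; for roughly uniform DNA it gains little or nothing over 2 bits per base, which is why repeat-aware (LZ-style) methods are the next step.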
- Mark
Felipe Albrecht <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
08/12/2005 04:07 AM
To: [email protected]
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] Compress Sequences.
Is there a class in BioJava that compresses sequences?
For example, one that puts four nucleotides in a single byte.
If not, does anyone know a good algorithm for compressing, reading and
comparing such sequences?
Thanks.
Felipe Albrecht
_______________________________________________
Biojava-l mailing list - [email protected]
http://biojava.org/mailman/listinfo/biojava-l