Check out PackedSymbolList and the associated classes and interfaces PackedSymbolListFactory, Packing, and PackingFactory. These do bit packing of sequences. The nice part is that they behave exactly like normal SymbolLists, so you don't even know you're dealing with a compressed sequence.
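The bit packing described above can be illustrated without BioJava. The following is a minimal plain-Java sketch that stores two bits per base, i.e. four nucleotides per byte. It assumes the input contains only upper-case A, C, G and T. The class name TwoBitDna and its methods are hypothetical, and this is not a description of how PackedSymbolList is implemented internally.

```java
/**
 * Hypothetical illustration (not BioJava's PackedSymbolList): packs a DNA
 * string into 2 bits per base, i.e. four nucleotides per byte.
 * Assumes the input contains only upper-case A, C, G and T.
 */
public final class TwoBitDna {
    // The 2-bit code of a base is its index in this string: A=0, C=1, G=2, T=3.
    private static final String BASES = "ACGT";

    private final byte[] packed;
    private final int length;

    public TwoBitDna(String seq) {
        this.length = seq.length();
        this.packed = new byte[(length + 3) / 4]; // round up to whole bytes
        for (int i = 0; i < length; i++) {
            int code = BASES.indexOf(seq.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("Not A/C/G/T: " + seq.charAt(i));
            }
            // Base i is stored in byte i/4 at bit offset 2*(i%4).
            packed[i / 4] |= (byte) (code << (2 * (i % 4)));
        }
    }

    public int length() {
        return length;
    }

    /** Number of bytes actually used to hold the sequence. */
    public int byteSize() {
        return packed.length;
    }

    /** Returns the base at 0-based position i by shifting and masking. */
    public char charAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        int code = (packed[i / 4] >> (2 * (i % 4))) & 0x3;
        return BASES.charAt(code);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TwoBitDna dna = new TwoBitDna("GATTACA");
        // Prints: GATTACA stored in 2 bytes
        System.out.println(dna + " stored in " + dna.byteSize() + " bytes");
    }
}
```

Reading a base back is just a shift and a mask, so random access stays cheap. The price is that only four symbols fit in two bits: ambiguity codes such as N cannot be represented, which is why the sketch rejects them.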
From the Javadocs, example usage:

    SymbolList symL = ...;
    SymbolList packed = new PackedSymbolList(
        PackingFactory.getPacking(symL.getAlphabet(), true),
        symL
    );

It is also relatively trivial to write a Huffman tree generator that can compress SymbolLists as a binary string. You could use this as the basis for full LZ compression. There are also much more complicated published algorithms that look for long-range repeats; these are also very slow.

- Mark

Felipe Albrecht <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
08/12/2005 04:07 AM
To: biojava-l@biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] Compress Sequences.

Is there a class in BioJava that compresses sequences? For example, one that puts four nucleotides in a single byte. If none exists, does anyone know a good algorithm for compressing, reading and comparing such sequences?

Thanks.
Felipe Albrecht

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l
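The Huffman approach mentioned in the reply can be sketched in plain Java as follows. This is an illustrative sketch only: the class HuffmanSketch is hypothetical, it works on the characters of a String rather than on BioJava SymbolLists, and it emits the code as a String of '0' and '1' characters for readability rather than packing the bits into bytes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/**
 * Hypothetical Huffman coder over the characters of a sequence string.
 * Not part of BioJava; output is a readable String of '0'/'1' characters.
 */
public final class HuffmanSketch {

    /** Leaves carry a symbol; internal nodes carry two children. */
    private static final class Node {
        final char symbol;
        final long freq;
        final Node left;
        final Node right;

        Node(char symbol, long freq, Node left, Node right) {
            this.symbol = symbol;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    private final Node root;
    private final Map<Character, String> codes = new HashMap<>();

    public HuffmanSketch(String text) {
        if (text.isEmpty()) throw new IllegalArgumentException("empty input");
        Map<Character, Long> freq = new HashMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1L, Long::sum);

        // Min-heap by frequency; repeatedly merge the two rarest nodes.
        PriorityQueue<Node> heap = new PriorityQueue<>((a, b) -> Long.compare(a.freq, b.freq));
        for (Map.Entry<Character, Long> e : freq.entrySet()) {
            heap.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        while (heap.size() > 1) {
            Node a = heap.poll();
            Node b = heap.poll();
            heap.add(new Node('\0', a.freq + b.freq, a, b));
        }
        root = heap.poll();
        assignCodes(root, "");
    }

    // Left edges append '0', right edges append '1'.
    private void assignCodes(Node n, String prefix) {
        if (n.isLeaf()) {
            // A single-symbol input would otherwise get an empty code.
            codes.put(n.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assignCodes(n.left, prefix + "0");
        assignCodes(n.right, prefix + "1");
    }

    public Map<Character, String> codes() {
        return codes;
    }

    public String encode(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            String code = codes.get(c);
            if (code == null) throw new IllegalArgumentException("unknown symbol: " + c);
            sb.append(code);
        }
        return sb.toString();
    }

    /** Walks the tree bit by bit, emitting a symbol at each leaf. */
    public String decode(String bits) {
        StringBuilder sb = new StringBuilder();
        if (root.isLeaf()) { // single-symbol alphabet: each bit is one symbol
            for (int i = 0; i < bits.length(); i++) sb.append(root.symbol);
            return sb.toString();
        }
        Node n = root;
        for (int i = 0; i < bits.length(); i++) {
            n = bits.charAt(i) == '0' ? n.left : n.right;
            if (n.isLeaf()) {
                sb.append(n.symbol);
                n = root;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String seq = "AAAAAAAACCCCGGTA";
        HuffmanSketch h = new HuffmanSketch(seq);
        String bits = h.encode(seq);
        System.out.println(h.codes() + " -> " + bits.length() + " bits");
        System.out.println("round trip ok: " + h.decode(bits).equals(seq));
    }
}
```

For the skewed example in main (nine A, four C, two G, one T) the code lengths come out as 1, 2, 3 and 3 bits, so the 16 bases take 26 bits instead of the 32 a fixed two-bit packing needs. On sequences with roughly uniform base composition, Huffman coding gains little over plain two-bit packing.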