----- Forwarded message from Kragen Javier Sitaker <[email protected]> -----

From: Kragen Javier Sitaker <[email protected]>
To: Joe Blaylock <[email protected]>
Subject: Re: reducing charset size for compressibility with case-shift characters (in Python)

On Sat, Apr 16, 2011 at 10:50:34AM -0700, Joe Blaylock wrote:
> On Sat, 2011-04-16 at 03:37 -0400, Kragen Javier Sitaker wrote:
> > lowercase = 'abcdefghijklmnopqrstuvwxyz'
> > numbers = '0123456789'
> > 
> >             else:
> >                 yield current_state[lowercase.index(char)]
> >         elif char == DC3:
> >             current_state = numbers
> 
> Couldn't you achieve a modest increase in compressibility at the expense of
> calculation time by representing all numerical sequences as base-26 encoded
> strings?

Quite possibly. In the Project Gutenberg Bible, that would make a
substantial fraction of the numbers one digit instead of two, or two
digits instead of three.
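
A minimal sketch of the base-26 idea, not the code from the original
thread. It assumes the 26 lowercase letters double as the digits (a = 0
through z = 25); the names to_base26 and from_base26 are hypothetical.
Under that assumption, 10 through 25 fit in one digit and 100 through 675
fit in two, since 26 squared is 676.

```python
import string

# Same 26-letter alphabet as `lowercase` in the quoted code.
LOWERCASE = string.ascii_lowercase


def to_base26(n):
    """Encode a non-negative integer as base-26 digits, a = 0 ... z = 25."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return LOWERCASE[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, 26)
        digits.append(LOWERCASE[remainder])
    # Digits were produced least significant first, so reverse them.
    return ''.join(reversed(digits))


def from_base26(s):
    """Decode a string produced by to_base26 back into an integer."""
    n = 0
    for char in s:
        n = n * 26 + LOWERCASE.index(char)
    return n


if __name__ == '__main__':
    for number in (9, 16, 25, 26, 119, 675, 676):
        encoded = to_base26(number)
        print(number, encoded, len(str(number)), '->', len(encoded))
```

Leading zeros do not survive the round trip (a run such as "007" would
decode as 7), so this sketch only suits numeric runs where they carry no
meaning. The encoder also needs the whole run before it can emit anything,
which is the buffering mentioned in the quoted text below.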

> You'd have to run a buffer large enough for any numeric runs you
> process, but the transformation itself is easy.  You couldn't do that
> nice direct-indexing thing any more though.  Well, not without
> creating more abstraction.

Indeed. May I forward this to kragen-discuss?

Kragen

----- End forwarded message -----
-- 
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss
