[gentoo-user] Re: OT Best way to compress files with digits

2014-11-03 Thread Grant Edwards
On 2014-11-02, Matti Nykyri matti.nyk...@iki.fi wrote:
 On Nov 1, 2014, at 23:56, David W Noon dwn...@ntlworld.com wrote:
 
 On Sat, 01 Nov 2014 22:47:15 +0200, Alan Mckinnon
 (alan.mckin...@gmail.com) wrote about Re: [gentoo-user] Re: OT Best
 way to compress files with digits (in 545546d3.3030...@gmail.com):
 
 On 01/11/2014 19:59, meino.cra...@gmx.de wrote:
 [snip]
 Ah! By the way...I was astonished to read that the digits of PI
 are called "random" on the one hand and on the other hand there is
 a formula [1] to calculate a certain digit of PI without
 calculating the previous digits... "Calculated random"? Are
 natural constants the purest form of PRNGs ??? ;) (Quantum physics
 is everywhere... ;;))
 
 [1]:
 http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula
 
 
 The sequence of digits that makes up pi is a random sequence - you
 can analyze the order any way you want and you'll find no inherent
 pattern.
 
 Actually, the sequence of digits is most definitely *not* random.  If
 the sequence of digits is written any other way then the value is not
 Pi.  Hence the sequence is unique, not random.
 
 I think what you are grasping for is that the frequency of distinct
 digits tends to be uniform: 0's occur as often as 1's as often ... as
 9's.  Note that the "as often as" operator is really approximate for

 Well, all the digits of pi can be compressed to the following:

=pi();

Nah.  Just switch to base-Pi, and then it compresses to:

1

-- 
Grant Edwards               grant.b.edwards        Yow! Are we THERE yet?
                                  at
                              gmail.com




Re: [gentoo-user] Re: OT Best way to compress files with digits

2014-11-03 Thread Mick
On Sunday 02 Nov 2014 22:03:13 Peter Humphrey wrote:
 On Sunday 02 November 2014 21:55:31 Alan McKinnon wrote:
  English is a heavily overloaded language and there's always more
  than one way to communicate something.
 
 Even the simplest cases usually have three words for the same thing: one
 from French, one from Latin and one from Anglo-Saxon. I won't even mention
 words that have come down from Old German and so on, but at least we don't
 have many words from Italian or Spanish. (Zucchini? What's that?)

That's clearly baloney!

-- 
Regards,
Mick




Re: [gentoo-user] Re: OT Best way to compress files with digits

2014-11-03 Thread Peter Humphrey
On Monday 03 November 2014 19:37:52 Mick wrote:

  Even the simplest cases usually have three words for the same thing: one
  from French, one from Latin and one from Anglo-Saxon. I won't even mention
  words that have come down from Old German and so on, but at least we don't
  have many words from Italian or Spanish. (Zucchini? What's that?)
 
 That's clearly baloney!

Explain.

-- 
Rgds
Peter




Re: [gentoo-user] Re: OT Best way to compress files with digits

2014-11-03 Thread Mick
On Tuesday 04 Nov 2014 02:04:45 Peter Humphrey wrote:
 On Monday 03 November 2014 19:37:52 Mick wrote:
   Even the simplest cases usually have three words for the same thing:
   one from French, one from Latin and one from Anglo-Saxon. I won't even
   mention words that have come down from Old German and so on, but at
   least we don't have many words from Italian or Spanish. (Zucchini?
   What's that?)
  
  That's clearly baloney!
 
 Explain.

http://en.wikipedia.org/wiki/Bologna_sausage

 :-)

-- 
Regards,
Mick




Re: [gentoo-user] Re: OT Best way to compress files with digits

2014-11-02 Thread Matti Nykyri
 On Nov 1, 2014, at 23:56, David W Noon dwn...@ntlworld.com wrote:
 
 On Sat, 01 Nov 2014 22:47:15 +0200, Alan Mckinnon
 (alan.mckin...@gmail.com) wrote about Re: [gentoo-user] Re: OT Best
 way to compress files with digits (in 545546d3.3030...@gmail.com):
 
 On 01/11/2014 19:59, meino.cra...@gmx.de wrote:
 [snip]
 Ah! By the way...I was astonished to read that the digits of PI
 are called "random" on the one hand and on the other hand there is
 a formula [1] to calculate a certain digit of PI without
 calculating the previous digits... "Calculated random"? Are
 natural constants the purest form of PRNGs ??? ;) (Quantum physics
 is everywhere... ;;))
 
 [1]:
 http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula
 
 
 The sequence of digits that makes up pi is a random sequence - you
 can analyze the order any way you want and you'll find no inherent
 pattern.
 
 Actually, the sequence of digits is most definitely *not* random.  If
 the sequence of digits is written any other way then the value is not
 Pi.  Hence the sequence is unique, not random.
 
 I think what you are grasping for is that the frequency of distinct
 digits tends to be uniform: 0's occur as often as 1's as often ... as
 9's.  Note that the "as often as" operator is really approximate for
 finite sub-sequences, but is asymptotically accurate.
 
 Moreover, this is the same in any number base: the binary
 representation has 0's occurring as often as 1's; the ternary
 representation has 0's occurring as often as 1's and as often as 2's;
 etc., etc.
 
 Such numbers are called "normal".  It was a poor choice of name, but
 we are stuck with it.  I would have called them "digit soup numbers"
 -- an oblique reference to alphabet soup.

Well, all the digits of pi can be compressed to the following:

=pi();

If you have the infinite series that calculates the digits :)

 However, any given digit in the sequence is 100% predictable, as
 you just showed :-)
 
 Randomness has got to be the second most mind-boggling thing out
 there, first being quantumness (that's not a word, I just made it
 up. You should get the meaning OK from context ;-) )
 
 I would say that probability theory is more mind boggling, as it
 underpins much of quantum theory.  But, as someone who majored in
 probability theory, I might be biased. [Incidentally, there is a small
 statistical joke in that last sentence.]
 
 Getting back to Meino's original request, one of the optimum
 compression algorithms for this would be custom Huffman encoding.  To
 do this the algorithm requires that all the data (i.e. digits) be read
 and a frequency table built.  The only problem is that to read all the
 digits of Pi could take rather a long time. ... :-)

That would take infinite time :)

 -- 
 Regards,
 
 Dave  [RLU #314465]
 *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
 dwn...@ntlworld.com (David W Noon)
 *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
 



Re: [gentoo-user] Re: OT Best way to compress files with digits

2014-11-02 Thread Alan McKinnon
On 01/11/2014 23:56, David W Noon wrote:
 The sequence of digits that makes up pi is a random sequence - you
  can analyze the order any way you want and you'll find no inherent
  pattern.
 Actually, the sequence of digits is most definitely *not* random.  If
 the sequence of digits is written any other way then the value is not
 Pi.  Hence the sequence is unique, not random.
 
 I think what you are grasping for is that the frequency of distinct
 digits tends to be uniform: 0's occur as often as 1's as often ... as
 9's.  Note that the "as often as" operator is really approximate for
 finite sub-sequences, but is asymptotically accurate.
 
 Moreover, this is the same in any number base: the binary
 representation has 0's occurring as often as 1's; the ternary
 representation has 0's occurring as often as 1's and as often as 2's;
 etc., etc.
 
 Such numbers are called "normal".  It was a poor choice of name, but
 we are stuck with it.  I would have called them "digit soup numbers"
 -- an oblique reference to alphabet soup.
 

You grasp correctly what I was saying :-)

I'm not formally trained in mathematics so I often get the terminology
wrong or just don't know the accepted words for a concept. Lucky for me
though, English is a heavily overloaded language and there's always more
than one way to communicate something.

-- 
Alan McKinnon
alan.mckin...@gmail.com




Re: [gentoo-user] Re: OT Best way to compress files with digits

2014-11-02 Thread Peter Humphrey
On Sunday 02 November 2014 21:55:31 Alan McKinnon wrote:

 English is a heavily overloaded language and there's always more
 than one way to communicate something.

Even the simplest cases usually have three words for the same thing: one from 
French, one from Latin and one from Anglo-Saxon. I won't even mention words 
that have come down from Old German and so on, but at least we don't have 
many words from Italian or Spanish. (Zucchini? What's that?)

-- 
Rgds
Peter




[gentoo-user] Re: OT Best way to compress files with digits

2014-11-01 Thread James
 meino.cramer at gmx.de writes:


  I have a lot of files with digits of PI. The digits
  are the characters of 0-9. Currently they are ZIPped,
  which I think is not the best way to do that.

Hello Meino,

It's a bit of effort, but the world's recognized authority
on algorithms is Don Knuth. [1] He's old now, but his
pioneering attempt at categorizing most algorithms,
The Art of Computer Programming, and his MMIX algorithm
implementations (kinda like assembler) are certainly
part of many first-step research efforts on algorithms
and their implementations.

It's not a cookbook but more of a scholarly (high-brow) reference,
offered just to supplement all the good postings by your peers on gentoo-user.

Alan may loan you his copy?
(ha ha ha)?



hth,
James

[1] http://www-cs-faculty.stanford.edu/~uno/







Re: [gentoo-user] Re: OT Best way to compress files with digits

2014-11-01 Thread Alan McKinnon
On 01/11/2014 19:15, James wrote:
  meino.cramer at gmx.de writes:
 
 
  I have a lot of files with digits of PI. The digits
  are the characters of 0-9. Currently they are ZIPped,
  which I think is not the best way to do that.
 
 Hello Meino,
 
 It's a bit of effort, but the world's recognized authority
 on algorithms is Don Knuth. [1] He's old now, but his
 pioneering attempt at categorizing most algorithms,
 The Art of Computer Programming, and his MMIX algorithm
 implementations (kinda like assembler) are certainly
 part of many first-step research efforts on algorithms
 and their implementations.
 
 It's not a cookbook but more of a scholarly (high-brow) reference,
 offered just to supplement all the good postings by your peers on gentoo-user.
 
 Alan may loan you his copy?
 (ha ha ha)?
 
 
 
 hth,
 James
 
 [1] http://www-cs-faculty.stanford.edu/~uno/


ha ha, fat chance :-)

When Alan does eventually get his hands on his very own personal
copy[1], it will be lent to nobody. There are just some things a man
never lends out: his bike, his firearm, his wife. And Knuth :-)

Back on topic: You're 100% right - to learn about algorithms in general,
Knuth is the man. Essential reading for anyone taking CS seriously.

-- 
Alan McKinnon
alan.mckin...@gmail.com




Re: [gentoo-user] Re: OT Best way to compress files with digits

2014-11-01 Thread meino . cramer
James wirel...@tampabay.rr.com [14-11-01 18:16]:
  meino.cramer at gmx.de writes:
 
 
   I have a lot of files with digits of PI. The digits
   are the characters of 0-9. Currently they are ZIPped,
   which I think is not the best way to do that.
 
 Hello Meino,
 
 It's a bit of effort, but the world's recognized authority
 on algorithms is Don Knuth. [1] He's old now, but his
 pioneering attempt at categorizing most algorithms,
 The Art of Computer Programming, and his MMIX algorithm
 implementations (kinda like assembler) are certainly
 part of many first-step research efforts on algorithms
 and their implementations.
 
 It's not a cookbook but more of a scholarly (high-brow) reference,
 offered just to supplement all the good postings by your peers on gentoo-user.
 
 Alan may loan you his copy?
 (ha ha ha)?
 
 
 
 hth,
 James
 
 [1] http://www-cs-faculty.stanford.edu/~uno/
 

Hello James,

Don Knuth ... oh YES! :)
For a long time I have been using and preferring TeX over anything else
(ok...for ASCII I use vim... ;).

And besides his computer wisdom I also like his kind of humor a lot...
for example this one:
https://www.youtube.com/watch?v=eKaI78K_rgA&list=PLUu0XRts4lK6Ri7-xaCNYqTHx7We95Rk8&index=10

But my initial question was more targeted at practical computing than
at groundshaking and fundamental research topics.

More like what tool to pick?...

I did some compression tests myself and currently I have this:
From http://piworld.calico.jp/ (http://piworld.calico.jp/estart.html)
I got ZIP packages with
1000 million decimal places of PI each (~57MB for one ZIP).

I unpacked the first package and recompressed it with different
methods of 7zip, gzip and bzip2. For gzip and bzip2 I used the highest
compression mode (-9). When a file's name matches /.*ultra.*/, I used
the highest compression mode (-mx=9); otherwise I only set the compression
method and left the rest untouched (defaults).


 11996 2014-10-31 16:44 pi-0001.txt
  57105419 2014-10-31 16:47 pi-0001.txt.gz
  52632832 2014-10-31 16:48 pi-0001.txt.bz2
  52045827 2014-10-31 16:54 pi-0001.txt.ppmd.7z
  57110291 2014-10-31 17:23 pi-0001.zip
  51766683 2014-10-31 17:26 pi-0001.txt.lzma.7z
  51668838 2014-10-31 17:34 pi-0001.txt.lzma.ultra.7z
  52862115 2014-10-31 17:36 pi-0001.txt.ppmd.ultra.7z
  51668838 2014-10-31 17:39 pi-0001.txt.ultra.7z

7zip's LZMA wins here, and it is also 7zip's default method. I set
the ultra mode for it by hand.

From other sites which offer PI for download I know of methods which
store the ASCII digits in binary and then compress them. It would be
interesting whether this gives 7zip more convenient input to work
with...
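A minimal sketch of the "store the digits in binary, then compress" idea, in Python (the thread itself has no code). It assumes packed BCD, two digits per byte, as the binary format; the sites mentioned above may well use something else. pack_bcd and unpack_bcd are hypothetical names, and the test input is pseudo-random digits standing in for real digits of PI.

```python
# Illustrative sketch only: pack_bcd/unpack_bcd are hypothetical helpers,
# and packed BCD is an assumed format, not one named in the thread.
import lzma
import random


def pack_bcd(digits: str) -> bytes:
    """Pack ASCII digits two per byte (packed BCD).

    An odd trailing digit is padded with the nibble 0xF, which can never
    be a decimal digit, so the unpacker knows to drop it.
    """
    if any(c not in "0123456789" for c in digits):
        raise ValueError("input must contain only the characters 0-9")
    out = bytearray()
    for i in range(0, len(digits), 2):
        hi = int(digits[i])
        lo = int(digits[i + 1]) if i + 1 < len(digits) else 0xF
        out.append((hi << 4) | lo)
    return bytes(out)


def unpack_bcd(blob: bytes) -> str:
    """Inverse of pack_bcd."""
    chars = []
    for byte in blob:
        hi, lo = byte >> 4, byte & 0x0F
        chars.append(str(hi))
        if lo != 0xF:  # 0xF marks the padding nibble
            chars.append(str(lo))
    return "".join(chars)


if __name__ == "__main__":
    # Pseudo-random digits stand in for real digits of PI here.
    rng = random.Random(2014)
    digits = "".join(rng.choices("0123456789", k=200_000))
    assert unpack_bcd(pack_bcd(digits)) == digits
    ascii_xz = lzma.compress(digits.encode("ascii"), preset=9)
    bcd_xz = lzma.compress(pack_bcd(digits), preset=9)
    for name, blob in (("ASCII + LZMA", ascii_xz), ("BCD + LZMA", bcd_xz)):
        print(f"{name}: {len(blob)} bytes, "
              f"{8 * len(blob) / len(digits):.3f} bits/digit")
```

Packed BCD already spends only 4 bits per digit before compression, which is still above the log2(10) ~= 3.32 bits a decimal digit can carry, so the compressor may squeeze it a little further; whether the result beats LZMA on plain ASCII is something to measure rather than assume.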

Ah! By the way...I was astonished to read that the digits of PI are
called "random" on the one hand and on the other hand there is a formula [1]
to calculate a certain digit of PI without calculating the previous
digits...
"Calculated random"? Are natural constants the purest form of PRNGs ??? ;)
(Quantum physics is everywhere... ;;))

[1]: http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula
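Worth noting: the formula linked above yields digits of PI in base 16 (hexadecimal), not base 10, so "a certain digit" means a hex digit there. A minimal Python sketch of that digit extraction, assuming plain double-precision floats (so it is only trustworthy for modest positions); bbp_hex_digit is a hypothetical name.

```python
# Illustrative sketch only: bbp_hex_digit is a hypothetical helper name.
# Double-precision floats limit it to modest digit positions.

def bbp_hex_digit(n: int) -> str:
    """Return hexadecimal digit n+1 of PI after the point (n = 0 -> first).

    Uses the Bailey-Borwein-Plouffe formula
        pi = sum_k 16**-k * (4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6))
    and only needs the fractional part of 16**n * pi, so the earlier
    digits are never computed.
    """
    def frac_series(j: int) -> float:
        # Fractional part of sum_k 16**(n-k) / (8k+j).
        s = 0.0
        for k in range(n + 1):
            r = 8 * k + j
            # Modular exponentiation keeps the head terms small and exact.
            s = (s + pow(16, n - k, r) / r) % 1.0
        # Tail terms (k > n) are already below 1; add until negligible.
        k = n + 1
        while True:
            term = 16.0 ** (n - k) / (8 * k + j)
            if term < 1e-17:
                break
            s += term
            k += 1
        return s % 1.0

    x = (4 * frac_series(1) - 2 * frac_series(4)
         - frac_series(5) - frac_series(6)) % 1.0
    return format(int(x * 16), "x")


if __name__ == "__main__":
    print("".join(bbp_hex_digit(i) for i in range(10)))
```

PI in hexadecimal begins 3.243F6A8885..., which makes a convenient check for the first few positions.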

Best regards,
Meino










Re: [gentoo-user] Re: OT Best way to compress files with digits

2014-11-01 Thread Matti Nykyri
 On Nov 1, 2014, at 19:26, Alan McKinnon alan.mckin...@gmail.com wrote:
 
 On 01/11/2014 19:15, James wrote:
 meino.cramer at gmx.de writes:
 
 
 I have a lot of files with digits of PI. The digits
 are the characters of 0-9. Currently they are ZIPped,
 which I think is not the best way to do that.
 
 Hello Meino,
 
 It's a bit of effort, but the world's recognized authority
 on algorithms is Don Knuth. [1] He's old now, but his
 pioneering attempt at categorizing most algorithms,
 The Art of Computer Programming, and his MMIX algorithm
 implementations (kinda like assembler) are certainly
 part of many first-step research efforts on algorithms
 and their implementations.
 
 It's not a cookbook but more of a scholarly (high-brow) reference,
 offered just to supplement all the good postings by your peers on gentoo-user.
 
 Alan may loan you his copy?
 (ha ha ha)?
 
 
 
 hth,
 James
 
 [1] http://www-cs-faculty.stanford.edu/~uno/
 
 
 ha ha, fat chance :-)
 
 When Alan does eventually get his hands on his very own personal
 copy[1], it will be lent to nobody. There are just some things a man
 never lends out: his bike, his firearm, his wife. And Knuth :-)

Why not lend your wife? ;)

 Back on topic: You're 100% right - to learn about algorithms in general,
 Knuth is the man. Essential reading for anyone taking CS seriously.
 
 -- 
 Alan McKinnon
 alan.mckin...@gmail.com
 
 



Re: [gentoo-user] Re: OT Best way to compress files with digits

2014-11-01 Thread Alan McKinnon
On 01/11/2014 19:59, meino.cra...@gmx.de wrote:
 James wirel...@tampabay.rr.com [14-11-01 18:16]:
  meino.cramer at gmx.de writes:


  I have a lot of files with digits of PI. The digits
  are the characters of 0-9. Currently they are ZIPped,
  which I think is not the best way to do that.

 Hello Meino,

 It's a bit of effort, but the world's recognized authority
 on algorithms is Don Knuth. [1] He's old now, but his
 pioneering attempt at categorizing most algorithms,
 The Art of Computer Programming, and his MMIX algorithm
 implementations (kinda like assembler) are certainly
 part of many first-step research efforts on algorithms
 and their implementations.

 It's not a cookbook but more of a scholarly (high-brow) reference,
 offered just to supplement all the good postings by your peers on gentoo-user.

 Alan may loan you his copy?
 (ha ha ha)?



 hth,
 James

 [1] http://www-cs-faculty.stanford.edu/~uno/

 
 Hello James,
 
 Don Knuth ... oh YES! :)
 For a long time I have been using and preferring TeX over anything else
 (ok...for ASCII I use vim... ;).
 
 And besides his computer wisdom I also like his kind of humor a lot...
 for example this one:
 https://www.youtube.com/watch?v=eKaI78K_rgA&list=PLUu0XRts4lK6Ri7-xaCNYqTHx7We95Rk8&index=10
 
 But my initial question was more targeted at practical computing than
 at groundshaking and fundamental research topics.
 
 More like what tool to pick?...
 
 I did some compression tests myself and currently I have this:
 From http://piworld.calico.jp/ (http://piworld.calico.jp/estart.html)
 I got ZIP packages with
 1000 million decimal places of PI each (~57MB for one ZIP).
 
 I unpacked the first package and recompressed it with different
 methods of 7zip, gzip and bzip2. For gzip and bzip2 I used the highest
 compression mode (-9). When a file's name matches /.*ultra.*/, I used
 the highest compression mode (-mx=9); otherwise I only set the compression
 method and left the rest untouched (defaults).
 
 
  11996 2014-10-31 16:44 pi-0001.txt
   57105419 2014-10-31 16:47 pi-0001.txt.gz
   52632832 2014-10-31 16:48 pi-0001.txt.bz2
   52045827 2014-10-31 16:54 pi-0001.txt.ppmd.7z
   57110291 2014-10-31 17:23 pi-0001.zip
   51766683 2014-10-31 17:26 pi-0001.txt.lzma.7z
   51668838 2014-10-31 17:34 pi-0001.txt.lzma.ultra.7z
   52862115 2014-10-31 17:36 pi-0001.txt.ppmd.ultra.7z
   51668838 2014-10-31 17:39 pi-0001.txt.ultra.7z
 
 7zip's LZMA wins here, and it is also 7zip's default method. I set
 the ultra mode for it by hand.
 
 From other sites which offer PI for download I know of methods which
 store the ASCII digits in binary and then compress them. It would be
 interesting whether this gives 7zip more convenient input to work
 with...
 
 Ah! By the way...I was astonished to read that the digits of PI are
 called "random" on the one hand and on the other hand there is a formula [1]
 to calculate a certain digit of PI without calculating the previous
 digits...
 "Calculated random"? Are natural constants the purest form of PRNGs ??? ;)
 (Quantum physics is everywhere... ;;))
 
 [1]: 
 http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula


The sequence of digits that makes up pi is a random sequence - you can
analyze the order any way you want and you'll find no inherent pattern.
However, any given digit in the sequence is 100% predictable, as you
just showed :-)

Randomness has got to be the second most mind-boggling thing out there,
first being quantumness (that's not a word, I just made it up. You
should get the meaning OK from context ;-) )

-- 
Alan McKinnon
alan.mckin...@gmail.com




[gentoo-user] Re: OT Best way to compress files with digits

2014-11-01 Thread David W Noon
On Sat, 01 Nov 2014 22:47:15 +0200, Alan Mckinnon
(alan.mckin...@gmail.com) wrote about Re: [gentoo-user] Re: OT Best
way to compress files with digits (in 545546d3.3030...@gmail.com):

 On 01/11/2014 19:59, meino.cra...@gmx.de wrote:
[snip]
 Ah! By the way...I was astonished to read that the digits of PI
 are called "random" on the one hand and on the other hand there is
 a formula [1] to calculate a certain digit of PI without
 calculating the previous digits... "Calculated random"? Are
 natural constants the purest form of PRNGs ??? ;) (Quantum physics
 is everywhere... ;;))
 
 [1]:
 http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula

 
 
 The sequence of digits that makes up pi is a random sequence - you
 can analyze the order any way you want and you'll find no inherent
 pattern.

Actually, the sequence of digits is most definitely *not* random.  If
the sequence of digits is written any other way then the value is not
Pi.  Hence the sequence is unique, not random.

I think what you are grasping for is that the frequency of distinct
digits tends to be uniform: 0's occur as often as 1's as often ... as
9's.  Note that the "as often as" operator is really approximate for
finite sub-sequences, but is asymptotically accurate.

Moreover, this is the same in any number base: the binary
representation has 0's occurring as often as 1's; the ternary
representation has 0's occurring as often as 1's and as often as 2's;
etc., etc.

Such numbers are called "normal".  It was a poor choice of name, but
we are stuck with it.  I would have called them "digit soup numbers"
-- an oblique reference to alphabet soup.
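The "as often as" behaviour can be checked by simply counting digits. A minimal Python sketch, assuming Jeremy Gibbons' unbounded spigot algorithm as the digit source (a swapped-in technique, not something from this thread); pi_digits and digit_frequencies are hypothetical names. A finite count can only be consistent with normality, never prove it, and whether PI is actually normal is still an open problem.

```python
# Illustrative sketch only: pi_digits/digit_frequencies are hypothetical
# helpers; the digit source is Gibbons' unbounded spigot algorithm.
from collections import Counter
from itertools import islice


def pi_digits():
    """Yield the decimal digits of PI forever, starting with the 3."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain: emit it and rescale.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: fold in another term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def digit_frequencies(count: int) -> Counter:
    """Count each digit 0-9 in the first `count` digits (including the 3)."""
    return Counter(islice(pi_digits(), count))


if __name__ == "__main__":
    total = 1000
    freqs = digit_frequencies(total)
    for d in range(10):
        print(d, freqs[d], f"{freqs[d] / total:.3f}")
```

The spigot works with ever-growing integers, so it is fine for a few thousand digits but far too slow for the 1000-million-digit files discussed in this thread.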

 However, any given digit in the sequence is 100% predictable, as
 you just showed :-)
 
 Randomness has got to be the second most mind-boggling thing out
 there, first being quantumness (that's not a word, I just made it
 up. You should get the meaning OK from context ;-) )

I would say that probability theory is more mind boggling, as it
underpins much of quantum theory.  But, as someone who majored in
probability theory, I might be biased. [Incidentally, there is a small
statistical joke in that last sentence.]

Getting back to Meino's original request, one of the optimum
compression algorithms for this would be custom Huffman encoding.  To
do this the algorithm requires that all the data (i.e. digits) be read
and a frequency table built.  The only problem is that to read all the
digits of Pi could take rather a long time. ... :-)
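The two-pass scheme Dave describes (build a frequency table, then encode) can be sketched in a few lines of Python with a heap. This is a minimal illustration on a finite digit string, not a tuned implementation; huffman_code, encode and decode are hypothetical names.

```python
# Illustrative sketch only: huffman_code/encode/decode are hypothetical
# helpers showing a textbook Huffman construction with heapq.
import heapq
from collections import Counter
from itertools import count


def huffman_code(freqs: dict) -> dict:
    """Build a prefix code {symbol: bit string} from {symbol: weight}."""
    tiebreak = count()  # unique ints so the heap never compares the dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    if not heap:
        return {}
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing 0 and 1 respectively.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(text: str, code: dict) -> str:
    return "".join(code[ch] for ch in text)


def decode(bits: str, code: dict) -> str:
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:  # prefix-free, so the first match is correct
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


if __name__ == "__main__":
    digits = "31415926535897932384626433832795028841971693993751"
    code = huffman_code(Counter(digits))  # pass 1: frequency table
    bits = encode(digits, code)           # pass 2: encode
    assert decode(bits, code) == digits
    print(f"{len(bits) / len(digits):.3f} bits/digit")
```

With all ten digits equally frequent, the code comes out as six 3-bit and four 4-bit codes, i.e. 3.4 bits per digit. That is slightly above log2(10) ~= 3.32 because Huffman coding spends a whole number of bits on every symbol; coding blocks of several digits at a time is what closes that gap.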
-- 
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
dwn...@ntlworld.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*




[gentoo-user] Re: OT Best way to compress files with digits

2014-10-31 Thread Grant Edwards
On 2014-10-31, Rich Freeman ri...@gentoo.org wrote:
 On Fri, Oct 31, 2014 at 2:55 PM, David Haller gen...@dhaller.de wrote:

 On Fri, 31 Oct 2014, Rich Freeman wrote:

I can't imagine that any tool will do much better than something like
lzo, gzip, xz, etc.  You'll definitely benefit from compression though
- your text files full of digits are encoding 3.3 bits of information
in an 8-bit ascii character and even if the order of digits in pi can
be treated as purely random just about any compression algorithm is
going to get pretty close to that 3.3 bits per digit figure.

 Good estimate:

 $ calc '101000/(8/3.3)'
 41662.5
 and I get from (lzip)
 $ calc 44543*8/101000
 3.528...(bits/digit)
 to zip:
 $ calc 49696*8/101000
 ~3.93   (bits/digit)

 Actually, I'm surprised how far off of this the various methods are.
 I was expecting SOME overhead, but not this much.

 A fairly quick algorithm would be to encode every possible set of 96
 digits into a 40 byte code (that is just a straight decimal-binary
 conversion).  Then read a word at a time and translate it.  This
 will only waste 0.011 bits per digit.

You're cheating.  The algorithm you tested will compress strings of
arbitrary 8-bit values.  The algorithm you proposed will only compress
strings of bytes where each byte can have only one of 10 values.
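Rich's fixed-block idea quoted above (96 digits into 40 bytes) is easy to check: 96 * log2(10) ~= 318.9 bits, so every 96-digit block is below 10^96 < 2^320 and fits in 40 bytes (320 bits); the leftover (320 - 318.9) / 96 ~= 0.011 bits per digit matches his estimate. A minimal Python sketch, assuming the total digit count is stored separately and a short final block is right-padded with zeros; pack_digits and unpack_digits are hypothetical names. As Grant points out, it only works on strings of the characters 0-9.

```python
# Illustrative sketch only: pack_digits/unpack_digits are hypothetical
# helpers; the caller must store the digit count separately (assumption).
import math

DIGITS_PER_BLOCK = 96
BYTES_PER_BLOCK = 40
# Every 96-digit block is < 10**96 < 2**320, so it fits in 40 bytes.
assert 10 ** DIGITS_PER_BLOCK < 2 ** (8 * BYTES_PER_BLOCK)


def pack_digits(digits: str) -> bytes:
    """Convert each 96-digit block to a 40-byte big-endian integer."""
    if any(c not in "0123456789" for c in digits):
        raise ValueError("only the characters 0-9 can be packed")
    out = bytearray()
    for i in range(0, len(digits), DIGITS_PER_BLOCK):
        # Right-pad a short final block so its digits stay left-aligned.
        block = digits[i:i + DIGITS_PER_BLOCK].ljust(DIGITS_PER_BLOCK, "0")
        out += int(block).to_bytes(BYTES_PER_BLOCK, "big")
    return bytes(out)


def unpack_digits(blob: bytes, n_digits: int) -> str:
    """Inverse of pack_digits; n_digits must be stored separately."""
    parts = []
    for i in range(0, len(blob), BYTES_PER_BLOCK):
        value = int.from_bytes(blob[i:i + BYTES_PER_BLOCK], "big")
        # Restore leading zeros lost in the integer conversion.
        parts.append(str(value).zfill(DIGITS_PER_BLOCK))
    return "".join(parts)[:n_digits]


if __name__ == "__main__":
    print(f"{8 * BYTES_PER_BLOCK / DIGITS_PER_BLOCK:.4f} bits/digit "
          f"vs log2(10) = {math.log2(10):.4f}")
```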

-- 
Grant Edwards               grant.b.edwards        Yow! I want another
                                  at               RE-WRITE on my CEASAR
                              gmail.com            SALAD!!




Re: [gentoo-user] Re: OT Best way to compress files with digits

2014-10-31 Thread Rich Freeman
On Fri, Oct 31, 2014 at 4:25 PM, Grant Edwards
grant.b.edwa...@gmail.com wrote:

 You're cheating.  The algorithm you tested will compress strings of
 arbitrary 8-bit values.  The algorithm you proposed will only compress
 strings of bytes where each byte can have only one of 10 values.


Of course.  I wasn't expecting the general-purpose algorithm to do as
well.  In some sense, part of the information that is being encoded is
actually in the compression algorithm itself (the mapping), while in a
general-purpose compression algorithm that information has to be part
of the compressed data stream.

I was just expecting gzip/etc to get much closer to the theoretical
limit.  I figured that it might be a few percent higher, but I wasn't
expecting a 10+% difference.
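One way to see the gap Rich describes is to measure it directly. A minimal Python sketch using the standard-library zlib, bz2 and lzma modules as stand-ins for gzip, bzip2 and xz (zlib produces the same DEFLATE stream as gzip; lzma.compress writes the .xz format by default). It assumes pseudo-random digits are a fair stand-in for the digits of PI; bits_per_digit is a hypothetical name, and no particular result is claimed here.

```python
# Illustrative sketch only: bits_per_digit is a hypothetical helper, and
# pseudo-random digits stand in for real digits of PI (an assumption).
import bz2
import lzma
import math
import random
import zlib


def bits_per_digit(n_digits: int = 100_000, seed: int = 0) -> dict:
    """Compress n_digits pseudo-random decimal digits with three
    general-purpose compressors and report the cost in bits per digit."""
    rng = random.Random(seed)
    data = "".join(rng.choices("0123456789", k=n_digits)).encode("ascii")
    blobs = {
        "zlib (gzip), level 9": zlib.compress(data, 9),
        "bz2, level 9": bz2.compress(data, 9),
        "lzma (xz), preset 9": lzma.compress(data, preset=9),
    }
    return {name: 8 * len(b) / n_digits for name, b in blobs.items()}


if __name__ == "__main__":
    limit = math.log2(10)  # about 3.32 bits of information per digit
    for name, bpd in bits_per_digit().items():
        print(f"{name}: {bpd:.3f} bits/digit, "
              f"{100 * (bpd / limit - 1):.1f}% above log2(10)")
```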

--
Rich