Re: Huffman coding comparison

bearophile Mon, 31 May 2010 05:05:19 -0700

Andrei:

>Thanks for your notes.<


My pleasure. If you like them, I will try to do this some more times.
There are probably tens or hundreds of small details that can be improved in 
Phobos, and some of those changes can improve its usage patterns. In the past I 
have put some of them in Bugzilla.
One good way to find such possible improvements is to use Phobos to write 
small programs while keeping your eyes open.


>Looks like we're having two proposals.<

I am sceptical that this can be done without compiler/language support to 
offer good enough syntax sugar:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=110613

In my dlibs1 (for D1) I implemented, and later sometimes used, an expanded and 
improved version of an idea shown by Henning Hasemann in 2007 (you are free to 
use it if you want; code shown in this newsgroup is free to use, I think). 
But this approach:
- is not efficient
- requires you to define the iteration variable types before the array comp
- produces code that is not so easy to read
- can't be done (in D1) for lazy comprehensions.


TA[] select(TA, TI, TC)(lazy TA mapper,
                        ref TI iter1, TC items1) {
    static if(is( TC == void[0] )) {
        return null;
    } else {
        auto aux1 = iter1; // save original iteration variable

        static if (HasLength!(TC)) {
            auto result = new TA[items1.length];

            uint i;
            static if (IsAA!(TC)) {
                foreach (k, v; items1) {
                    iter1 = k;
                    result[i] = mapper();
                    i++;
                }
            } else {
                // Then items1 is an iterable with attribute length
                // (an array, xrange, etc)
                // don't use foreach (i,el;items1), it's less general
                foreach (el; items1) {
                    iter1 = el;
                    result[i] = mapper();
                    i++;
                }
            }

            iter1 = aux1; // restore original iteration variable
            return result;
        } else {
            // Then items1 is an iterable object
            // when TA isn't an AA, geometric grow can be used to speed up
            ArrayBuilder!(TA) result;
            foreach (el; items1) {
                iter1 = el;
                result ~= mapper();
            }
            iter1 = aux1; // restore original iteration variable
            return result.toarray;
        }
    }
}


TA[] select(TA, TI, TC, TP)(lazy TA mapper,
                            ref TI iter1, TC items1,
                            lazy TP where) {
...

TA[] select(TA, TI1, TC1, TI2, TC2)(lazy TA mapper,
                                    ref TI1 iter1, TC1 items1,
                                    ref TI2 iter2, lazy TC2 items2) {
...

TA[] select(TA, TI1, TC1, TI2, TC2, TP)(lazy TA mapper,
                                        ref TI1 iter1, TC1 items1,
                                        ref TI2 iter2, lazy TC2 items2,
                                        lazy TP where) {
...
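
A usage sketch may make the calling convention clearer. The version below is a cut-down stand-in for the first overload only: it is written for D2, drops the dlibs1 helpers (HasLength, IsAA, ArrayBuilder) and simply appends, so only the signature is taken from the code above. The lazy mapper expression is re-evaluated after each assignment to the iteration variable, which is why that variable has to be declared beforehand.

```d
// Simplified stand-in for the first select() overload (assumption: D2,
// plain appending instead of the dlibs1 ArrayBuilder).
TA[] select(TA, TI, TC)(lazy TA mapper, ref TI iter, TC items)
{
    TA[] result;
    auto saved = iter;          // save original iteration variable
    foreach (el; items)
    {
        iter = el;              // the lazy mapper sees the new value
        result ~= mapper();     // evaluate the mapping expression now
    }
    iter = saved;               // restore original iteration variable
    return result;
}

void main()
{
    int x;                      // iteration variable must exist first
    auto doubled = select(x * 2, x, [1, 2, 3]);
    assert(doubled == [2, 4, 6]);
    assert(x == 0);             // restored after the comprehension
}
```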



>There exists a pop() function that only pops one element.<

This is how it is implemented:

/**
Pops the largest element (according to the predicate $(D less)).
 */
    void pop()
    {
        enforce(_length);
        if (_length > 1)
            swap(_store.front, _store[_length - 1]);
        --_length;
        percolateDown(_store[0 .. _length]);
    }

I'd like it to also return the popped item, an ElementType!Range. Is this 
possible?
Popping one item out is one of the most common operations I have to perform on 
a heap.
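
For comparison, here is a minimal sketch of a pop that hands back the removed item. It assumes a more recent Phobos than the one discussed in this thread, where std.container provides heapify and BinaryHeap has a removeAny method; with the pop() shown above you would read the front first and then pop.

```d
import std.container : heapify;

void main()
{
    // Build a max-heap over an int array (default predicate "a < b").
    auto h = heapify([3, 1, 4, 1, 5]);

    // removeAny removes the largest element and returns it
    // (assumption: a Phobos version that offers this method).
    int top = h.removeAny();

    assert(top == 5);
    assert(h.front == 4);
    assert(h.length == 4);
}
```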


>>array(map(...)) is so common that an amap(...) can be considered.<<

>I don't know.<

An overly long list of functions (that is, an overly large API) is bad, but I 
have found that for the most common higher-order functions (map and filter, 
which are common because array comps aren't present) I like a shortcut for the 
eager version: amap/afilter. But they are not essential, we can survive without 
them :-)
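
To show how small such shortcuts would be, here is a sketch of both. The names amap and afilter come from this thread; the bodies are just my assumption of the obvious implementation, using D2 lambda syntax from later compiler versions.

```d
import std.algorithm : filter, map;
import std.array : array;

// Eager map: array(map!fun(r)) in one call.
auto amap(alias fun, Range)(Range r)
{
    return array(map!fun(r));
}

// Eager filter: array(filter!pred(r)) in one call.
auto afilter(alias pred, Range)(Range r)
{
    return array(filter!pred(r));
}

void main()
{
    int[] squares = amap!(x => x * x)([1, 2, 3]);
    assert(squares == [1, 4, 9]);

    int[] evens = afilter!(x => x % 2 == 0)([1, 2, 3, 4]);
    assert(evens == [2, 4]);
}
```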


>I'm not crazy about functions that return large arrays by value. I'd have 
>sorted() return a range (a heap!) that lazily spans the input in sorted order.<

When I need only a few items in sorted order I can use just pop(n), or many 
pop() calls.
Functional languages return data copies, but they are sometimes lazy (Haskell), 
or they try to avoid arrays and use more functional-friendly data 
structures that reduce the need for copying a lot of data.

sorted() and schwartzSorted() can be handy because they can be used as 
expressions in a functional style. You can write:

foreach (x; sorted(items)) {...

Instead of:

auto sortedItems = items.dup;
sort(sortedItems);
foreach (x; sortedItems) {...

If items is long then sorted() is not the most efficient choice in either 
memory or time, but in many situations you don't have many items. Most arrays 
are short. If you have a 5-item array and the items are small (like numbers) 
then using sorted() is not so bad unless it's in the middle of a critical piece 
of code. And in that case using a standard sort is probably wrong anyway.

So sorted/schwartzSorted are not for every situation; they are for 
situations where you prefer handy, short code and you don't need maximum 
performance. As with most other things, they should not be abused.
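
A sketch of what I mean by sorted(), under the assumption that it simply duplicates the array and sorts the copy (this is not an existing Phobos function, just the idea):

```d
import std.algorithm : sort;

// Hypothetical helper: return a sorted copy, leave the input untouched.
T[] sorted(T)(T[] items)
{
    auto copy = items.dup;  // the extra allocation is the price of convenience
    sort(copy);             // in-place sort of the copy
    return copy;
}

void main()
{
    auto items = [3, 1, 2];
    int total;
    foreach (x; sorted(items))  // usable directly as an expression
        total += x;

    assert(total == 6);
    assert(sorted(items) == [1, 2, 3]);
    assert(items == [3, 1, 2]); // original order is preserved
}
```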


>I've never had a clear view on what the target audience for writeln() is. You 
>seem to want it to output debug strings; I'm not sure that's the best way to 
>purpose it.<

Usages of the printing functions:
- To debug code. For this purpose the text shown has to be human-readable, 
writeln has to be as automatic as possible (to reduce the time needed to add 
the printing statements), and the text has to be "nice" enough to show the data 
types but not too noisy, otherwise it becomes useless. There are 
more modern ways to show data structures, even GUI-based ones, but having a 
fall-back strategy with a good writeln is good.
- To show output in small script-like programs or medium command line programs. 
I think this is the same case as debugging.
- To print a lot of numbers or simple symbols, for later processing with other 
programs. In this case printf() is better because it's faster than writeln.
- To print many strings. In this case in D printf()/puts() can be suboptimal or 
unfit. A very simple, very fast, non-templated function similar to puts() 
but designed for D strings would be useful here.
- For (textual) serialization? In this case it's better to use functions more 
specialized for this purpose, and to avoid writeln.
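
For the "print many strings" case, something like the following is what I have in mind. The name putsD is hypothetical; the sketch assumes the C stdio bindings in core.stdc.stdio and writes the slice by length, so the D string needs no terminating zero.

```d
import core.stdc.stdio : fputc, fwrite, stdout;

// Hypothetical puts()-like function for D strings: not templated,
// no formatting, no zero terminator required.
size_t putsD(const(char)[] s)
{
    auto written = fwrite(s.ptr, 1, s.length, stdout);
    fputc('\n', stdout);
    return written;  // number of chars written, excluding the newline
}

void main()
{
    string[] lines = ["alpha", "beta"];
    foreach (line; lines)
        putsD(line);
}
```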


So I don't see why it's better for this command:
  writeln(tuple(['x': 1.0, 'y': 2.0], "hello", [[1, 2], [3, 4]]));

To print:
  Tuple!(double[char],string,int[][])([x:1, y:2], hello, 1 2 3 4)

Instead of something more fit for humans, that shows things well, such as:
  tuple(['x': 1.0, 'y': 2.0], "hello", [[1, 2], [3, 4]])
Or:
  Tuple!(double[char], string, int[][])(['x': 1.0, 'y': 2.0], "hello", [[1, 2], [3, 4]])

Bye,
bearophile
