I'm looking at the dmd source now...

The compression is done in the backend, in cgobj.c.

The conditions are:


#define LIBIDMAX 128
    if (len > LIBIDMAX)
    {
        // Attempt to compress the name
        name2 = id_compress(name, len);
 // snip
        if (len2 > LIBIDMAX)            // still too long
        {
            /* Form md5 digest of the name and store it in the
             * last 32 bytes of the name.
             */

// snip impl, open the source to see specific details
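To make the length conditions concrete, here is a rough sketch of the "still too long" fallback. It is not taken from cgobj.c. The function names, the choice to keep the first LIBIDMAX - 32 bytes, hashing the full original name, and printing the digest as hex are all my assumptions. The hash is a stand-in (two FNV-1a passes), not MD5, just so the example has no dependencies. The compression attempt that comes first is left out.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LIBIDMAX 128
#define DIGEST_HEX 32   /* an MD5 digest printed as hex is 32 characters */

/* FNV-1a, 64-bit, with a caller-supplied starting value. */
static uint64_t fnv1a(const char *s, size_t n, uint64_t h)
{
    for (size_t i = 0; i < n; i++)
    {
        h ^= (unsigned char)s[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Stand-in for MD5 (NOT MD5): two differently seeded FNV-1a passes,
 * written out as 32 hex digits plus a terminating NUL. */
static void standin_digest_hex(const char *s, size_t n, char hex[DIGEST_HEX + 1])
{
    uint64_t a = fnv1a(s, n, 0xcbf29ce484222325ULL);
    uint64_t b = fnv1a(s, n, 0x84222325cbf29ce4ULL);
    snprintf(hex, DIGEST_HEX + 1, "%016llx%016llx",
             (unsigned long long)a, (unsigned long long)b);
}

/* Hypothetical helper: writes a name of at most LIBIDMAX bytes into out
 * and returns its length. */
size_t fit_name(const char *name, size_t len, char out[LIBIDMAX + 1])
{
    if (len <= LIBIDMAX)
    {
        memcpy(out, name, len);
        out[len] = '\0';
        return len;
    }
    /* The real code tries id_compress() before reaching this point. */
    size_t keep = LIBIDMAX - DIGEST_HEX;        /* assumed: keep the prefix */
    memcpy(out, name, keep);
    standin_digest_hex(name, len, out + keep);  /* fills the last 32 bytes */
    return LIBIDMAX;
}
```

Names of 128 bytes or fewer pass through unchanged; anything longer comes out at exactly 128 bytes with the digest in the tail.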




/******************************************
 * Compress an identifier.
 * Format: if ASCII, then it's just the char
 *      if high bit set, then it's a length/offset pair
 * Returns:
 *      malloc'd compressed identifier
 */

char *id_compress(char *id, int idlen)
{



The implementation, in the same source file, appears to compress by finding the longest duplicated substrings and replacing each repeat with a length/offset pair that refers back to the earlier occurrence.
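As a starting point for guess-and-check, here is a minimal sketch of that general idea, written from the description above and not from the backend source. The byte layout is my own guess: a pair is two bytes, 0x80|length then 0x80|offset, with the offset counted backwards from the current position in the uncompressed text. The names sketch_compress and sketch_expand, the 3-byte minimum match, and the 127 limits are all hypothetical. dmd's actual encoding almost certainly differs in the details, so the output would need to be compared against what dmd really emits.

```c
#include <stdlib.h>
#include <string.h>

#define MIN_MATCH 3     /* a pair costs 2 bytes, so shorter matches don't pay off */
#define MAX_FIELD 127   /* length and offset each have to fit in 7 bits */

/* Compress an ASCII identifier. Returns a malloc'd, NUL-terminated buffer
 * and stores the compressed length in *outlen. */
char *sketch_compress(const char *id, size_t idlen, size_t *outlen)
{
    unsigned char *out = malloc(idlen + 1);   /* output never grows */
    size_t i = 0, o = 0;
    if (!out) return NULL;
    while (i < idlen)
    {
        size_t bestlen = 0, bestoff = 0;
        size_t start = i > MAX_FIELD ? i - MAX_FIELD : 0;
        /* Look for the longest earlier substring matching the text at i. */
        for (size_t j = start; j < i; j++)
        {
            size_t n = 0;
            while (i + n < idlen && n < MAX_FIELD && id[j + n] == id[i + n])
                n++;
            if (n > bestlen) { bestlen = n; bestoff = i - j; }
        }
        if (bestlen >= MIN_MATCH)
        {
            out[o++] = (unsigned char)(0x80 | bestlen);
            out[o++] = (unsigned char)(0x80 | bestoff);
            i += bestlen;
        }
        else
            out[o++] = (unsigned char)id[i++];   /* plain ASCII literal */
    }
    out[o] = '\0';
    *outlen = o;
    return (char *)out;
}

/* Inverse of sketch_compress. Returns NULL on malformed input. */
char *sketch_expand(const char *in, size_t inlen, size_t *outlen)
{
    unsigned char *out = malloc(inlen * MAX_FIELD + 1);   /* generous bound */
    size_t i = 0, o = 0;
    if (!out) return NULL;
    while (i < inlen)
    {
        unsigned char c = (unsigned char)in[i++];
        if (c & 0x80)
        {
            size_t len = c & 0x7F;
            size_t off;
            if (i >= inlen) { free(out); return NULL; }
            off = (unsigned char)in[i++] & 0x7F;
            if (off == 0 || off > o) { free(out); return NULL; }
            /* Copy byte by byte so overlapping matches work. */
            while (len--) { out[o] = out[o - off]; o++; }
        }
        else
            out[o++] = c;
    }
    out[o] = '\0';
    *outlen = o;
    return (char *)out;
}
```

Round-tripping a mangled name through sketch_compress and sketch_expand should give back the original bytes, which makes it easy to experiment with different pair layouts while comparing against dmd's output.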



The reason I snipped the implementations here is that the backend is under a more restrictive license, so I don't want to get into copying it. But what I've said here, combined with guess-and-check against dmd's output, might be enough to do a clean-room implementation.



Or, if Walter can give us permission to copy/paste this into a D file, we could use it directly.
