***  For details on how to be removed from this list visit the  ***
***          CCP4 home page http://www.ccp4.ac.uk         ***


A while ago, I had the idea of defining a crystallographic test set with a simple function that produces the same test flag for a given HKL, independent of how or in what order the flags are assigned. An integer hash function can do this.

A 32-bit integer hash is enough for most cases, but I decided that a 64-bit hash is better. This allows for three 16-bit HKL indices, plus one 16-bit 'seed' value. The test assignment procedure is to pack {seed,H,K,L} as a 64-bit integer, and pass this through a 64-bit hash. If the hash function is good, the result is a 64-bit number that appears random, but is precisely defined for a given input value. We convert this to a floating-point number in the range 0-1, and turn that into a test-set value by comparing it with the chosen test-set percentage.
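The packing step can be sketched as follows. This is only an illustration of the bit layout described above: the names pack_key and unpack_field are hypothetical and are not part of the listing at the end of this message. Each signed index is reduced to its 16-bit pattern before widening, so that a negative index cannot spill into the neighbouring fields.

```c
#include <stdint.h>

/* Hypothetical helper: pack {seed,h,k,l} into one 64-bit key.
   Layout (assumed): seed in bits 48-63, h in 32-47, k in 16-31, l in 0-15. */
uint64_t pack_key(uint16_t seed, int16_t h, int16_t k, int16_t l) {
  return ((uint64_t)seed << 48)
       | ((uint64_t)(uint16_t)h << 32)   /* (uint16_t) keeps only 16 bits */
       | ((uint64_t)(uint16_t)k << 16)
       |  (uint64_t)(uint16_t)l;
}

/* Hypothetical helper: recover one signed 16-bit field from the key.
   shift is 32 for h, 16 for k, 0 for l. The conversion back to int16_t
   assumes the usual two's-complement behaviour. */
int16_t unpack_field(uint64_t key, int shift) {
  return (int16_t)(uint16_t)(key >> shift);
}
```

With this layout, every index in the range -32768 to 32767 maps to a distinct key for a given seed, which is what makes the hashed flag unique per reflection.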

The result is a well-defined test array, based on the seed value and test fraction, that is trivial to reproduce, and in fact never has to be written to a reflection file. This makes it easy to reliably maintain the same test set for multiple data sets.

In the case of significant NCS-related reflections, the thin-shell selection method can also be written as an equation rather than as an array of values. So, in both cases, it should be possible to utilize Free-R sets by definition rather than writing out values.
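One possible way to write thin-shell selection as an equation is to hash a shell number instead of the individual HKL. The sketch below is an assumption about how that could look, not a worked-out method: shell_index, shell_is_free, the constant shell width in 1/d^2, and the key layout are all hypothetical choices. The mixing function is the same 64-bit mix used in the listing at the end of this message, repeated here so the sketch compiles on its own.

```c
#include <stdint.h>

/* Thomas Wang's 64-bit mix, as in the listing at the end of this message. */
uint64_t mix64(uint64_t key) {
  key += ~(key << 32);  key ^= (key >> 22);
  key += ~(key << 13);  key ^= (key >> 8);
  key += (key << 3);    key ^= (key >> 15);
  key += ~(key << 27);  key ^= (key >> 31);
  return key;
}

/* Hypothetical: shell number for a reflection with inverse squared
   resolution inv_d2 (1/d^2, non-negative), using shells of equal
   width in 1/d^2. The cast truncates toward zero. */
uint32_t shell_index(double inv_d2, double shell_width) {
  return (uint32_t)(inv_d2 / shell_width);
}

/* Hypothetical: 1 if the whole shell belongs to the test set.
   The top 53 bits of the hash are scaled to a value in [0,1),
   which differs slightly from the division used in the listing. */
int shell_is_free(uint16_t seed, uint32_t shell, double fraction) {
  uint64_t n = mix64(((uint64_t)seed << 48) | (uint64_t)shell);
  double u = (double)(n >> 11) * (1.0 / 9007199254740992.0); /* 2^53 */
  return u < fraction;
}
```

Every reflection in a shell then shares one flag, so NCS-related reflections at the same resolution stay together, and the flags again follow from the seed, the shell width and the test fraction alone.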

Below is an example implementation in C of the per-reflection hash, using "Thomas Wang's 64-bit Mix" hash function. It appears to work well.

Joe Krahn
------------------------------------------------------------------------

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>

// Thomas Wang's 64 bit Mix Hash Function
uint64_t hash64(uint64_t key) {
  key += ~(key << 32);
  key ^= (key >> 22);
  key += ~(key << 13);
  key ^= (key >> 8);
  key += (key << 3);
  key ^= (key >> 15);
  key += ~(key << 27);
  key ^= (key >> 31);
  return key;
}

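// Map {seed,h,k,l} to a reproducible pseudo-random value in the range 0-1.
double hash_hkl( uint16_t seed, int16_t h, int16_t k, int16_t l){
  uint64_t n;
  // Cast each index to uint16_t before widening. A direct (uint64_t) cast
  // sign-extends negative indices and overwrites the higher fields.
  n = hash64( (((uint64_t)seed) << 48) | (((uint64_t)(uint16_t)h) << 32)
      | (((uint64_t)(uint16_t)k) << 16) | ((uint64_t)(uint16_t)l) );

  // Divide n by the max-value of a 64-bit unsigned int plus one (2^64).
  return (double)n / ( ((double) ~((uint64_t)0)) + 1.0);
}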
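
The last step, turning the 0-1 value into a test-set value, is not shown in the listing. A minimal sketch of how it might be done is given below. The names is_free and test_set_index are hypothetical, and the numbered-set variant assumes a convention of nsets equal partitions labelled 0 to nsets-1, which may or may not match what a given program expects.

```c
/* Hypothetical: 1 if the reflection is in the test set.
   u is the value returned by hash_hkl(); fraction is e.g. 0.05 for 5%. */
int is_free(double u, double fraction) {
  return u < fraction;
}

/* Hypothetical: assign one of nsets equally sized sets, numbered
   0 .. nsets-1, so that any single set can serve as the test set. */
int test_set_index(double u, int nsets) {
  int i = (int)(u * nsets);
  /* Guard against u == 1.0, which rounding in the division can produce. */
  return i < nsets ? i : nsets - 1;
}
```

For example, is_free(hash_hkl(seed, h, k, l), 0.05) should flag roughly 5% of reflections, and the same reflections every time for a given seed.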
