Hi,
I am starting to use the following scheme for primary keys:

SHA256(URL) + "-RAW"

Primary Key Schema <https://outsideiq.jira.com/browse/CA-107>

RATIONALE:

* User-defined PKs in Lily are prefixed with "USER.", so I can't use the URI itself as the key: it contains dots, and the dot is a special character in the current version.
* In addition to the SHA-256-generated PK, Lily will still use a UUID (which is truly unique) for versioning.
* IMPORTANT: we need to randomize PKs. This is a best practice with HBase, because the data then gets distributed randomly across the cluster.

I suggest a similar scheme for OIQ objects:

SHA256(JSON-Object-in-UTF8) + "-OIQ"

"-OIQ" is a suffix rather than a prefix so that the keys still start with the hash and we keep good "randomization"; in HBase, all data is physically sorted by PK. Hashing the JSON works because all OIQ objects will be stored denormalized as JSON (string type in Lily). Note that the JSON will be UTF-8 encoded; I believe that is also part of the ECMA specs.

import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * @see <a href="http://stackoverflow.com/questions/221165/pros-and-cons-of-using-md5-hash-of-uri-as-the-primary-key-in-a-database">Pros and cons of using MD5 hash of URI as the primary key in a database</a>
 *
 * @author Fuad
 */
public class SHA256 {

    /** Returns the SHA-256 digest of the given bytes as 64 lowercase hex characters. */
    public static final String SHA256(byte[] bytes) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(bytes);
        byte[] mdbytes = md.digest();
        // convert each byte to two hex digits, zero-padded
        StringBuilder hexString = new StringBuilder();
        for (int i = 0; i < mdbytes.length; i++) {
            String hex = Integer.toHexString(0xff & mdbytes[i]);
            if (hex.length() == 1) {
                hexString.append('0');
            }
            hexString.append(hex);
        }
        return hexString.toString();
    }

    /** Hashes the UTF-8 encoding of the given text. */
    public static final String SHA256(String text)
            throws NoSuchAlgorithmException, UnsupportedEncodingException {
        return SHA256(text.getBytes("UTF-8"));
    }
}

--
Fuad Efendi
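
P.S. For illustration, here is a minimal sketch of how the two kinds of keys could be assembled. The class name RowKeys and the methods rawKey/oiqKey are hypothetical names used only for this example; they are not part of Lily or HBase. The hashing is inlined so the snippet compiles on its own, and it assumes Java 7+ for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not a Lily or HBase API.
final class RowKeys {

    private RowKeys() {
    }

    // SHA-256 of the UTF-8 bytes of the text, as 64 lowercase hex characters.
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                int v = b & 0xff;          // treat the byte as unsigned
                if (v < 0x10) {
                    hex.append('0');       // zero-pad single-digit values
                }
                hex.append(Integer.toHexString(v));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Key for a raw page: SHA256(URL) + "-RAW"
    static String rawKey(String url) {
        return sha256Hex(url) + "-RAW";
    }

    // Key for a denormalized OIQ object: SHA256(JSON) + "-OIQ"
    static String oiqKey(String json) {
        return sha256Hex(json) + "-OIQ";
    }
}
```

Each key is 68 characters long (64 hex digits plus the 4-character suffix) and contains no dots, so it avoids the special-character problem described above. Because the hash comes first, keys for similar URLs do not sort next to each other.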
