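Two things go wrong with using the URL as PK:

1. The current version of Lily treats "dot" as a special/reserved character.
2. We shouldn't overload a single Slave-A just because every key starting
   with "http://www.ABCD******" is stored on that Slave-A (HBase keeps all
   data physically sorted by PK, so keys with a common prefix end up
   together).
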
We should randomize primary keys.

-Fuad

On 11-08-03 11:50 AM, "Michael Buckley" <[email protected]> wrote:

>What is wrong with using the URL as PK? Is it space? Or query
>performance?
>
>Michael
>
>On 2011-08-03, at 11:32 AM, Fuad Efendi wrote:
>
>> Such a design is enforced for RAW: we need to keep the history of HTMLs
>> under the same ID value. That's why the first candidate for ID is the
>> URL, and finally we use SHA(URL).
>>
>> For OIQ, it must be carefully planned. SHA(JSON) has the benefit of an
>> implicit "equals" implementation (JSON objects are not the same if
>> ID := SHA(JSON) is different).
>>
>> -Fuad
>>
>> On 11-08-03 10:25 AM, "Fuad Efendi" <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am starting to use the following scheme for primary keys:
>>> SHA256(URL) + "-RAW"
>>> Primary Key Schema <https://outsideiq.jira.com/browse/CA-107>
>>>
>>> RATIONALE:
>>> * PKs in Lily (user-defined) will be prepended with "USER." and I
>>>   can't use a URI, for instance (it contains dots, which are a special
>>>   character in the current version)
>>> * In addition to the SHA-256-generated PK, Lily will still use UUID
>>>   (which is really unique) for versioning?
>>> * IMPORTANT: we need to randomize PKs; it is best practice with HBase
>>>   (data will be randomly distributed in a cluster)
>>>
>>> and I suggest using a similar SHA256(JSON-Object-in-UTF8) + "-OIQ"
>>> (it is a postfix so that we will have good "randomization"; in HBase,
>>> all data is physically sorted by PK)
>>> - since all OIQ objects will be stored denormalized as JSON (string
>>>   type in Lily) (note, it will be UTF-8 encoded, I believe it is also
>>>   part of ECMA-specs)
>>>
>>> import java.io.UnsupportedEncodingException;
>>> import java.security.MessageDigest;
>>> import java.security.NoSuchAlgorithmException;
>>>
>>> /**
>>>  * {@link http://stackoverflow.com/questions/221165/pros-and-cons-of-using-md5-hash-of-uri-as-the-primary-key-in-a-database}
>>>  *
>>>  * @author Fuad
>>>  */
>>> public class SHA256 {
>>>
>>>     /** Returns the SHA-256 digest of the bytes as lowercase hex. */
>>>     public static final String SHA256(byte[] bytes)
>>>             throws NoSuchAlgorithmException {
>>>         MessageDigest md = MessageDigest.getInstance("SHA-256");
>>>         md.update(bytes);
>>>         byte[] mdbytes = md.digest();
>>>
>>>         // convert each byte to two hex digits, zero-padded
>>>         StringBuilder hexString = new StringBuilder();
>>>         for (int i = 0; i < mdbytes.length; i++) {
>>>             String hex = Integer.toHexString(0xff & mdbytes[i]);
>>>             if (hex.length() == 1)
>>>                 hexString.append('0');
>>>             hexString.append(hex);
>>>         }
>>>
>>>         return hexString.toString();
>>>     }
>>>
>>>     /** Hashes the UTF-8 encoding of the text. */
>>>     public static final String SHA256(String text)
>>>             throws NoSuchAlgorithmException, UnsupportedEncodingException {
>>>         return SHA256(text.getBytes("UTF-8"));
>>>     }
>>>
>>> }
>>>
>>> --
>>> Fuad Efendi
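
For illustration only, here is a minimal sketch of how the two key forms discussed in this thread could be built, assuming Java 7+ (for `StandardCharsets`). The class name `RowKeys`, the helpers `sha256Hex`, `rawKey` and `oiqKey`, and the example.com URLs are hypothetical and are not part of the original code or of the Lily/HBase APIs. It shows that two URLs sharing a long prefix produce keys with no dots and with unrelated leading characters, which is what spreads them across a sorted key space.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; names are assumptions made for this sketch.
final class RowKeys {

    // Lowercase hex SHA-256 of the UTF-8 bytes of the input (64 characters).
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // SHA256(URL) + "-RAW", as proposed in the thread.
    static String rawKey(String url) {
        return sha256Hex(url) + "-RAW";
    }

    // SHA256(JSON-Object-in-UTF8) + "-OIQ", as proposed in the thread.
    static String oiqKey(String json) {
        return sha256Hex(json) + "-OIQ";
    }

    public static void main(String[] args) {
        // Two URLs with a common prefix (hypothetical examples).
        System.out.println(rawKey("http://www.example.com/page1"));
        System.out.println(rawKey("http://www.example.com/page2"));
        System.out.println(oiqKey("{\"name\":\"test\"}"));
    }
}
```

One caveat with the OIQ form: the hash is taken over the serialized string, so two JSON documents that differ only in whitespace or key order would get different keys unless the serialization is made canonical first.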
