Chu,

There is no uniqueness test performed when data is stored into a cell. If your schema allows multiple versions and you store the same data into the cell more than once at different times, queries will return "duplicates" like the ones you presented.
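
To make the versioning behavior concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the HBase API: the class `VersionedCell`, its `max_versions` parameter, and the helper `row_key` are hypothetical names chosen for illustration. The path and timestamps are taken from the query output in the original message.

```python
import hashlib


class VersionedCell:
    """Toy model of one cell that keeps up to max_versions timestamped values."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self._by_ts = {}  # timestamp -> value

    def put(self, timestamp, value):
        # No uniqueness test on the value: the same bytes stored at two
        # different timestamps become two separate versions.
        self._by_ts[timestamp] = value
        # Drop the oldest versions once the limit is exceeded.
        for ts in sorted(self._by_ts, reverse=True)[self.max_versions:]:
            del self._by_ts[ts]

    def get(self):
        # Return (timestamp, value) pairs, newest first.
        return sorted(self._by_ts.items(), reverse=True)


def row_key(item):
    """Hypothetical helper: derive a row key from the content via SHA-1."""
    return hashlib.sha1(item).hexdigest()


path = b"/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2265.jpg"

# A cell allowing 30000 versions keeps both writes of the same value.
many = VersionedCell(max_versions=30000)
many.put(3090731558521386, path)
many.put(3090896685592411, path)

# A cell allowing 1 version keeps only the newest write.
one = VersionedCell(max_versions=1)
one.put(3090731558521386, path)
one.put(3090896685592411, path)

print(len(many.get()), len(one.get()))
print(row_key(path))
```

In this model the first cell ends up with two entries holding identical values, while the second holds only the most recent write. Hashing the content gives the same key every time the same item is loaded, so a repeated import lands on the same row.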
If you are trying to avoid duplicates, use a row key that uniquely identifies an object (such as a SHA-1 hash) and set MAX_VERSIONS (shown as VERSIONS in your table description) to 1 on the column that should contain only one canonical entry. Then if you store the same data item more than once, a replacement will happen instead of an addition.

Hope this helps,

   - Andy

> From: 鞠適存 <[EMAIL PROTECTED]>
> Subject: data duplicate?
> To: [email protected]
> Date: Thursday, November 27, 2008, 7:31 PM
>
> Hi,
>
> I revised the sample code "Bulk Import" written by Allen Day to upload a
> flat data file to a hbase table.
> My table schema is designed as:
>   <row key> <ColFamily1:colKey> <ColFamily2:colkey>.
> The table description found by hbase shell is as follow:
>
> {NAME => 'ATCGeo', IS_ROOT => 'false', IS_META => 'false', FAMILIES =>
>  [{NAME => 'photo_id', BLOOMFILTER => 'false', VERSIONS => '30000',
>    COMPRESSION => 'NONE', LENGTH => '2147483647', TTL => '-1',
>    IN_MEMORY => 'true', BLOCKCACHE => 'true'},
>   {NAME => 'trail_id', BLOOMFILTER => 'false', VERSIONS => '30000',
>    COMPRESSION => 'NONE', LENGTH => '2147483647', TTL => '-1',
>    IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}
>
> Some of the data was been found as duplicate-with the same content but the
> different timestamp. For example, I use the:
>
> get '<table>', '<rowkey>',{COLUMN=>'col1',VERSION=>30000}
>
> the results are:
>
> timestamp=3090896685592411,
> value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2265.jpg
>
> timestamp=3090896682597411,
> value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2264.jpg
>
> timestamp=3090731558521386,
> value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2265.jpg
>
> timestamp=3090731556503386,
> value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2264.jpg
>
> I am sure that the data in original file is unique. Could anyone tell me
> what's the possible reasons?
> Would appreciate any help!
>
> Chu
