Ryan,

Thanks for clearing that up! This is showing up in our test suite, since we repeatedly create and delete the same row. So it's not a problem for me to run a compaction or use a different row ID. It also appears I have the option of making sure my custom timestamps are in "the future."

Much appreciated. And at 2:49 am?!?! Thanks.

Kyle (@mudphone)

> From: Ryan Rawson <[email protected]>
> Date: February 2, 2010 2:49:53 AM PST
> To: [email protected]
> Subject: Re: Can't Put Row with Same ID Twice (if using custom timestamp)
>
> This is expected...
>
> To understand why, we need to look at how deletes are handled in
> HBase. Since files in HDFS are immutable, we don't actually go
> through and remove data when you ask for a 'delete'. Instead we
> insert a delete marker at a given timestamp that says 'everything
> at or older than this time is gone'. This delete marker (known as
> a tombstone in other systems) is an explicit entry and does not go
> away for a while (until the next major compaction). During reads, we
> use the delete markers to suppress 'deleted data'.
>
> When you put a cell whose timestamp falls at or before an existing
> delete marker, like this, it is suppressed, which is the effect you see.
>
> One way to "fix" this is:
>
>   put
>   delete
>   major_compact 'table'
>   put
>
> During a major compaction, we prune all delete records and the data
> they suppress, leaving a clean file with neither deleted data nor
> markers. But normally major compaction is run at most once a day, since
> on a larger cluster it is very heavyweight: it must rewrite the
> entire region of data!
>
> Good luck!
> -ryan
>
> On Tue, Feb 2, 2010 at 2:29 AM, Kyle Oba <[email protected]> wrote:
>> Hi,
>>
>> I seem to be unable to write a row, delete it, then write it again if I use custom version timestamps.
>>
>> As you can see from the HBase shell session below, I:
>>
>> 1) create a row with id "r1" and a custom version timestamp
>> 2) run deleteall on row "r1"
>> 3) attempt to put another row with id "r1", also with a custom version timestamp
>> 4) successfully create another row with a different id, "r2"
>>
>> I should note that if I do NOT specify a custom timestamp, this problem does not seem to show up.
>>
>> Perhaps I'm misusing the version timestamp API?
>>
>> Kyle
>>
>>
>> 1)
>> hbase(main):028:0> put "capjure_test", "r1", "meta", "v1", 123
>> 0 row(s) in 0.0030 seconds
>> hbase(main):029:0> scan "capjure_test"
>> ROW                          COLUMN+CELL
>>  r1                          column=meta:, timestamp=123, value=v1
>> 1 row(s) in 0.0060 seconds
>>
>>
>> 2)
>> hbase(main):030:0> deleteall "capjure_test", "r1"
>> 0 row(s) in 0.0020 seconds
>>
>>
>> 3)
>> hbase(main):031:0> put "capjure_test", "r1", "meta", "v1", 124
>> 0 row(s) in 0.0050 seconds
>> hbase(main):032:0> scan "capjure_test"
>> ROW                          COLUMN+CELL
>> 0 row(s) in 0.0030 seconds
>> hbase(main):033:0> flush "capjure_test"
>> 0 row(s) in 0.0900 seconds
>> hbase(main):034:0> scan "capjure_test"
>> ROW                          COLUMN+CELL
>> 0 row(s) in 0.0070 seconds
>>
>>
>> 4)
>> hbase(main):037:0> put "capjure_test", "r2", "meta", "v1", 124
>> 0 row(s) in 0.0030 seconds
>> hbase(main):038:0> scan "capjure_test"
>> ROW                          COLUMN+CELL
>>  r2                          column=meta:, timestamp=124, value=v1
>> 1 row(s) in 0.0070 seconds
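To make the tombstone behavior concrete, here is a toy, in-memory Ruby model of row-level delete markers. Ruby is used because the HBase shell in the session is JRuby. This is a sketch, not the HBase API: `ToyTable`, `Cell`, and their methods are hypothetical. It assumes, as HBase does, that a `deleteall` with no explicit timestamp writes its marker at the current time in milliseconds since the epoch, and that a marker hides every cell in the row whose timestamp is at or before it. Under those assumptions, the re-put at timestamp 124 is older than the marker and stays hidden, while "r2" has no marker and shows up. Either a major compaction or a timestamp newer than the marker makes "r1" visible again.

```ruby
# Toy, in-memory model of HBase-style row delete markers (tombstones).
# Hypothetical names throughout; this is NOT the HBase client or shell API.
Cell = Struct.new(:row, :column, :timestamp, :value)

class ToyTable
  def initialize
    @cells = []        # every put is appended; nothing is removed in place
    @row_markers = {}  # row => timestamp of its newest deleteall marker
  end

  # Store one cell version; like HBase, default the timestamp to "now" in ms.
  def put(row, column, value, timestamp = now_ms)
    @cells << Cell.new(row, column, timestamp, value)
  end

  # Record a tombstone for the whole row instead of removing any data.
  def deleteall(row, timestamp = now_ms)
    @row_markers[row] = [@row_markers.fetch(row, -1), timestamp].max
  end

  # Reads skip cells whose timestamp is at or before the row's marker.
  def scan
    @cells.reject { |cell| masked?(cell) }
  end

  # Major compaction rewrites the data: masked cells and markers are dropped.
  def major_compact
    @cells = scan
    @row_markers.clear
  end

  private

  def masked?(cell)
    marker = @row_markers[cell.row]
    !marker.nil? && cell.timestamp <= marker
  end

  def now_ms
    (Time.now.to_f * 1000).to_i
  end
end

# Replaying the shell session from the thread against the toy model.
table = ToyTable.new
table.put("r1", "meta:", "v1", 123)
table.deleteall("r1")                 # marker lands at the current time in ms
table.put("r1", "meta:", "v1", 124)   # 124 is far older than the marker
p table.scan.map(&:row)               # => []
table.put("r2", "meta:", "v1", 124)   # no marker exists for "r2"
p table.scan.map(&:row)               # => ["r2"]

# Workaround 1: major compaction discards the marker, so a re-put is visible.
table.major_compact
table.put("r1", "meta:", "v1", 125)
p table.scan.map(&:row)               # => ["r2", "r1"]

# Workaround 2: a custom timestamp newer than the marker is not suppressed.
other = ToyTable.new
other.deleteall("r1")
other.put("r1", "meta:", "v1", (Time.now.to_f * 1000).to_i + 60_000)
p other.scan.map(&:row)               # => ["r1"]
```

Keeping deletes as appended markers, with only `major_compact` rewriting the stored cells, mirrors the immutable-file design described in the thread. A real major compaction does that rewrite per region on disk, which is why it is the expensive step.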
