Right now, when there are duplicates with the same timestamp, the
implementation returns an indeterminate one of them. If the duplicates
happen to have the same value, you are OK.

I think there are a few other gotchas with regard to compaction.
Each timestamp counts as only 1 version, no matter how many duplicates
it has, so you may end up with more data than you intended, depending
on how many duplicates you have.
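
To make the counting concrete, here is a toy model in plain Java. It is
not HBase code: the class and method names are made up for illustration,
and it only assumes what is described above, namely that cells sharing a
timestamp are counted together as one version when the version limit is
applied.

```java
// Toy model only. This is NOT how HBase is implemented; it just
// illustrates the counting rule described in the message above.
public class VersionCountSketch {

    // Hypothetical helper: given the timestamps of the cells in one column,
    // sorted newest first, return how many cells are kept when at most
    // maxVersions distinct timestamps are retained.
    public static int cellsRetained(long[] timestampsNewestFirst, int maxVersions) {
        int distinctTimestamps = 0;
        int kept = 0;
        long previous = 0L;
        for (int i = 0; i < timestampsNewestFirst.length; i++) {
            long ts = timestampsNewestFirst[i];
            boolean isNewTimestamp = (i == 0) || ts != previous;
            if (isNewTimestamp) {
                if (distinctTimestamps == maxVersions) {
                    break; // version limit reached, older cells are dropped
                }
                distinctTimestamps++;
            }
            kept++; // duplicates of an already counted timestamp are kept too
            previous = ts;
        }
        return kept;
    }

    public static void main(String[] args) {
        // Two runs of the same job wrote timestamps 300 and 200 twice each.
        long[] cells = {300L, 300L, 200L, 200L, 100L};
        System.out.println(cellsRetained(cells, 2)); // prints 4
    }
}
```

With a limit of 2 versions you might expect 2 cells to be kept, but under
this model 4 are kept, because each pair of duplicates counts as a single
version.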

On Tue, Nov 24, 2009 at 2:35 PM, Zhenyu Zhong <zhongresea...@gmail.com> wrote:
> Hi,
>
> I would like to use a nice feature of HBase, versions, to store time-series
> data for a rowkey.
> However, I get duplicates for the same rowkey and the same timestamp if I
> use Put and run the MapReduce job multiple times.
>
> For example.
> Put put = new Put(rowkey.getBytes());
> put.add("f1:c1".getBytes(), ts, value.getBytes());
>
> I use TableOutputFormat as the output of the MapReduce job.
> If I run the MapReduce job twice, I would get 2 records with the same rowkey
> and same timestamp.
>
> May I ask whether the Put just adds a row even if there is already a
> row with the same key and timestamp in the table?
>
> Best,
> zhenyu
>
