[ 
https://issues.apache.org/jira/browse/HBASE-11788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105839#comment-14105839
 ] 

Cristian Armaselu commented on HBASE-11788:
-------------------------------------------

Consider two JVMs executing the following on the same row key:
JVM1
- (1) Put to write the new "columns"
- (2) Delete to remove the "column" X value (since the incoming data is null
  for column X)

JVM2
- (1) Put to write the new "columns"
- (2) Delete to remove the "column" Y value (since the incoming data is null
  for column Y)

What happens if the HBase API requests are applied in this order (two JVMs,
so two threads of execution)?
JVM1 (1) Put
JVM2 (2) Delete
JVM1 dies
JVM2 (2) Delete
Or you can think of any other combination with 4 operations instead of 2

The data is corrupted: neither of the two new records was applied, and the
previous record is no longer stored in HBase.
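
To make the failure mode concrete, here is a minimal in-memory sketch. It does
not use the HBase client API: the row is modeled as a plain map, and the class
name, helper names, and column values are all hypothetical. The interleaving
shown (JVM2 Put, JVM1 Put, JVM1 dies, JVM2 Delete) is one assumed ordering
chosen for illustration, not necessarily the exact sequence listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a single HBase row (qualifier -> value).
// Hypothetical illustration only; none of this is the HBase client API.
class RowInterleavingSketch {

    // Build an incoming record; a null value means the column is null in the source data.
    static Map<String, String> record(String x, String y, String z) {
        Map<String, String> r = new LinkedHashMap<>(); // LinkedHashMap permits null values
        r.put("X", x);
        r.put("Y", y);
        r.put("Z", z);
        return r;
    }

    // Step (1): the Put writes only the non-null columns of the record.
    static void put(Map<String, String> row, Map<String, String> record) {
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (e.getValue() != null) {
                row.put(e.getKey(), e.getValue());
            }
        }
    }

    // Step (2): the Delete removes the columns that are null in the record.
    static void deleteNulls(Map<String, String> row, Map<String, String> record) {
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (e.getValue() == null) {
                row.remove(e.getKey());
            }
        }
    }

    // The row a record would produce if it were applied completely to an empty row.
    static Map<String, String> intended(Map<String, String> record) {
        Map<String, String> row = new TreeMap<>();
        put(row, record);
        return row;
    }

    // Assumed interleaving: JVM2 Put, JVM1 Put, JVM1 dies, JVM2 Delete.
    static Map<String, String> interleaved() {
        Map<String, String> row = new TreeMap<>(record("x0", "y0", "z0")); // previous record
        Map<String, String> fromJvm1 = record(null, "y1", "z1"); // X is null for JVM1
        Map<String, String> fromJvm2 = record("x2", null, "z2"); // Y is null for JVM2
        put(row, fromJvm2);
        put(row, fromJvm1);
        // JVM1 dies here, so its Delete of X never runs.
        deleteNulls(row, fromJvm2);
        return row;
    }

    public static void main(String[] args) {
        System.out.println("stored row:    " + interleaved());
        System.out.println("JVM1 intended: " + intended(record(null, "y1", "z1")));
        System.out.println("JVM2 intended: " + intended(record("x2", null, "z2")));
    }
}
```

In this model the stored row ends up mixing X from JVM2 with Z from JVM1, so it
matches neither incoming record nor the previous one.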
I cannot point to where the issue is in Hive; I only see the behavior in the
queries executed against it.
We were using a Put with DeleteColumn for null incoming cells, and the "is
null"/"is not null" operators in Hive were working correctly.
With the new HBase 0.96.1.1, the query returns data that it should not return:
the incoming column is null, but HBase stores an empty cell instead of
deleting it.
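
The difference between a missing cell and an empty cell can be sketched as
follows. This is a simplified model, not the HBase or Hive API: the row is a
plain map from qualifier to raw bytes, and the `isNull` helper is a
hypothetical stand-in that assumes "is null" means "no cell exists for the
column".

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a row as qualifier -> raw cell bytes.
// Not the HBase or Hive API; the "is null" semantics here are an assumption.
class EmptyCellSketch {

    // Assumed semantics of "is null": the column has no cell at all.
    static boolean isNull(Map<String, byte[]> row, String qualifier) {
        return row.get(qualifier) == null;
    }

    // Expected state: column C was deleted, so the row has no entry for it.
    static Map<String, byte[]> rowWithDeletedCell() {
        Map<String, byte[]> row = new HashMap<>();
        row.put("A", "a".getBytes(StandardCharsets.UTF_8));
        row.put("B", "b".getBytes(StandardCharsets.UTF_8));
        return row;
    }

    // Observed state: column C still has a cell, but its value is empty.
    static Map<String, byte[]> rowWithEmptyCell() {
        Map<String, byte[]> row = rowWithDeletedCell();
        row.put("C", new byte[0]);
        return row;
    }

    public static void main(String[] args) {
        System.out.println("deleted cell is null: " + isNull(rowWithDeletedCell(), "C"));
        System.out.println("empty cell is null:   " + isNull(rowWithEmptyCell(), "C"));
    }
}
```

Under that assumption, a row whose column holds an empty cell no longer
satisfies "is null", which is consistent with the query results described
above.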




> hbase is not deleting the cell when a Put with a KeyValue, 
> KeyValue.Type.Delete is submitted
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11788
>                 URL: https://issues.apache.org/jira/browse/HBASE-11788
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.99.0, 0.96.1.1, 0.98.5, 2.0.0
>         Environment: Cloudera CDH 5.1.x
>            Reporter: Cristian Armaselu
>         Attachments: TestPutWithDelete.java
>
>
> Code executed:
> {code}
>     @Test
>     public void testHbasePutDeleteCell() throws Exception {
>         TableName tableName = TableName.valueOf("my_test");
>         Configuration configuration = HBaseConfiguration.create();
>         HTableInterface table = new HTable(configuration, tableName);
>         final String rowKey = "12345";
>         final byte[] family = Bytes.toBytes("default");
>         // put one row
>         Put put = new Put(Bytes.toBytes(rowKey));
>         put.add(family, Bytes.toBytes("A"), Bytes.toBytes("a"));
>         put.add(family, Bytes.toBytes("B"), Bytes.toBytes("b"));
>         put.add(family, Bytes.toBytes("C"), Bytes.toBytes("c"));
>         table.put(put);
>         // get row back and assert the values
>         Get get = new Get(Bytes.toBytes(rowKey));
>         Result result = table.get(get);
>         Assert.isTrue(Bytes.toString(result.getValue(family, Bytes.toBytes("A"))).equals("a"),
>                 "Column A value should be a");
>         Assert.isTrue(Bytes.toString(result.getValue(family, Bytes.toBytes("B"))).equals("b"),
>                 "Column B value should be b");
>         Assert.isTrue(Bytes.toString(result.getValue(family, Bytes.toBytes("C"))).equals("c"),
>                 "Column C value should be c");
>         // put the same row again with C column deleted
>         put = new Put(Bytes.toBytes(rowKey));
>         put.add(family, Bytes.toBytes("A"), Bytes.toBytes("a"));
>         put.add(family, Bytes.toBytes("B"), Bytes.toBytes("b"));
>         put.add(new KeyValue(Bytes.toBytes(rowKey), family, Bytes.toBytes("C"),
>                 HConstants.LATEST_TIMESTAMP, KeyValue.Type.DeleteColumn));
>         table.put(put);
>         // get row back and assert the values
>         get = new Get(Bytes.toBytes(rowKey));
>         result = table.get(get);
>         Assert.isTrue(Bytes.toString(result.getValue(family, Bytes.toBytes("A"))).equals("a"),
>                 "Column A value should be a");
>         Assert.isTrue(Bytes.toString(result.getValue(family, Bytes.toBytes("B"))).equals("b"),
>                 "Column B value should be b");
>         // the cell for C should be gone after the DeleteColumn marker
>         Assert.isTrue(result.getValue(family, Bytes.toBytes("C")) == null,
>                 "Column C should not exist");
>     }
> {code}
> The last assertion fails: the cell is not deleted; instead, its value is empty:
> {code}
> hbase(main):029:0> scan 'my_test'
> ROW      COLUMN+CELL
>  12345   column=default:A, timestamp=1408473082290, value=a
>  12345   column=default:B, timestamp=1408473082290, value=b
>  12345   column=default:C, timestamp=1408473082290, value=
> {code}
> This behavior differs from the previous Cloudera 4.8.x version and is
> currently corrupting all Hive queries that use the "is null" or "is not null"
> operators on columns mapped to HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
