[ 
https://issues.apache.org/jira/browse/HBASE-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-15214:
-----------------------------------
    Attachment: HBASE-15214.patch

> Valid mutate Ops fail with RPC Codec in use and region moves across
> -------------------------------------------------------------------
>
>                 Key: HBASE-15214
>                 URL: https://issues.apache.org/jira/browse/HBASE-15214
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>            Priority: Critical
>             Fix For: 2.0.0, 1.3.0, 1.2.1, 1.1.4, 1.0.4, 0.98.18
>
>         Attachments: HBASE-15214.patch
>
>
> Test failures in HBASE-15198 led to this bug. Until now we have not used 
> cell blocks (codec usage) for write requests (client -> server). Once we 
> enabled Codec usage by default, we saw this issue.
> A multi request came to the RS with mutations for different regions. One of 
> the regions that was on this RS had become unavailable. In 
> RsRpcServices#multi, we fail the entire RegionAction (with N mutations in 
> it) in that MultiRequest, then continue with the remaining RegionActions, 
> whose regions might be available. (The failed RegionAction will be retried 
> by the client after it fetches the latest region location.) This all works 
> fine in the pure PB request world. When a Codec is used, we do not convert 
> the Mutation's Cells to PB Cells and pack them in the PB message. Instead 
> we pass all Cells serialized into one byte[] cellblock. Using a Decoder we 
> iterate over these cells on the server side. Each Mutation PB knows only 
> the number of cells associated with it. In the above case, when an entire 
> RegionAction is skipped, the N Mutations under it might have corresponding 
> Cells in the cellblock. We are not skipping those Cells in that iterator. 
> This makes the later Mutations (for other regions) refer to the wrong Cells 
> and try to put them into a different region. This makes HRegion#checkRow() 
> throw WrongRegionException, which is treated as a sanity check failure, so 
> a DoNotRetryIOException (DNRIOE) is thrown back to the client. So the op 
> fails for the user code.
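
The iterator misalignment described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the HBase code or the attached patch: the class `CellBlockSkipSketch`, the `Mutation` holder, the `apply` and `demo` methods, and the "region/row" string encoding of cells are all hypothetical names invented for illustration. It shows that when the mutations of a failed RegionAction are skipped without also advancing the shared cell iterator by their cell counts, the next mutation consumes cells that belong to the skipped region.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified model of the problem; every name here is hypothetical, not HBase API.
class CellBlockSkipSketch {

    /** A mutation as the server sees it: its target region and how many cells it owns. */
    static final class Mutation {
        final String region;
        final int cellCount;

        Mutation(String region, int cellCount) {
            this.region = region;
            this.cellCount = cellCount;
        }
    }

    /**
     * Walks the mutations in request order against one shared cell stream.
     * Cells are modelled as strings of the form "region/row".
     * Returns entries of the form "targetRegion<-cell" for every cell applied.
     */
    static List<String> apply(List<Mutation> mutations, List<String> cellBlock,
                              String unavailableRegion, boolean skipCellsOfFailedAction) {
        Iterator<String> cells = cellBlock.iterator();
        List<String> applied = new ArrayList<>();
        for (Mutation m : mutations) {
            if (m.region.equals(unavailableRegion)) {
                // The action fails; its cells must still be consumed from the stream.
                if (skipCellsOfFailedAction) {
                    for (int i = 0; i < m.cellCount; i++) {
                        cells.next();
                    }
                }
                continue;
            }
            for (int i = 0; i < m.cellCount; i++) {
                applied.add(m.region + "<-" + cells.next());
            }
        }
        return applied;
    }

    /** Region r1 (2 cells) is unavailable; region r2 (1 cell) is still served. */
    static List<String> demo(boolean skipCellsOfFailedAction) {
        List<Mutation> mutations = Arrays.asList(new Mutation("r1", 2), new Mutation("r2", 1));
        List<String> cellBlock = Arrays.asList("r1/a", "r1/b", "r2/c");
        return apply(mutations, cellBlock, "r1", skipCellsOfFailedAction);
    }

    public static void main(String[] args) {
        System.out.println("without skip: " + demo(false)); // [r2<-r1/a]
        System.out.println("with skip:    " + demo(true));  // [r2<-r2/c]
    }
}
```

In this model, the run without the skip hands a cell from r1 to r2, which corresponds to the condition that would trip a row check such as HRegion#checkRow(). The run with the skip keeps the cell stream aligned with the remaining mutations.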



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
