[ 
https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiang Li updated HBASE-18555:
-----------------------------
    Description: 
In Put#addColumn() and addImmutable(), after getting the cell list of the given 
family and add the cell into the list, the code puts (key=family, value=list) 
into familyMap.

In addColumn(), it is like
{code}
    List<Cell> list = getCellList(family);
    KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
    list.add(kv);
    familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
    return this;
{code}

In addImmutable(), it is like
{code}
    List<Cell> list = getCellList(family);
    KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
    list.add(kv);
    familyMap.put(family, list); // <-- here
    return this;
{code}

I think those put() for Map only take effect when getCellList(family) returns a 
new allocated ArrayList. When the list for a family already exist, put() for 
Map will update the value to the reference of the list, but actually, the 
reference of the list is not changed.

Those put() do not do any harm in terms of the correctness when they are here. 
But it could be removed to improve the performance. familyMap searches for key 
and set the new value and return the old value. Those operation take some time 
but actually are not needed. 

The put() could be moved into Mutation#getCellList(family)

  was:
In Put#addColumn() and addImmutable(), after getting cell list of the given 
family and add the cell into the list, the code will put (key=family, 
value=list) into family map.

In addColumn(), it is like
{code}
    List<Cell> list = getCellList(family);
    KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
    list.add(kv);
    familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
    return this;
{code}

In addImmutable(), it is like
{code}
    List<Cell> list = getCellList(family);
    KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
    list.add(kv);
    familyMap.put(family, list); // <-- here
    return this;
{code}

I think put() for Map only take effect when getCellList(family) returns a new 
allocated ArrayList. When the list for a family already exist, put() for Map 
will update the value to the reference of the list, but actually, the reference 
of the list is not changed.

Those put() do not do any harm in terms of the correctness when they are here. 
But it could be removed to improve the performance, as familyMap is a TreeMap, 
in put(), it searches for key and set the new value and return the old value. 
Those operation take some time but actually are not needed. 

The put() could be moved into Mutation#getCellList(family)


> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> ------------------------------------------------------------------------
>
>                 Key: HBASE-18555
>                 URL: https://issues.apache.org/jira/browse/HBASE-18555
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client
>            Reporter: Xiang Li
>            Assignee: Xiang Li
>            Priority: Minor
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the 
> given family and add the cell into the list, the code puts (key=family, 
> value=list) into familyMap.
> In addColumn(), it is like
> {code}
>     List<Cell> list = getCellList(family);
>     KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
>     list.add(kv);
>     familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
>     return this;
> {code}
> In addImmutable(), it is like
> {code}
>     List<Cell> list = getCellList(family);
>     KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
>     list.add(kv);
>     familyMap.put(family, list); // <-- here
>     return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns 
> a new allocated ArrayList. When the list for a family already exist, put() 
> for Map will update the value to the reference of the list, but actually, the 
> reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are 
> here. But it could be removed to improve the performance. familyMap searches 
> for key and set the new value and return the old value. Those operation take 
> some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to