[
https://issues.apache.org/jira/browse/PHOENIX-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabriel Reid resolved PHOENIX-340.
----------------------------------
Resolution: Fixed
Bulk resolve of closed issues imported from GitHub. This status was reached by
first re-opening all closed imported issues and then resolving them in bulk.
> Support atomic increment
> ------------------------
>
> Key: PHOENIX-340
> URL: https://issues.apache.org/jira/browse/PHOENIX-340
> Project: Phoenix
> Issue Type: Task
> Reporter: Raymond Liu
>
> At present, if you want to update a specific column by adding an increment to
> itself, you can do so with
> "UPSERT INTO T1 (id, count) SELECT id, count+1 FROM T1 WHERE id = id1"
> There are several problems here:
> 1. If the row with id = id1 does not exist, it will not be created with a base
> value (say, the increment 1).
> 2. It does not support concurrent updates well; multiple threads running it at
> the same time will produce incorrect results.
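> The lost-update problem in point 2 can be illustrated in plain Java,
> independently of Phoenix (the class and iteration counts below are purely
> illustrative, not from the patch):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrates the lost-update problem described above: a non-atomic
// read-modify-write (like UPSERT ... SELECT count+1) can lose increments
// under concurrency, while a single atomic increment operation
// (like HBase's increment) cannot.
public class LostUpdateDemo {
    static long plainCounter = 0;                 // racy read-modify-write
    static final AtomicLong atomicCounter = new AtomicLong();

    public static void main(String[] args) {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                plainCounter = plainCounter + 1;  // read, add, write: not atomic
                atomicCounter.incrementAndGet();  // one atomic operation
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // The atomic counter is always exactly 200000; the plain one
        // usually falls short because concurrent updates overwrite each other.
        System.out.println("atomic=" + atomicCounter.get()
            + " plain=" + plainCounter);
    }
}
```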
> HBase provides HTable.increment, which does support atomic increments; the
> question is how to surface it in Phoenix.
> There are several ways to do this.
> Per -18: implement "create sequence". However, this only works for the global
> counter use case, and is not suitable for a counter embedded in each row,
> e.g. for page visits, link counts, etc.
> Make UPSERT SELECT support atomic operations. This is the ideal solution, but
> it might involve too much overhead for normal operations that have no
> atomicity requirements. Also, HBase only supports the LONG type for
> increments, so this won't work for all common data types and should therefore
> be limited in scope.
> Though we could invent a new DML statement, for an easy showcase of the idea
> UPSERT is the closest thing I can reuse. Thus I made the following tweak to
> the existing UPSERT (adding an INCREASE keyword before VALUES to enable the
> increment feature), e.g.
> UPSERT INTO TEST(ID, COUNT) INCREASE VALUES('foo',1);
> This reuses most of the UPSERT VALUES code path and does not introduce much
> extra overhead. When INCREASE is present in the statement, the values for the
> PRIMARY KEY columns still work as normal values for seeking the row, while
> the values for non-primary-key columns act as increments.
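> A toy model of the proposed semantics (the Map-based sketch below is purely
> illustrative, not Phoenix code): the PK value locates the row, the non-PK
> value is applied as a delta, and a missing row starts from the increment
> itself, which also addresses problem 1 above:

```java
import java.util.HashMap;
import java.util.Map;

// Models the semantics of the proposed UPSERT ... INCREASE VALUES:
// the PK value locates the row, and the non-PK value is applied as a
// delta rather than a replacement.
public class IncreaseSemantics {
    static final Map<String, Long> countByPk = new HashMap<>();

    // Models: UPSERT INTO TEST(ID, COUNT) INCREASE VALUES(id, delta);
    static long upsertIncrease(String id, long delta) {
        // Absent row: starts from the increment; present row: accumulates.
        return countByPk.merge(id, delta, Long::sum);
    }

    public static void main(String[] args) {
        upsertIncrease("foo", 1);   // row absent: COUNT becomes 1
        upsertIncrease("foo", 1);   // row present: COUNT accumulates
        System.out.println(countByPk.get("foo"));  // prints 2
    }
}
```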
> I have made an initial version here: SHA 2466ee6, with unit test code for
> your reference on the usage and on the issues I mention below.
> Due to limitations of the current Phoenix code structure and framework, there
> are a few problems in this initial version (SHA:
> 2466ee6a27d12b6c6bb29ba87ece95466e9df98a):
> 1. Phoenix encodes long/int etc. differently from HBase; for example, it
> flips the sign bit. This leads to incompatible operations on the same value
> when using the HBase ICV to set an initial value on a non-existent column.
> 2. UNSIGNED LONG could be used without this initial-value problem, but then
> negative values are not supported: not only can you not store a negative
> value in the column, you also cannot pass a negative value to the UPSERT
> INCREASE VALUES statement, since it won't pass grammar checking.
> Even if you don't solve these two issues here, as soon as you want to use
> increment (say, to implement "create sequence") you have to find a way to
> overcome them and make the data types compatible with HBase. Thus I am
> wondering whether we could create two types for each numeric data type: a
> RAW version that does not flip the sign bit, and a flipped version for use in
> PK columns. They could still share one type name in DML, say LONG, but when
> DDL is executed it would be mapped to the corresponding type and stored in
> the metadata table. This way, users would not need to know the difference,
> and the code could still handle both without extra logic, maybe even faster,
> since a normal column's value would no longer need to go through the
> sign-flipping encoding/decoding.
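> The encoding mismatch behind issue 1 can be sketched as follows. Flipping the
> sign bit is the standard way to make signed longs sort correctly as unsigned
> bytes; the exact Phoenix byte layout is assumed from the description above,
> not copied from its code:

```java
import java.util.Arrays;

// Sketch of the encoding mismatch in issue 1: a sortable encoding flips
// the sign bit of a LONG, while HBase's increment operates on the plain
// big-endian two's-complement bytes, so the two encodings of the same
// logical value differ.
public class SignFlipDemo {
    // Plain big-endian encoding, as HBase's Bytes.toBytes(long) produces.
    static byte[] plainBytes(long v) {
        byte[] b = new byte[8];
        for (int i = 7; i >= 0; i--) { b[i] = (byte) v; v >>>= 8; }
        return b;
    }

    // Sortable encoding: flip the sign bit so negative values order
    // before positive ones under unsigned byte comparison.
    static byte[] flippedBytes(long v) {
        return plainBytes(v ^ Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // The same logical value 1 has two different byte encodings, so a
        // raw HBase increment on sortably-encoded bytes gives wrong results.
        System.out.println(Arrays.equals(plainBytes(1L), flippedBytes(1L)));
    }
}
```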
> 3. The current mutation plan only accepts PUT/DELETE and implements them via
> HTable.batch, while HBase increments go through HTable.increment. The
> mutation join strategy also only works for simple replacement.
> Overcoming this requires hacking a lot of fundamental code. So, in my branch,
> I enhanced MutationState by changing the mutation value from byte[] to a
> MutationValue class that stores both a byte[] for PUT/DELETE and a long for
> increment operations. When joining mutations from multiple DML statements, a
> later PUT/DELETE overrides the previous mutation, while a later Increment
> does not override a PUT/DELETE but is kept alongside it, and Increments on
> the same column accumulate. Upon commit, all PUT/DELETEs are still batched
> first, and then the Increments are executed one by one.
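> The join rules described above can be sketched like this (the class shape and
> field names are illustrative; the real code in the branch at SHA 2466ee6 may
> differ):

```java
// Sketch of the mutation-join rules described above: a later PUT/DELETE
// overrides everything, while a later Increment keeps an earlier
// PUT/DELETE and accumulates increments on the same column.
public class MutationValue {
    final byte[] putValue;   // non-null for a PUT/DELETE value
    final long increment;    // accumulated increment delta

    MutationValue(byte[] putValue, long increment) {
        this.putValue = putValue;
        this.increment = increment;
    }

    // Joins this (earlier) mutation with a later one on the same column.
    MutationValue join(MutationValue later) {
        if (later.putValue != null) {
            return later;  // a later PUT/DELETE overrides the previous mutation
        }
        // A later Increment keeps any earlier PUT/DELETE and accumulates.
        return new MutationValue(this.putValue, this.increment + later.increment);
    }

    public static void main(String[] args) {
        MutationValue put = new MutationValue(new byte[]{42}, 0);
        MutationValue inc = new MutationValue(null, 1);
        MutationValue joined = put.join(inc).join(inc);
        // The PUT is kept and the two increments accumulate to 2.
        System.out.println(joined.increment);  // prints 2
    }
}
```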
> I am not sure whether there is a better solution, but this approach is the
> easiest one I could figure out that does not impact the overall framework
> too much.
> You can test both of the scenarios I mentioned above with the unit test
> cases. At present, since issues 1 and 2 are not fully addressed, some cases
> fail (so I have commented them out). But once the data type solution
> described above is implemented, I believe this could work quite well.
> Any idea?
--
This message was sent by Atlassian JIRA
(v6.2#6252)