[ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538672#comment-14538672
 ] 

Jesse Yates commented on PHOENIX-1954:
--------------------------------------

When this came up in the original discussion, we arrived at the same syntax, 
but ran into the following problem. What if the client first gets a sequence 
value (the sequence batches by 100), so they reserve {{0-99}} and get the 
value 0. Then, to reserve a block, they use {{NEXT 1000 VALUE for seq}}, which 
bumps the external next id to {{1100}}. When they next do 
{{NEXT VALUE FOR seq}}, what should the next value be? 

There are a few possible solutions:
* They get value 1. Then if they call it 99 more times, they would get 
2,3,...99, 1100, which skips the reserved block. This is a bit odd, however, 
and is why Lars proposed the different syntax, so the client is aware that 
the reserved block is unmanaged.
* They get value 1100. This would 'throw away' the client cache of {{0-99}} 
and just get the next logical element of the sequence. This is simpler and 
preserves the reserved number space.
* They get 1, followed by 2,3,...99,100, 101,...1099. However, this would 
conflict with the idea of a 'reserved' space which is allocated as needed from 
the client's perspective.
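
To make the three outcomes concrete, here is a rough Python sketch of the scenario above. It is a toy model, not Phoenix code: {{Server}}, {{fetch_and_add}} and {{simulate}} are hypothetical names, and the cache refill behaviour is an assumption based on the batching described above.

```python
CACHE_SIZE = 100  # batch size from the example above


class Server:
    """Hypothetical stand-in for the server-side sequence state."""

    def __init__(self):
        self.next_id = 0

    def fetch_and_add(self, n):
        # Return the current value, then advance the sequence by n.
        start = self.next_id
        self.next_id += n
        return start


def simulate(option, calls):
    """Values returned by `calls` NEXT VALUE calls made after the bulk reserve."""
    server = Server()

    # First NEXT VALUE: the client caches 0-99 and hands out 0.
    start = server.fetch_and_add(CACHE_SIZE)
    cache = list(range(start + 1, start + CACHE_SIZE))  # 1..99 remain cached

    # NEXT 1000 VALUE: the server moves to 1100; the block is 100..1099.
    block_start = server.fetch_and_add(1000)
    block = list(range(block_start, block_start + 1000))

    if option == 2:
        cache = []      # throw the cached 1..99 away
    elif option == 3:
        cache += block  # serve the reserved block through NEXT VALUE

    out = []
    for _ in range(calls):
        if not cache:
            # Cache exhausted: fetch the next batch from the server.
            start = server.fetch_and_add(CACHE_SIZE)
            cache = list(range(start, start + CACHE_SIZE))
        out.append(cache.pop(0))
    return out
```

With option 1 the hundredth call jumps from 99 to 1100; with option 2 the very first call returns 1100; with option 3 the client walks straight through the block it meant to reserve.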

The reserved ID space is somewhat separate from the client's standard sequence 
logic but, in many cases, needs to interoperate within the same sequence. One 
example is batch generation of UUIDs (reserving an appropriately sized block) 
interleaved with stream/on-demand generation of UUIDs.

{{ALLOCATE}} differentiates the above cases since it somewhat decouples the 
client's two usages.
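
As a rough illustration of that decoupling, the Python sketch below gives the bulk reservation its own call that never touches the client cache. Everything here is hypothetical ({{SequenceServer}}, {{SequenceClient}}, {{allocate}}); the actual {{ALLOCATE}} syntax and implementation are not settled in this thread.

```python
class SequenceServer:
    """Hypothetical shared server-side counter."""

    def __init__(self):
        self.next_id = 0

    def fetch_and_add(self, n):
        # Return the current value, then advance the sequence by n.
        start = self.next_id
        self.next_id += n
        return start


class SequenceClient:
    """Hypothetical client with a local cache of pre-fetched values."""

    def __init__(self, server, cache_size=100):
        self.server = server
        self.cache_size = cache_size
        self.cache = iter(())  # empty until the first NEXT VALUE

    def next_value(self):
        # Standard path: serve from the cache, refilling it when empty.
        value = next(self.cache, None)
        if value is None:
            start = self.server.fetch_and_add(self.cache_size)
            self.cache = iter(range(start, start + self.cache_size))
            value = next(self.cache)
        return value

    def allocate(self, n):
        # Bulk path: reserve [start, start + n) without touching the cache.
        start = self.server.fetch_and_add(n)
        return range(start, start + n)


server = SequenceServer()
a = SequenceClient(server)
b = SequenceClient(server)

first = a.next_value()    # a caches 0-99 and returns 0
block = a.allocate(1000)  # a reserves 100..1099; its cache is untouched
other = b.next_value()    # b starts after the reserved block
second = a.next_value()   # a continues from its own cache
```

In this model the caller owns the returned block outright, so neither client's {{next_value}} ever hands out an id from it.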

> Reserve chunks of numbers for a sequence
> ----------------------------------------
>
>                 Key: PHOENIX-1954
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>
> In order to be able to generate many ids in bulk (for example in map reduce 
> jobs) we need a way to generate or reserve large sets of ids. We also need to 
> mix ids reserved with incrementally generated ids from other clients. 
> For this we need to atomically increment the sequence and return the value it 
> had when the increment happened.
> If we're OK to throw the current cached set of values away we can do
> {{NEXT VALUE FOR <seq>(,<N>)}}, that needs to increment value and return the 
> value it incremented from (i.e. it has to throw the current cache away, and 
> return the next value it found at the server).
> Or we can invent a new syntax {{RESERVE VALUES FOR <seq>, <N>}} that does the 
> same, but does not invalidate the cache.
> Note that in either case we won't retrieve the reserved set of values via 
> {{NEXT VALUE FOR}}, because we'd need to be idempotent. In our case, all we 
> need to guarantee is that after a call to {{RESERVE VALUES FOR <seq>, <N>}} 
> that returns a value <M>, the range [M, M+N) won't be used by any other user 
> of the sequence. We might need to reserve 1bn ids this way ahead of a 
> map reduce run.
> Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
