[ https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Fernando updated PHOENIX-1954:
----------------------------------
    Attachment: PHOENIX-1954-wip5-rebased.patch

[~jamestaylor] Following on from our discussion, I am adding a new patch that:
1) Removes the need to cache startValues from bulk allocations on the client.
2) Changes the flow so that sequences are only validated when the expression
is executed; the allocation now happens when the client reads the result of
the expression via rs.next().
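To make change 2 concrete, here is a minimal in-memory sketch in which execute() only validates the request and the range is reserved when next() is called, analogous to rs.next(). All names (LazyBulkAllocation, ServerSequence, Result, execute) are hypothetical and this is not Phoenix code; ServerSequence simply stands in for the sequence row, and the CYCLE rejection mirrors the CYCLE case discussed in this comment.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the validate-then-allocate flow; not Phoenix code.
class LazyBulkAllocation {
    // Stands in for the sequence row stored on the server.
    static final class ServerSequence {
        final AtomicLong nextFree;
        final boolean cycle;
        int allocations = 0;

        ServerSequence(long start, boolean cycle) {
            this.nextFree = new AtomicLong(start);
            this.cycle = cycle;
        }
    }

    // Single-row result: the allocation is deferred until next() is called.
    static final class Result {
        private final ServerSequence seq;
        private final long numToAllocate;
        private boolean advanced = false;
        private long start;

        Result(ServerSequence seq, long numToAllocate) {
            this.seq = seq;
            this.numToAllocate = numToAllocate;
        }

        // Analogous to rs.next(): the range is reserved here, exactly once.
        boolean next() {
            if (advanced) return false;
            start = seq.nextFree.getAndAdd(numToAllocate);
            seq.allocations++;
            advanced = true;
            return true;
        }

        long getLong() {
            if (!advanced) throw new IllegalStateException("call next() first");
            return start;
        }
    }

    // Analogous to executing SELECT NEXT <n> VALUES FOR <seq>: validation only.
    static Result execute(ServerSequence seq, long numToAllocate) {
        if (numToAllocate < 1) {
            throw new IllegalArgumentException("number of values must be >= 1");
        }
        if (seq.cycle && numToAllocate > 1) {
            throw new IllegalArgumentException("bulk allocation on a CYCLE sequence");
        }
        return new Result(seq, numToAllocate);
    }

    public static void main(String[] args) {
        ServerSequence seq = new ServerSequence(100, false);
        Result rs = execute(seq, 10);
        System.out.println(seq.allocations);    // 0: validated, nothing reserved yet
        rs.next();
        System.out.println(rs.getLong());       // 100: start of the reserved range
        System.out.println(seq.nextFree.get()); // 110
    }
}
{code}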

One small implication of this change to call out: previously, if executing
SELECT NEXT <n> VALUES FOR <seq> resulted in a SQLException (for example, if we
attempted to invoke it on a sequence with the CYCLE flag set), we would not
throw out the currently cached batch of sequence values on the client. With
this change we do throw them out, because after the expression runs the client
state has already been updated to reflect that all cached values have been
used. I think this is fine and it is reflected in my tests, but I wanted to
note it here.
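The ordering that causes the cached batch to be discarded can be modeled roughly as follows. This is a hypothetical sketch: CacheDiscardSketch, its fields, nextValues() and the exception message are illustrative names, not Phoenix APIs. The client marks its cache as used before validation runs, so a failing request leaves the cache empty.

{code:java}
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the client-side cache; names are illustrative, not Phoenix APIs.
class CacheDiscardSketch {
    final AtomicLong server;   // stands in for the sequence row on the server
    final boolean cycle;       // stands in for the sequence's CYCLE flag
    long currentValue;         // next cached value to hand out
    long nextValue;            // end (exclusive) of the cached batch

    CacheDiscardSketch(long serverStart, boolean cycle, long currentValue, long nextValue) {
        this.server = new AtomicLong(serverStart);
        this.cycle = cycle;
        this.currentValue = currentValue;
        this.nextValue = nextValue;
    }

    boolean cacheIsEmpty() {
        return currentValue == nextValue;
    }

    // Models SELECT NEXT <n> VALUES FOR <seq> under the new flow.
    long nextValues(long n) throws SQLException {
        // Client state is updated first: the cached batch now counts as used.
        currentValue = nextValue;
        if (cycle) {
            // Validation fails after the cache has already been discarded.
            throw new SQLException("bulk allocation is not supported for CYCLE sequences");
        }
        return server.getAndAdd(n);
    }

    public static void main(String[] args) {
        // Cached values 100..199 on the client, CYCLE flag set.
        CacheDiscardSketch seq = new CacheDiscardSketch(200, true, 100, 200);
        try {
            seq.nextValues(10);
        } catch (SQLException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(seq.cacheIsEmpty()); // true: the cached batch was thrown away
    }
}
{code}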

Finally, on the client side there are now only 2 places where I was not able to
remove special handling for bulk allocation mode.
1) We need to explicitly check whether we are in bulk allocation mode and
immediately throw an EMPTY_SEQUENCE_CACHE_EXCEPTION in Sequence.incrementValue(),
because when a bulk allocation is requested we cannot rely on currentValue and
nextValue being equal to trigger it.

{code}
    public long incrementValue(long timestamp, ValueOp op, long numToAllocate)
            throws SQLException {
        SequenceValue value = findSequenceValue(timestamp);
        // A bulk allocation always goes to the server, so treat the cache as empty
        if (value == null || SequenceUtil.isBulkAllocation(numToAllocate)) {
            throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
        }

        // Cache exhausted: only validation may proceed without the server
        if (value.currentValue == value.nextValue) {
            if (op == ValueOp.VALIDATE_SEQUENCE) {
                return value.currentValue;
            }
            throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
        }
        return increment(value, op, numToAllocate);
    }
{code}
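For illustration, here is a self-contained version of the check above with hypothetical stand-ins for SequenceValue, ValueOp and SequenceUtil.isBulkAllocation (assumed here to mean "more than one value requested"), plus a simplified replacement for increment() that hands out a single cached value. It is a sketch, not the patch itself; it shows that a bulk request bypasses the cache even when cached values remain.

{code:java}
import java.sql.SQLException;

// Hypothetical, self-contained stand-ins for the Sequence internals in the patch.
class IncrementValueSketch {
    enum ValueOp { VALIDATE_SEQUENCE, INCREMENT_SEQUENCE }

    // Shared instance, mirroring the constant used in the excerpt above.
    static final SQLException EMPTY_SEQUENCE_CACHE_EXCEPTION =
            new SQLException("Sequence cache is empty; go to the server");

    static final class SequenceValue {
        long currentValue;       // next value the cache will hand out
        final long nextValue;    // end (exclusive) of the cached batch
        final long incrementBy;

        SequenceValue(long currentValue, long nextValue, long incrementBy) {
            this.currentValue = currentValue;
            this.nextValue = nextValue;
            this.incrementBy = incrementBy;
        }
    }

    // Assumption: any request for more than one value is a bulk allocation.
    static boolean isBulkAllocation(long numToAllocate) {
        return numToAllocate > 1;
    }

    static long incrementValue(SequenceValue value, ValueOp op, long numToAllocate)
            throws SQLException {
        // A bulk request never uses the local cache, even when values remain.
        if (value == null || isBulkAllocation(numToAllocate)) {
            throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
        }
        if (value.currentValue == value.nextValue) {
            if (op == ValueOp.VALIDATE_SEQUENCE) {
                return value.currentValue;
            }
            throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
        }
        // Simplified stand-in for increment(): hand out one cached value.
        long result = value.currentValue;
        if (op == ValueOp.INCREMENT_SEQUENCE) {
            value.currentValue += value.incrementBy;
        }
        return result;
    }

    public static void main(String[] args) throws SQLException {
        SequenceValue cached = new SequenceValue(10, 15, 1); // values 10..14 cached
        System.out.println(incrementValue(cached, ValueOp.INCREMENT_SEQUENCE, 1)); // 10
        try {
            incrementValue(cached, ValueOp.INCREMENT_SEQUENCE, 5);
        } catch (SQLException e) {
            System.out.println("bulk request goes to the server");
        }
    }
}
{code}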

2) When we calculate the current value, we can't just take the max of
numToAllocate and the cache size. There is a corner case where a client
requests a bulk allocation of fewer slots than the sequence's cache size. We
handle it this way:

{code}
            if (op != ValueOp.VALIDATE_SEQUENCE) {
                // We can't just take the max of numToAllocate and cacheSize.
                // We need to handle a valid edge case where a client requests bulk
                // allocation of a number of slots that is less than the cache size
                // of the sequence.
                currentValue -= incrementBy
                        * (SequenceUtil.isBulkAllocation(numToAllocate) ? numToAllocate : cacheSize);
            }
{code}
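A worked example of why the max does not work, assuming the value returned by the server is the sequence value after the increment has been applied (which is what subtracting from currentValue implies). BulkStartValueSketch, startOfRange and startUsingMax are hypothetical helpers, and isBulkAllocation is again assumed to mean "more than one value requested".

{code:java}
// Worked example of the start-of-range calculation; the inputs are hypothetical.
class BulkStartValueSketch {
    // Assumption: any request for more than one value is a bulk allocation.
    static boolean isBulkAllocation(long numToAllocate) {
        return numToAllocate > 1;
    }

    // serverValue is assumed to be the sequence value after the server applied
    // the increment; the result is the first value of the range just allocated.
    static long startOfRange(long serverValue, long incrementBy, long cacheSize, long numToAllocate) {
        long slots = isBulkAllocation(numToAllocate) ? numToAllocate : cacheSize;
        return serverValue - incrementBy * slots;
    }

    // The rejected alternative: subtract max(numToAllocate, cacheSize).
    static long startUsingMax(long serverValue, long incrementBy, long cacheSize, long numToAllocate) {
        return serverValue - incrementBy * Math.max(numToAllocate, cacheSize);
    }

    public static void main(String[] args) {
        // Sequence at 100, CACHE 100, INCREMENT BY 1; a client asks for 5 values,
        // so the server advances the sequence to 105.
        System.out.println(startOfRange(105, 1, 100, 5));  // 100 (correct)
        System.out.println(startUsingMax(105, 1, 100, 5)); // 5 (wrong)
        // A regular NEXT VALUE FOR refills the whole cache: 100 -> 200.
        System.out.println(startOfRange(200, 1, 100, 1));  // 100
    }
}
{code}

With CACHE 100 and a bulk request for 5 values, subtracting the max would put the start of the range at 5, far below the 5 values actually reserved; for a regular request, and for bulk requests at least as large as the cache, both approaches agree.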

> Reserve chunks of numbers for a sequence
> ----------------------------------------
>
>                 Key: PHOENIX-1954
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Lars Hofhansl
>            Assignee: Jan Fernando
>         Attachments: PHOENIX-1954-rebased.patch, PHOENIX-1954-wip.patch, 
> PHOENIX-1954-wip2.patch.txt, PHOENIX-1954-wip3.patch, 
> PHOENIX-1954-wip4.patch, PHOENIX-1954-wip5-rebased.patch
>
>
> In order to be able to generate many ids in bulk (for example in map reduce 
> jobs) we need a way to generate or reserve large sets of ids. We also need to 
> mix reserved ids with incrementally generated ids from other clients. 
> For this we need to atomically increment the sequence and return the value it 
> had when the increment happened.
> If we're OK with throwing the current cached set of values away, we can do
> {{NEXT VALUE FOR <seq>(,<N>)}}, which needs to increment the value and return
> the value it incremented from (i.e. it has to throw the current cache away and
> return the next value it found at the server).
> Or we can invent a new syntax {{RESERVE VALUES FOR <seq>, <N>}} that does the 
> same, but does not invalidate the cache.
> Note that in either case we won't retrieve the reserved set of values via
> {{NEXT VALUE FOR}}, because we'd need to be idempotent. In our case, all we
> need to guarantee is that after a call to {{RESERVE VALUES FOR <seq>, <N>}}
> that returns a value <M>, the range [M, M+N) won't be used by any other user
> of the sequence. We might need to reserve 1bn ids this way ahead of a map
> reduce run.
> Any better ideas?
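The "atomically increment the sequence and return the value it had" semantics can be illustrated in memory with AtomicLong.getAndAdd, which stands in for the server-side sequence row here. This sketch is not Phoenix code: ReserveRangeSketch, reserve and nextValue are hypothetical names for a bulk caller and an ordinary incrementing caller. Mixing both kinds of callers across threads never hands out the same value twice, so each reserved [M, M+N) stays exclusive.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// In-memory illustration only; the AtomicLong stands in for the sequence row.
class ReserveRangeSketch {
    private final AtomicLong next;

    ReserveRangeSketch(long start) {
        next = new AtomicLong(start);
    }

    // Like RESERVE VALUES FOR <seq>, <N>: returns M; [M, M + n) belongs to the caller.
    long reserve(long n) {
        return next.getAndAdd(n);
    }

    // Like an ordinary NEXT VALUE FOR from another client.
    long nextValue() {
        return next.getAndIncrement();
    }

    public static void main(String[] args) throws InterruptedException {
        ReserveRangeSketch seq = new ReserveRangeSketch(1);
        Set<Long> handedOut = ConcurrentHashMap.newKeySet();
        AtomicBoolean overlap = new AtomicBoolean(false);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final boolean bulk = (t % 2 == 0); // half the threads reserve ranges
            pool.submit(() -> {
                for (int i = 0; i < 1000; i++) {
                    long n = bulk ? 10 : 1;
                    long start = bulk ? seq.reserve(n) : seq.nextValue();
                    for (long v = start; v < start + n; v++) {
                        if (!handedOut.add(v)) overlap.set(true);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("overlap: " + overlap.get()); // prints "overlap: false"
    }
}
{code}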



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
