[ https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621116#comment-14621116 ]

Jan Fernando commented on PHOENIX-1954:
---------------------------------------

[~jamestaylor] Re (1): This is turning out to be a bit more complex than it 
seems on the surface. The complexity comes from overflows, underflows, and 
cycles for standard sequences. Here's what I have so far, but it doesn't work 
for sequences that over/underflow with cycles enabled (tests caught this). I'm 
still looking for a solution. The issue stems from the fact that on overflow 
the old check of value.currentValue == value.nextValue would return false when 
currentValue was at Long.MAX_VALUE (since nextValue would have overflown), and 
the new check returns false there as well.

{code}
    public long incrementValue(long timestamp, ValueOp op, long numToAllocate) throws SQLException {
        SequenceValue value = findSequenceValue(timestamp);
        if (value == null) {
            throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
        }

        if (isSequenceCacheExhausted(numToAllocate, value)) {
            if (op == ValueOp.VALIDATE_SEQUENCE) {
                return value.currentValue;
            }
            throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
        }
        return increment(value, op, numToAllocate);
    }

    /**
     * Checks whether the SequenceValue cached on the client holds enough values
     * to allocate the requested number of slots. Handles overflow and underflow
     * for both increasing and decreasing sequences.
     */
    private boolean isSequenceCacheExhausted(final long numToAllocate, final SequenceValue value) {
        // Perform a checked add so over/underflow is detected. Overflow is not
        // treated as cache exhaustion, since the current value may still be
        // valid in the no-cycle case; the logic in increment() takes care of
        // detecting that we've hit the limit of the sequence.
        long targetSequenceValue;
        try {
            targetSequenceValue = LongMath.checkedAdd(value.currentValue, numToAllocate * value.incrementBy);
        } catch (ArithmeticException e) {
            return false;
        }

        if (value.incrementBy > 0) {
            return targetSequenceValue > value.nextValue;
        } else {
            return targetSequenceValue < value.nextValue;
        }
    }
{code}
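To make the need for the checked add concrete: plain long arithmetic in Java wraps silently on overflow, so computing the target value with a raw addition would compare a wrapped (bogus) number against nextValue. A minimal standalone sketch, using the JDK's Math.addExact/Math.multiplyExact as stand-ins for Guava's LongMath.checkedAdd (the class and method names here are illustrative, not from the patch):

```java
public class OverflowDemo {
    // Returns true if currentValue + numToAllocate * incrementBy would
    // overflow a long. Mirrors the try/catch guard in the snippet above,
    // but also checks the multiplication, which the snippet leaves unchecked.
    static boolean overflows(long currentValue, long numToAllocate, long incrementBy) {
        try {
            Math.addExact(currentValue, Math.multiplyExact(numToAllocate, incrementBy));
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Raw addition wraps silently: Long.MAX_VALUE + 1 == Long.MIN_VALUE.
        System.out.println(Long.MAX_VALUE + 1 == Long.MIN_VALUE); // true
        // The checked version detects the same overflow instead of wrapping.
        System.out.println(overflows(Long.MAX_VALUE, 1, 1));      // true
        System.out.println(overflows(100, 5, 1));                 // false: 105 fits
    }
}
```

Note that in the patch the multiplication numToAllocate * value.incrementBy is itself unchecked and could overflow before checkedAdd runs; Guava's LongMath.checkedMultiply could guard it the same way.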

I am concerned that moving away from the equality check for non-bulk 
allocations introduces a lot more complexity, and therefore more risk of 
regressions, into the code.

I think we could still fix the problem of allowing the cache to be used for 
bulk allocations when it holds enough values, by applying the new logic to 
bulk allocations only. Since we don't allow cycles for bulk allocations, this 
simplifies things:

{code}
    public long incrementValue(long timestamp, ValueOp op, long numToAllocate) throws SQLException {
        SequenceValue value = findSequenceValue(timestamp);
        if (value == null) {
            throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
        }

        if (SequenceUtil.isBulkAllocation(numToAllocate) && isSequenceCacheExhausted(numToAllocate, value)) {
            // New overflow-aware check, applied to bulk allocations only.
            return unlessValidateSequenceThrowEmptySequenceCacheException(op, value);
        } else if (value.currentValue == value.nextValue) {
            // Existing equality check, left unchanged for NEXT VALUE FOR.
            return unlessValidateSequenceThrowEmptySequenceCacheException(op, value);
        }
        return increment(value, op, numToAllocate);
    }

    private long unlessValidateSequenceThrowEmptySequenceCacheException(ValueOp op,
            SequenceValue value) throws EmptySequenceCacheException {
        if (op == ValueOp.VALIDATE_SEQUENCE) {
            return value.currentValue;
        }
        throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
    }

    /**
     * Checks whether the SequenceValue cached on the client holds enough values
     * to allocate the requested number of slots. Handles overflow and underflow
     * for both increasing and decreasing sequences.
     */
    private boolean isSequenceCacheExhausted(final long numToAllocate, final SequenceValue value) {
        long targetSequenceValue;
        try {
            targetSequenceValue = LongMath.checkedAdd(value.currentValue, numToAllocate * value.incrementBy);
        } catch (ArithmeticException e) {
            // On over/underflow don't treat the cache as exhausted: the current
            // value may still be valid in the no-cycle case, and the logic in
            // increment() takes care of detecting that we've hit the limit of
            // the sequence.
            return false;
        }

        if (value.incrementBy > 0) {
            return targetSequenceValue > value.nextValue;
        } else {
            return targetSequenceValue < value.nextValue;
        }
    }
{code}
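For reference, the comparison semantics of the exhaustion check can be exercised in isolation. In this sketch the SequenceValue fields are flattened into plain parameters and Math.addExact stands in for Guava's LongMath.checkedAdd; the class name and the specific currentValue/nextValue figures are illustrative assumptions, not taken from the patch:

```java
public class CacheCheck {
    // Standalone version of isSequenceCacheExhausted(): the cache is exhausted
    // when stepping currentValue by numToAllocate increments would move past
    // nextValue (above it for increasing sequences, below it for decreasing).
    static boolean exhausted(long numToAllocate, long currentValue, long nextValue, long incrementBy) {
        long target;
        try {
            target = Math.addExact(currentValue, numToAllocate * incrementBy);
        } catch (ArithmeticException e) {
            return false; // on over/underflow, let increment() decide
        }
        return incrementBy > 0 ? target > nextValue : target < nextValue;
    }

    public static void main(String[] args) {
        // Increasing sequence: cache covers values up to nextValue = 100.
        System.out.println(exhausted(50, 1, 100, 1));    // false: 51 <= 100
        System.out.println(exhausted(200, 1, 100, 1));   // true: 201 > 100
        // Decreasing sequence: cache covers values down to nextValue = -100.
        System.out.println(exhausted(50, -1, -100, -1)); // false: -51 >= -100
        System.out.println(exhausted(200, -1, -100, -1)); // true: -201 < -100
    }
}
```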

I'm still digging into the cycle and overflow issue but, as I said, I am 
concerned about the risk of introducing this complexity into the flow for 
NEXT VALUE FOR. Thoughts?

> Reserve chunks of numbers for a sequence
> ----------------------------------------
>
>                 Key: PHOENIX-1954
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Lars Hofhansl
>            Assignee: Jan Fernando
>         Attachments: PHOENIX-1954-rebased.patch, PHOENIX-1954-wip.patch, 
> PHOENIX-1954-wip2.patch.txt, PHOENIX-1954-wip3.patch, 
> PHOENIX-1954-wip4.patch, PHOENIX-1954-wip5-rebased.patch
>
>
> In order to be able to generate many ids in bulk (for example in map reduce 
> jobs) we need a way to generate or reserve large sets of ids. We also need to 
> mix ids reserved with incrementally generated ids from other clients. 
> For this we need to atomically increment the sequence and return the value it 
> had when the increment happened.
> If we're OK to throw the current cached set of values away we can do
> {{NEXT VALUE FOR <seq>(,<N>)}}, that needs to increment value and return the 
> value it incremented from (i.e. it has to throw the current cache away, and 
> return the next value it found at the server).
> Or we can invent a new syntax {{RESERVE VALUES FOR <seq>, <N>}} that does the 
> same, but does not invalidate the cache.
> Note that in either case we won't retrieve the reserved set of values via 
> {{NEXT VALUE FOR}} because we'd need to be idempotent. In our case, all we 
> need to guarantee is that after a call to {{RESERVE VALUES FOR <seq>, <N>}} 
> which returns a value <M>, the range [M, M+N) won't be used by any other 
> user of the sequence. We might need to reserve 1bn ids this way ahead of a 
> map reduce run.
> Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
