[ 
https://issues.apache.org/jira/browse/PHOENIX-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095771#comment-16095771
 ] 

James Taylor commented on PHOENIX-3525:
---------------------------------------

Thanks for the updated patch, [~samarthjain]. Here's some feedback:
- In MetaDataEndPointImpl, if INDEX_DISABLE_TIMESTAMP is 0, you don't want to 
set the upper bound timestamp to the current time. Perhaps you want to set it 
to 0.
{code}
+                if (disableTimeStampKVIndex >= 0) {
+                    // We only set a new value for the upper bound when the 
index disable timestamp is passed in
+                    setNewValueForUpperBound = true;
{code}

bq. For now, the delta has been configured to a default value of 1 min. We can 
change this value if we expect the clock skew between region servers to be 
larger.
It's not clock skew that necessitates this. It's the second reason you 
mentioned: because clients may not find out an index has transitioned from 
disabled to inactive and thus they may not send over the index maintainers 
which prevents the index rows from being created. It's that gap we're worried 
about. I think 1 minute is too low. Maybe 5 minutes is ok. If 
UPDATE_CACHE_FREQUENCY is set, then there's a problem too. We should ignore 
UPDATE_CACHE_FREQUENCY if a table contains an index with a non zero 
INDEX_DISABLE_TIMESTAMP. [~tdsilva] can help you make that change.

bq. I am not sure if using the index disable timestamp + 1 as the upper bound 
is the correct thing to do. The index disable timestamp is the minimum 
timestamp from all the mutations that failed. What we need is the max timestamp 
from the failed mutations to determine the the upper bound of rebuild scan.
The timestamp is the same for all mutations. The 
MetaDataEndPointImpl.updateIndexState call sees every timestamp value for the 
disable index timestamp, but it keeps the smallest one it sees. The upper bound 
timestamp would make sure it's larger than the largest one (plus one). You can 
use the current time too - it's not really different than the above logic.

bq. For batched rebuild, I am computing the scan end time using the following
We need to lookup the current value for REBUILD_INDEX_TIME_RANGE_UPPER_BOUND in 
the scan here:
{code}
                    Result r = Result.create(results);
                    byte[] disabledTimeStamp = 
r.getValue(PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES,
                        PhoenixDatabaseMetaData.INDEX_DISABLE_TIMESTAMP_BYTES);
{code}
For any index with a non zero INDEX_DISABLE_TIMESTAMP_BYTES, the 
REBUILD_INDEX_TIME_RANGE_UPPER_BOUND will have a value too.
Then here in this code, you'd want to only call updateIndexState if 
transitioning from DISABLE -> INACTIVE (as the current code is doing). You'd 
only use the updateIndexState return value to update the rebuildScanUpperBound 
in this case. Otherwise, you'd use the value you read.
{code}
+                    if (currentState == PIndexState.DISABLE) {
+                        // If disabled, index needs to transition to inactive 
first. This would
+                        // cause regular index maintenance to happen.
+                        newState = PIndexState.INACTIVE;
+                    } else if (indexDisableTimestamp < 0 && currentState == 
PIndexState.ACTIVE) {
+                        // If index is left active on failure, then we piggy 
back on updateIndexState
+                        // call to return us the time at which it last 
registered a failure.
+                        newState = PIndexState.ACTIVE;
+                    }
+                    Long scanUpperBound = HConstants.LATEST_TIMESTAMP;
+                    if (newState != null) {
+                        Long indexDisabletimeStampToUse =
+                                newState == PIndexState.ACTIVE ? 
indexDisableTimestamp : null;
+                        scanUpperBound =
+                                updateIndexState(conn, indexTableFullName, 
env, currentState,
+                                    newState, indexDisabletimeStampToUse);
{code}

bq. For batched rebuild, I am computing the scan end time using the following:
{code}
+                            long batchTime =
+                                    getTimestampForBatch(timeStamp,
+                                        
batchExecutedPerTableMap.get(dataPTable.getName()));
+                            long scanEndTime =
+                                    rebuildScanUpperBound != null
+                                            ? Math.min(batchTime, 
rebuildScanUpperBound)
+                                            : batchTime;
{code}
That's not quite right. It should be this:
{code}
+                            long batchTime =
+                                    getTimestampForBatch(timeStamp,
+                                        
batchExecutedPerTableMap.get(dataPTable.getName()));
+                            long scanEndTime =
+                                    batchTime == HConstants.LATEST_TIMESTAMP
+                                            ? rebuildScanUpperBound
+                                            ? Math.min(batchTime, 
rebuildScanUpperBound);
{code}

- You shouldn't be doing this:
{code}
+                            long actualScanEndTime =
+                                    System.currentTimeMillis() > 
rebuildScanUpperBoundDelta
+                                            + scanEndTime ? 
System.currentTimeMillis()
+                                                    : 
rebuildScanUpperBoundDelta + scanEndTime;
{code}
The current time is not relevant here. The 
MetaDataEndPointImpl.updateIndexState is managing the setting of the 
REBUILD_INDEX_TIME_RANGE_UPPER_BOUND. Just add the rebuildScanUpperBoundDelta 
to the rebuildScanUpperBound for the max timestamp of the scan.


> Cap automatic index rebuilding to inactive timestamp.
> -----------------------------------------------------
>
>                 Key: PHOENIX-3525
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3525
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Ankit Singhal
>         Attachments: PHOENIX-3525_wip2.patch, PHOENIX-3525_wip.patch
>
>
> From [[email protected]] review comment on 
> https://github.com/apache/phoenix/pull/210
> For automatic rebuilding ,DISABLED_TIMESTAMP is lower bound but there is no 
> upper bound so we are going rebuild all the new writes written after 
> DISABLED_TIMESTAMP even though indexes updated properly. So we can introduce 
> an upper bound of time where we are going to start a rebuild thread so we can 
> limit the data to rebuild. In case If there are frequent writes then we can 
> increment the rebuild period exponentially



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to