[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764594#comment-13764594
 ] 

Jason Lowe commented on MAPREDUCE-5332:
---------------------------------------

Thanks for the review, Daryn!

bq. Suggest: It may be clearer to rename the TOKEN_FOO_PREFIX constants to be 
TOKEN_FOO_DIR_PREFIX or TOKEN_*_FILE_PREFIX.

Will do.

bq. Suggest: I'd consider not having the hardcoded ROOT_STATE_DIR_NAME added to 
the user's path configured by MR_HS_FS_STATE_STORE_URI.

I followed the precedent set by FileSystemRMStateStore.  It seems safer to add 
a possibly redundant directory level in case the user configured the URI to a 
public directory intended for other users (e.g.: something like /tmp).

bq. Question: In startStateStorage, why 2 mkdirs instead of 1?

This is primarly because mkdirs isn't guaranteed to set the permissions of any 
created parent directories properly.  I believe HDFS does this, but 
RawLocalFileSystem does not.

bq. Bug: Unlike HistoryServerMemStateStore, there appear to be no checks for 
things being added twice - although arguably those checks all belong in ADTSM. 
Token check is there, but not a secret check. I think the state stores should 
behave consistently.

createFile() should fail if the file is already present, so the file system 
store should correctly fail if a token or key is added redundantly.

bq. Bug: In getBucketPath, I think you want to mod (%) the seq number instead 
of dividing? Otherwise it creates a janitorial job for someone to clean up 
empty directories.

Good catch!

bq. Suggest: For clarify, perhaps rename to HistoryServerStateStore*Service*. I 
kept getting it confused with HistoryServerState.

Will do.

bq. Suggest: I'd consider removing the dtsm's recover method. Perhaps loadState 
can take the dtsm as an argument and directly populate it instead of populating 
an intermediary HistoryServerState object before populating the dtsm.

I'm following RMStateStore precedent here as well.  I believe the intent is for 
the state object to shield the state stores from knowledge of the secret 
manager internals and vice-versa.

{quote}
Bug: Is the stateStore going to be started twice? Once in startService if 
recoveryEnabled, again by super#startService when it iterates the composite 
services?
Bug: Should the state store service be started after being recovered? Not 
before?
{quote}

The stateStore needs to be started before recovery can occur, and starting a 
service twice is a no-op.  So it should work OK as written.  However I'll clean 
up the out-of-band state store start with a simple recovery service that is 
added to the composite service right after the state store.  The recovery 
service's start method can check if recovery is enabled and perform the 
recovery on the (now started) state store before the other services are started.

bq. Bug? Seems a bit odd if recovery is enabled but there's no class defined, a 
HistoryServerNullStateStore is created. It appears JHS#serviceStart will fail 
when it calls loadState and an UnsupportedOperationException is thrown. The 
null store seems to have no real value other than deferring an error from 
JHS#serviceInit to JHS#serviceStart?

The null store is necessary to avoid is-recovery-enabled checks throughout the 
secret manager as it updates state.  It fails on recovery to catch the scenario 
where the user enabled recovery but forgot to configure a state store.
                
> Support token-preserving restart of history server
> --------------------------------------------------
>
>                 Key: MAPREDUCE-5332
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5332
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: jobhistoryserver
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: MAPREDUCE-5332-2.patch, MAPREDUCE-5332-3.patch, 
> MAPREDUCE-5332.patch
>
>
> To better support rolling upgrades through a cluster, the history server 
> needs the ability to restart without losing track of delegation tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to