[ 
https://issues.apache.org/jira/browse/HADOOP-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639773#comment-16639773
 ] 

Steve Loughran commented on HADOOP-14556:
-----------------------------------------

Patch 010. Contains a failed attempt to do a full e2e MR job with secure tokens.

I made the mistake of trying to get a clone of the (working) {{ITestS3AMiniYarnCluster}} test to fetch delegation tokens as well; the following sequence of problems was encountered, in roughly this order.

# YARN doesn't fetch DTs unless security is enabled. Action: enable security.
# MR reduce doesn't work with the local FS as the cluster FS when security is 
enabled, because LocalFetcher demands the native libs at this point for secure 
Posix operations, and test suites can't depend on that. Action: switch to HDFS
# HDFS is a PITA to bring up securely. Action: create a new class, {{MiniKerberizeHadoopCluster}}, which tries to encapsulate the actions and config changes from some of the HDFS tests needed to bring up a secure HDFS cluster. (A sketch of the kind of setup it has to encapsulate is below, after the launch log.)
# YARN is a PITA to bring up securely, and none of the YARN tests seem to do a full secure MiniKDC + HDFS + YARN. Action: try to get {{MiniKerberizeHadoopCluster}} to do this too, with a bit more guesswork on settings.
# Job submission fails because the NM can't auth with the DN, even though the JUnit thread can; the DN keeps breaking connections. Possibly some localhost/external IP address problem, which I can't debug because I can't turn the firewall off. Action: give up on secure HDFS.
# Switch to using S3A itself as the cluster FS, lifting bits of code from the (insecure) MR-on-HDFS test cases to get the relevant MR JAR up there.
# Test case {{testWordCount}} fails to submit the job: server-side SASL problems where the principal doesn't have a full user/name@realm structure ("Empty nameString not allowed").
# That problem goes away if you run all three tests in the suite (word count, FS access, and a probe for the RM port being open). Action: run all three tests from the IDE. (Note: they each set up their own YARN cluster.)
# Then you can submit work. But it doesn't run, and there are no logs other than this:

{code}
[2018-10-05 13:34:13.217]Container exited with a non-zero exit code 1. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Oct 05, 2018 1:34:00 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver 
as a provider class
Oct 05, 2018 1:34:00 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a 
provider class
Oct 05, 2018 1:34:00 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a 
root resource class
Oct 05, 2018 1:34:00 PM 
com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 AM'
Oct 05, 2018 1:34:00 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to 
GuiceManagedComponentProvider with the scope "Singleton"
Oct 05, 2018 1:34:00 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to 
GuiceManagedComponentProvider with the scope "Singleton"
Oct 05, 2018 1:34:00 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to 
GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger 
(org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.
{code}
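
Purely for reference, here's a minimal sketch of the kind of MiniKdc + configuration work {{MiniKerberizeHadoopCluster}} has to encapsulate before the HDFS/YARN miniclusters come up. This is illustrative only: the class name, principal names, keytab path and the exact set of keys below are assumptions (the same guesswork the patch has to do), not what the patch actually contains.

{code}
// Illustrative sketch only: start an embedded KDC and flip the configuration
// into Kerberos mode before MiniDFSCluster/MiniYARNCluster are constructed.
import java.io.File;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.minikdc.MiniKdc;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureClusterSetupSketch {

  public static Configuration kerberizedConf(File workDir) throws Exception {
    // 1. Start an embedded KDC for the test JVM.
    Properties kdcConf = MiniKdc.createConf();
    MiniKdc kdc = new MiniKdc(kdcConf, workDir);
    kdc.start();

    // 2. Create principals + a shared keytab for the test services and user.
    File keytab = new File(workDir, "hadoop.keytab");
    kdc.createPrincipal(keytab, "hdfs/localhost", "yarn/localhost", "HTTP/localhost", "alice");
    String realm = kdc.getRealm();

    // 3. Enable security: this is the step YARN needs before it will
    //    bother collecting delegation tokens at job submission.
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    conf.set("hadoop.security.authorization", "true");
    UserGroupInformation.setConfiguration(conf);

    // 4. Principals/keytabs a secure MiniDFSCluster and MiniYARNCluster expect;
    //    working out the exact set needed is the "guesswork on settings" above.
    conf.set("dfs.namenode.kerberos.principal", "hdfs/localhost@" + realm);
    conf.set("dfs.namenode.keytab.file", keytab.getAbsolutePath());
    conf.set("dfs.datanode.kerberos.principal", "hdfs/localhost@" + realm);
    conf.set("dfs.datanode.keytab.file", keytab.getAbsolutePath());
    conf.set("dfs.block.access.token.enable", "true");
    conf.set("dfs.data.transfer.protection", "authentication");
    conf.set("yarn.resourcemanager.principal", "yarn/localhost@" + realm);
    conf.set("yarn.resourcemanager.keytab", keytab.getAbsolutePath());
    conf.set("yarn.nodemanager.principal", "yarn/localhost@" + realm);
    conf.set("yarn.nodemanager.keytab", keytab.getAbsolutePath());
    return conf;
  }
}
{code}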

I've given up on the test at this point. I'm submitting the patch, and I'm going to keep the test but mark it as skipped ("broken") for now, or just tag & delete it in my git repo.

* I now understand why there aren't any end-to-end tests of MR across a secure 
mini-(kdc, hdfs, yarn) cluster.
* the only way I can see this working is to not attempt wordcount but instead some trivial AM which just checks for FS access and then exits. That way, all the MR pain is gone, though it still doesn't deal with secure YARN launch. (A rough sketch of such a probe is below.)
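
For what it's worth, a rough sketch of the FS-access probe such a trivial AM (or just a {{UserGroupInformation.doAs()}} in the launched container) would run, assuming the delegation tokens have already been marshalled into a {{Credentials}} object. The class and method names here are made up; it is not part of the patch.

{code}
// Hypothetical FS-access probe: log in purely from the marshalled delegation
// tokens (no Kerberos TGT in the container) and verify the filesystem is readable.
import java.net.URI;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class FsAccessProbe {

  /** Throws if the filesystem cannot be reached using only the supplied tokens. */
  public static void probe(URI fsUri, Credentials creds) throws Exception {
    // Build a UGI from the tokens alone, as a container without a keytab would.
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("probe");
    ugi.addCredentials(creds);
    ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
      Configuration conf = new Configuration();
      try (FileSystem fs = FileSystem.newInstance(fsUri, conf)) {
        fs.listStatus(new Path("/"));  // any read will do; failure => missing/bad token
      }
      return null;
    });
  }
}
{code}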

Tested: S3 Ireland. All tests working except:

{code}
[ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.206 
s <<< FAILURE! - in org.apache.hadoop.fs.s3a.auth.delegation.ITestDelegatedMRJob
[ERROR] 
testWordCount(org.apache.hadoop.fs.s3a.auth.delegation.ITestDelegatedMRJob)  
Time elapsed: 99.301 s  <<< FAILURE!
java.lang.AssertionError: Job did not complete
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at 
org.apache.hadoop.fs.s3a.auth.delegation.ITestDelegatedMRJob.testWordCount(ITestDelegatedMRJob.java:334)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
        at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

Also saw a transient failure of {{ITestS3GuardToolDynamoDB}}, which I couldn't 
replicate standalone. No obvious cause.

{code}
[ERROR] 
testPruneCommandCLI(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)  
Time elapsed: 615.35 s  <<< ERROR!
java.lang.Exception: test timed out after 600000 milliseconds
        at java.lang.Thread.sleep(Native Method)
        at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.retryBackoffOnBatchWrite(DynamoDBMetadataStore.java:813)
        at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:765)
        at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.innerPut(DynamoDBMetadataStore.java:851)
        at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.removeAuthoritativeDirFlag(DynamoDBMetadataStore.java:1080)
        at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.prune(DynamoDBMetadataStore.java:1033)
        at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.prune(DynamoDBMetadataStore.java:993)
        at 
org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommand(AbstractS3GuardToolTestBase.java:271)
        at 
org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommandCLI(AbstractS3GuardToolTestBase.java:286)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
        at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

[ERROR] 
testSetCapacityFailFastIfNotGuarded(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)
  Time elapsed: 1.854 s  <<< ERROR!
java.io.FileNotFoundException: DynamoDB table 
'd5ade866-32f4-42c7-b141-c8995c096a26' does not exist in region eu-west-1; 
auto-creation is turned off
        at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.initTable(DynamoDBMetadataStore.java:1213)
        at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.initialize(DynamoDBMetadataStore.java:356)
        at 
org.apache.hadoop.fs.s3a.s3guard.S3Guard.getMetadataStore(S3Guard.java:99)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:378)
        at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
        at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
        at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3377)
        at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:530)
        at 
org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$SetCapacity.run(S3GuardTool.java:513)
        at 
org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:353)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at 
org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:1552)
        at 
org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.run(AbstractS3GuardToolTestBase.java:115)
        at 
org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.lambda$testSetCapacityFailFastIfNotGuarded$2(AbstractS3GuardToolTestBase.java:331)
        at 
org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:494)
        at 
org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:380)
        at 
org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:449)
        at 
org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testSetCapacityFailFastIfNotGuarded(AbstractS3GuardToolTestBase.java:330)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
{code}





> S3A to support Delegation Tokens
> --------------------------------
>
>                 Key: HADOOP-14556
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14556
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-14556-001.patch, HADOOP-14556-002.patch, 
> HADOOP-14556-003.patch, HADOOP-14556-004.patch, HADOOP-14556-005.patch, 
> HADOOP-14556-007.patch, HADOOP-14556-008.patch, HADOOP-14556-009.patch, 
> HADOOP-14556-010.patch, HADOOP-14556.oath-002.patch, HADOOP-14556.oath.patch
>
>
> S3A to support delegation tokens where
> * an authenticated client can request a token via 
> {{FileSystem.getDelegationToken()}}
> * Amazon's token service is used to request a short-lived session secret & id; 
> these will be saved in the token and marshalled with jobs
> * A new authentication provider will look for a token for the current user 
> and authenticate the user if found
> This will not support renewals; the lifespan of a token will be limited to 
> the initial duration. Also, as you can't request an STS token from a 
> temporary session, IAM instances won't be able to issue tokens.
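
As an aside, a minimal sketch of the client-side flow the description above implies, using only the generic {{FileSystem}} token API; the bucket name and renewer are placeholders, and the S3A-specific binding classes from the patch aren't shown.

{code}
// Illustration only: a job-submission client collecting the S3A delegation
// token(s) into a Credentials set that ships with the job.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class CollectS3ATokens {

  public static Credentials collect(Configuration conf) throws Exception {
    Credentials creds = new Credentials();
    // "example-bucket" and the "yarn" renewer are placeholders.
    FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);
    // Ask the FS for its token(s); they land in creds, which the new
    // authentication provider picks up on the far side. Not renewable.
    Token<?>[] issued = fs.addDelegationTokens("yarn", creds);
    System.out.println("collected " + issued.length + " token(s)");
    return creds;
  }
}
{code}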


