[
https://issues.apache.org/jira/browse/HADOOP-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639773#comment-16639773
]
Steve Loughran commented on HADOOP-14556:
-----------------------------------------
Patch 010. Contains a failed attempt to do a full e2e MR job with secure tokens.
I made the mistake of trying to get a clone of the (working)
{{ITestS3AMiniYarnCluster}} test to now fetch delegation tokens; the
following sequence of problems was encountered, in roughly this order.
# YARN doesn't fetch DTs unless security is enabled. Action: enable security.
# MR reduce doesn't work with the local FS as the cluster FS when security is
enabled, because LocalFetcher demands the native libs at this point for secure
POSIX operations, and test suites can't depend on that. Action: switch to HDFS.
# HDFS is a PITA to bring up securely. Action: create a new class,
{{MiniKerberizeHadoopCluster}}, which tries to encapsulate the actions and
config changes from some HDFS tests to bring up a secure HDFS cluster.
# YARN is a PITA to bring up securely, and none of the YARN tests seem to do a
full secure MiniKDC + HDFS + YARN deployment. Action: try to get
{{MiniKerberizeHadoopCluster}} to do this, too, with a bit more guesswork on
settings.
# Job submit fails as the NM can't auth with the DN, even though the JUnit
thread can; the DN keeps breaking connections. Possibly some localhost/external
IP address problem, which I can't debug because I can't turn the firewall off.
Action: give up on secure HDFS.
# Action: switch to trying to use S3A itself as the cluster FS, lifting bits of
code from the (insecure) MR-atop-HDFS test cases to get the relevant MR JAR up
there.
# Test case {{testWordCount}} fails to submit the job (server-side SASL
problem where the principal doesn't have a full user/name@realm structure:
"Empty nameString not allowed").
# That problem goes away if you run all three tests in the test suite (word
count, FS access, and a probe for the RM port being open). Action: run all three
tests from the IDE. (Note: they each set up their own YARN cluster.)
# Then you can submit work, but it doesn't run, and there are no logs other
than this:
{code}
[2018-10-05 13:34:13.217]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Oct 05, 2018 1:34:00 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 AM'
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
{code}
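For reference, the "enable security" step at the top of the list boils down to flipping Hadoop's Kerberos switches in the cluster configuration before the mini clusters come up. A minimal sketch of the core-site settings involved (these are the standard Hadoop property names, not the exact setup from the patch; the per-service principal/keytab wiring for HDFS and YARN is a separate, larger exercise):
{code}
<!-- core-site.xml: minimal switches for a Kerberized (mini) cluster.
     These only turn security on; NN/DN/RM/NM principals and keytabs
     still have to be configured per-service. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
{code}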
I've given up on the test at this point. I'm submitting the patch, and I'm
going to keep the test but have it always skip as "broken" for now, or just
tag & delete it in my git repo.
* I now understand why there aren't any end-to-end tests of MR across a secure
mini (KDC, HDFS, YARN) cluster.
* The only way I can see this working is to not attempt wordcount but instead
run some trivial AM which just checks for FS access and then exits. That way,
all the MR pain is gone. It still doesn't deal with secure YARN launch, though.
Tested: S3 Ireland. All tests working except:
{code}
[ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.206 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.auth.delegation.ITestDelegatedMRJob
[ERROR] testWordCount(org.apache.hadoop.fs.s3a.auth.delegation.ITestDelegatedMRJob)  Time elapsed: 99.301 s <<< FAILURE!
java.lang.AssertionError: Job did not complete
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.apache.hadoop.fs.s3a.auth.delegation.ITestDelegatedMRJob.testWordCount(ITestDelegatedMRJob.java:334)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
  at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
Also saw a transient failure of {{ITestS3GuardToolDynamoDB}}, which I couldn't
replicate standalone. No obvious cause.
{code}
[ERROR] testPruneCommandCLI(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)  Time elapsed: 615.35 s <<< ERROR!
java.lang.Exception: test timed out after 600000 milliseconds
  at java.lang.Thread.sleep(Native Method)
  at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.retryBackoffOnBatchWrite(DynamoDBMetadataStore.java:813)
  at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:765)
  at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.innerPut(DynamoDBMetadataStore.java:851)
  at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.removeAuthoritativeDirFlag(DynamoDBMetadataStore.java:1080)
  at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.prune(DynamoDBMetadataStore.java:1033)
  at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.prune(DynamoDBMetadataStore.java:993)
  at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommand(AbstractS3GuardToolTestBase.java:271)
  at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommandCLI(AbstractS3GuardToolTestBase.java:286)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
  at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
[ERROR] testSetCapacityFailFastIfNotGuarded(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)  Time elapsed: 1.854 s <<< ERROR!
java.io.FileNotFoundException: DynamoDB table 'd5ade866-32f4-42c7-b141-c8995c096a26' does not exist in region eu-west-1; auto-creation is turned off
  at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.initTable(DynamoDBMetadataStore.java:1213)
  at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.initialize(DynamoDBMetadataStore.java:356)
  at org.apache.hadoop.fs.s3a.s3guard.S3Guard.getMetadataStore(S3Guard.java:99)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:378)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
  at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3377)
  at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:530)
  at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$SetCapacity.run(S3GuardTool.java:513)
  at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:353)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
  at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:1552)
  at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.run(AbstractS3GuardToolTestBase.java:115)
  at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.lambda$testSetCapacityFailFastIfNotGuarded$2(AbstractS3GuardToolTestBase.java:331)
  at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:494)
  at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:380)
  at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:449)
  at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testSetCapacityFailFastIfNotGuarded(AbstractS3GuardToolTestBase.java:330)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
{code}
> S3A to support Delegation Tokens
> --------------------------------
>
> Key: HADOOP-14556
> URL: https://issues.apache.org/jira/browse/HADOOP-14556
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Attachments: HADOOP-14556-001.patch, HADOOP-14556-002.patch,
> HADOOP-14556-003.patch, HADOOP-14556-004.patch, HADOOP-14556-005.patch,
> HADOOP-14556-007.patch, HADOOP-14556-008.patch, HADOOP-14556-009.patch,
> HADOOP-14556-010.patch, HADOOP-14556.oath-002.patch, HADOOP-14556.oath.patch
>
>
> S3A to support delegation tokens where
> * an authenticated client can request a token via
> {{FileSystem.getDelegationToken()}}
> * Amazon's token service is used to request short-lived session secret & id;
> these will be saved in the token and marshalled with jobs
> * A new authentication provider will look for a token for the current user
> and authenticate the user if found
> This will not support renewals; the lifespan of a token will be limited to
> the initial duration. Also, as you can't request an STS token from a
> temporary session, IAM instances won't be able to issue tokens.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]