[ https://issues.apache.org/jira/browse/HADOOP-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305306#comment-15305306 ]

Steve Loughran commented on HADOOP-12756:
-----------------------------------------

I concur with [~aw]: if a Jenkins server is running submitted code, then it is 
precisely one patch submission away from having its credentials leaked.

There's a set of problems that need to be addressed when working with object 
stores:
# Development: does your own code work?
# Patch review: does a newly submitted patch work?
# Regression testing: does the branch/trunk work?

*Development* Development in a module for a specific infrastructure (aws, 
openstack, azure) obviously requires the credentials to test there. More 
subtly, changes to the filesystem APIs *and tests* need testing too. In 
HADOOP-13207, for example, I have to test all implementations of an abstract 
contract test: local, rawlocal, HDFS, s3a, azure (the binding pattern is 
sketched below).
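
To make that concrete, here's a minimal sketch of a per-store binding of an 
abstract contract test, loosely modelled on the hadoop-aws suites; treat the 
class names as illustrative of the pattern rather than a reference to any 
specific patch:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.contract.AbstractContractOpenTest;
import org.apache.hadoop.fs.contract.AbstractFSContract;
import org.apache.hadoop.fs.contract.s3a.S3AContract;

// Per-store binding of a shared contract test. All the actual test cases
// live in the abstract superclass; each filesystem only supplies the
// contract that binds them to its implementation.
public class TestS3AContractOpen extends AbstractContractOpenTest {

  @Override
  protected AbstractFSContract createContract(Configuration conf) {
    // The contract carries the store's expected semantics plus the test
    // filesystem URI and credentials from the (uncommitted) test config.
    return new S3AContract(conf);
  }
}
{code}

Because the test cases themselves live in the shared superclass, any change 
there has to be re-run against every one of those bindings.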

*Patch review* This is what makes reviewing object store patches hard. The 
reviewer needs to have the credentials, and must first prescan the patch to 
make sure it doesn't leak information (that covers both malicious attacks and 
simple over-zealous logging; an invented example follows). Then they need to 
do a test run, which takes about 30-60 minutes; that's why it's pretty 
frustrating if the patch fails. Hence the policy: nobody will look at your 
patch until you declare which infrastructure your tests successfully 
completed against. It forces the developers to apply due diligence.
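
Here's what the over-zealous-logging case can look like; the snippet is 
invented, not from any real patch, though fs.s3a.access.key is a real 
configuration key:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Invented example of over-zealous logging: the category of diff a
// reviewer has to catch before running a patch against live credentials.
public class ConnectionLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(ConnectionLogging.class);

  static void logConnection(Configuration conf) {
    // Harmless: no secrets involved.
    LOG.debug("Connecting to {}", conf.get("fs.defaultFS"));
    // Dangerous: copies a live credential into test logs, which
    // routinely get attached to public JIRAs.
    LOG.debug("access key = {}", conf.get("fs.s3a.access.key"));
  }
}
{code}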

Maybe, just maybe, this could be partially automated. As an example, in Spark 
PRs a set of committers can add a comment, "Jenkins, test this", and the UCB 
Jenkins engine will run a test. If we could do something like that, with a 
patch test only kicking off after human intervention, we could improve patch 
review.

*Regression Testing*

This is an area where a private Jenkins instance with the credentials can 
contribute: nightly test runs of the object store module(s), plus a process 
for reacting to failures. We do this internally a lot, where the escalation 
process is: someone gets to fix the failure. It's that escalation process 
which needs to be set up; it's not enough for a private Jenkins machine/VM to 
send emails saying a test run failed, it needs people on the developer lists 
who care and can react. That means you get to stay on the dev lists. Welcome!
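
For anyone setting up such an instance: the Hadoop object store test suites 
conventionally pull credentials from an auth-keys.xml test resource that is 
kept out of version control, and skip themselves when it is absent. A minimal 
sketch of that guard, assuming JUnit 4 (the class name is invented):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.junit.Assume;
import org.junit.Before;

// Sketch of the credential guard: credentials live in
// src/test/resources/auth-keys.xml, which is excluded from version
// control, so neither the source tree nor a patch ever carries them.
public class AbstractCredentialedTest {

  protected Configuration conf;

  @Before
  public void loadCredentials() {
    conf = new Configuration();
    // Silently ignored if absent; supplies fs.s3a.access.key etc. when
    // a developer or a private Jenkins has provided the file.
    conf.addResource("auth-keys.xml");
    // Skip rather than fail on machines without store credentials.
    Assume.assumeNotNull(conf.get("fs.s3a.access.key"));
  }
}
{code}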

Note that in SPARK-7481 I'm adding end-to-end testing through Spark; you can 
see it at work by comparing an s3a test run with the hadoop-2.6 profile vs 
hadoop-2.7. The 2.6 one is clearly broken; if we'd had those tests up 
earlier, that'd have been clear at the time. I'm designing that module to be 
extensible: once it's in, adding dependencies and tests for a new FS should 
be straightforward.


> Incorporate Aliyun OSS file system implementation
> -------------------------------------------------
>
>                 Key: HADOOP-12756
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12756
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 2.8.0
>            Reporter: shimingfei
>            Assignee: shimingfei
>         Attachments: HADOOP-12756-v02.patch, HCFS User manual.md, OSS 
> integration.pdf, OSS integration.pdf
>
>
> Aliyun OSS is widely used among China's cloud users, but currently it is not 
> easy to access data stored on OSS from a user's Hadoop/Spark application, 
> because Hadoop has no built-in support for OSS.
> This work aims to integrate Aliyun OSS with Hadoop. With simple configuration, 
> Spark/Hadoop applications can read/write data from OSS without any code 
> change, narrowing the gap between the user's application and the data 
> storage, like what has been done for S3 in Hadoop.


