[ https://issues.apache.org/jira/browse/ACCUMULO-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480358#comment-13480358 ]

Christopher Tubbs commented on ACCUMULO-826:
--------------------------------------------

I think a more important question is: why does the user's password get stored 
in a file without their knowledge, as a side effect of the implementation? I 
think this fundamentally goes against the point of having static methods to 
manipulate the job configuration (the point being to hold job state only in the 
configuration, and not elsewhere via side effects).

I do understand the benefit of not showing the user's password in the job 
configuration, which is viewable from the JobTracker page, whereas the contents 
of this file wouldn't be. Perhaps that was the reasoning for storing the 
password in a file in the first place. However, I think it should be up to the 
user to manage this file's persistence, so we don't do unpredictable/unexpected 
things, like creating a file containing their password without their knowledge 
or expectation, or automatically deleting the file when the client dies after 
the MapReduce job has been submitted, thereby killing the job.
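
To illustrate the lifecycle problem, here is a minimal local-filesystem sketch using `java.io.File.deleteOnExit`; Hadoop's `FileSystem.deleteOnExit` behaves analogously for HDFS paths. A path registered this way survives only as long as the JVM that registered it, so the submitting client's death removes the password file out from under still-running tasks:

```java
import java.io.File;
import java.io.IOException;

public class DeleteOnExitDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the .pw file the client writes at job-submission time.
        File pwFile = File.createTempFile("ContinuousVerify", ".pw");
        pwFile.deleteOnExit();

        // Nothing is deleted yet: deleteOnExit only registers a cleanup
        // action that runs when THIS JVM terminates.
        System.out.println(pwFile.exists()); // true while the client JVM lives

        // Hadoop's fs.deleteOnExit(file) works the same way for an HDFS path:
        // the path is removed when the client JVM shuts down, even though
        // map tasks running in other JVMs still need to read it.
    }
}
```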

I propose we modify the method signature to look like:
{code:java}
public static void setInputInfo(Configuration conf, String user, Path fileWithPasswordInHDFS, String table, Authorizations auths);
{code}

Users could then re-use this file for multiple jobs, and they would control 
read/write access to it themselves.
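
Under that proposal, creating and locking down the password file becomes a one-time, user-visible step. A minimal sketch of that step on the local filesystem (the class and method names here are illustrative, not part of Accumulo; against a real cluster the user would use Hadoop's `FileSystem.create` and `FileSystem.setPermission` rather than `java.nio.file`):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class UserManagedPasswordFile {

    // Write the password to a file readable and writable only by its owner.
    // The user creates this once, passes its path to setInputInfo(...), and
    // deletes it whenever they decide no job needs it anymore.
    public static Path createPasswordFile(Path dir, byte[] passwd) throws IOException {
        Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rw-------");
        Path file = Files.createTempFile(dir, "accumulo-", ".pw",
                PosixFilePermissions.asFileAttribute(ownerOnly));
        Files.write(file, passwd);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = createPasswordFile(Files.createTempDirectory("pw-demo"),
                "secret".getBytes());
        // Owner-only permissions; the file outlives any single client JVM.
        System.out.println(Files.getPosixFilePermissions(file)
                .equals(PosixFilePermissions.fromString("rw-------"))); // true on POSIX systems
    }
}
```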

This change may need to go through a deprecation path, and we may not want to 
do this until 1.5.0.
                
> MapReduce over Accumulo fails if process that started job is killed
> -------------------------------------------------------------------
>
>                 Key: ACCUMULO-826
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-826
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.4.1, 1.4.0
>            Reporter: Keith Turner
>            Priority: Critical
>
> While testing the 1.4.2rc2 I started a continuous verify and killed the 
> process that started the job.  Normally you would expect the job to keep 
> running when you do this.  However, tasks started to fail.  I was seeing 
> errors like the following.
> {noformat}
> java.io.FileNotFoundException: File does not exist: 
> /user/hadoop/ContinuousVerify_13506740685261350674068686.pw
>       at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1685)
>       at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1676)
>       at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:479)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
>       at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:418)
>       at 
> org.apache.accumulo.core.client.mapreduce.InputFormatBase.getPassword(InputFormatBase.java:681)
>       at 
> org.apache.accumulo.core.client.mapreduce.InputFormatBase$RecordReaderBase.initialize(InputFormatBase.java:1155)
>       at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>       at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {noformat}
> I think this is caused by the following code in InputFormatBase
> {code:java}
>   public static void setInputInfo(Configuration conf, String user, byte[] passwd, String table, Authorizations auths) {
>     if (conf.getBoolean(INPUT_INFO_HAS_BEEN_SET, false))
>       throw new IllegalStateException("Input info can only be set once per job");
>     conf.setBoolean(INPUT_INFO_HAS_BEEN_SET, true);
>     
>     ArgumentChecker.notNull(user, passwd, table);
>     conf.set(USERNAME, user);
>     conf.set(TABLE_NAME, table);
>     if (auths != null && !auths.isEmpty())
>       conf.set(AUTHORIZATIONS, auths.serialize());
>     
>     try {
>       FileSystem fs = FileSystem.get(conf);
>       Path file = new Path(fs.getWorkingDirectory(), conf.get("mapred.job.name") + System.currentTimeMillis() + ".pw");
>       conf.set(PASSWORD_PATH, file.toString());
>       FSDataOutputStream fos = fs.create(file, false);
>       fs.setPermission(file, new FsPermission(FsAction.ALL, FsAction.NONE, FsAction.NONE));
>       fs.deleteOnExit(file);  // <--- NOT 100% sure, but I think this is the culprit
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
