[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A

Pieter Reuse (JIRA) Fri, 03 Jul 2015 01:16:24 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Pieter Reuse updated HADOOP-11262:
----------------------------------
    Attachment: HADOOP-11262-6.patch

Patch 6:

As requested, expanded patch 5 with tests by extending the following test 
classes and overriding them with S3A specifics:
* _TestFileContext.java_
* _FileContextCreateMkdirBaseTest.java_
* _FileContextMainOperationsBaseTest.java_
* _FCStatisticsBaseTest.java_
* _FileContextURIBase.java_
* _FileContextUtilBase.java_

In doing so, fixed the following bugs in _FileContextMainOperationsBaseTest.java_:
* _line 1169_: creating a symlink on an FS that doesn't support this throws an 
_UnsupportedOperationException_, not an _IOException_ (see 
_FileContext.java:1441_).
* _lines 1252 and 1313_: _read()_ is not guaranteed to read the whole file; 
that is the contract of _readFully()_. Tests that assume the whole file has 
been read should therefore use _readFully()_ instead of _read()_.
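
The difference between _read()_ and _readFully()_ can be illustrated with plain 
_java.io_ classes. This is a minimal sketch, not code from the patch: 
_ChunkedStream_ is a hypothetical stream that returns at most a few bytes per 
call, as a network-backed stream may. Hadoop's _FSDataInputStream_ extends 
_DataInputStream_, so the same contract should apply there.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadVersusReadFully {

    /** Hypothetical stream that returns at most `chunk` bytes per bulk read. */
    static final class ChunkedStream extends InputStream {
        private final InputStream in;
        private final int chunk;

        ChunkedStream(byte[] data, int chunk) {
            this.in = new ByteArrayInputStream(data);
            this.chunk = chunk;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // A short read is allowed by the InputStream contract.
            return in.read(b, off, Math.min(len, chunk));
        }
    }

    /** One read() call: returns how many bytes were actually delivered. */
    static int singleRead(byte[] data, int chunk) {
        byte[] buf = new byte[data.length];
        try (InputStream in = new ChunkedStream(data, chunk)) {
            return in.read(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** readFully() loops internally until the buffer is full (or throws EOFException). */
    static byte[] readFully(byte[] data, int chunk) {
        byte[] buf = new byte[data.length];
        try (DataInputStream in = new DataInputStream(new ChunkedStream(data, chunk))) {
            in.readFully(buf);
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With ten bytes of data and a chunk size of four, _singleRead_ delivers only 
four bytes, while _readFully_ fills the whole buffer.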

Also added an enhancement for object storage systems in the same file:
* _line 1238_: on an object storage system, a file does not exist *before* it 
is closed (nor does it have a checksum at that moment), so an _IOException_ is 
thrown. This object-storage issue is resolved by swapping the order of 
_fc.setVerifyChecksum(true, path)_ and _out.write(data, 0, data.length)_; the 
change does not affect the behaviour on HDFS or other file systems.
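
The visibility rule described above (an object only exists once its stream is 
closed) can be modelled with a small in-memory store. This is a hypothetical 
sketch for illustration only: _CloseToPublishStore_ and _PendingObject_ are 
invented names and do not mirror the S3A implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CloseToPublishStore {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    /** True only for objects whose stream has already been closed. */
    boolean exists(String key) {
        return objects.containsKey(key);
    }

    /** Starts a buffered upload; nothing is visible in the store yet. */
    PendingObject create(String key) {
        return new PendingObject(key);
    }

    /** Buffers writes locally and publishes the object on close(). */
    final class PendingObject extends ByteArrayOutputStream {
        private final String key;

        PendingObject(String key) {
            this.key = key;
        }

        @Override
        public void close() {
            objects.put(key, toByteArray());
        }
    }

    public static void main(String[] args) {
        CloseToPublishStore store = new CloseToPublishStore();
        PendingObject out = store.create("test/zoo");
        byte[] data = {1, 2, 3};
        out.write(data, 0, data.length);
        System.out.println("exists before close: " + store.exists("test/zoo"));
        out.close();
        System.out.println("exists after close: " + store.exists("test/zoo"));
    }
}
```

Any operation that needs the file to exist, such as a status lookup or a 
checksum query, fails in this model if it runs between _create()_ and 
_close()_.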

Discovered and patched the following related bugs in S3A:
* Bugfix in _S3AFileSystem.java_: ports on S3 should be ignored, which 
corresponds to a value of -1 (instead of the default 0 in _FileSystem_).
* Another bugfix is in _S3AFileStatus.java_: _getModificationTime()_ is 
overridden for directories. It returns _System.currentTimeMillis()_ because an 
object store does not keep track of the modification times of directories. 
Because some parts of the Hadoop ecosystem use the modification time to ignore 
or delete "old" directories (e.g. the YarnHistoryServer), returning 0 for 
directories is not the best option here.
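
Both fixes can be sketched with the standard library alone. The first part 
relies on the documented behaviour of _java.net.URI.getPort()_, which returns 
-1 when a URI has no port. The second part uses a hypothetical, simplified 
_Status_ class (not the real _S3AFileStatus_) to show the directory rule; the 
bucket and host names are placeholders.

```java
import java.net.URI;

class S3aFixesSketch {

    /** Port of a URI as java.net.URI reports it; -1 means "no port given". */
    static int portOf(String uri) {
        return URI.create(uri).getPort();
    }

    /** Hypothetical, simplified file status; not the real S3AFileStatus. */
    static final class Status {
        private final boolean directory;
        private final long storedModificationTime;

        Status(boolean directory, long storedModificationTime) {
            this.directory = directory;
            this.storedModificationTime = storedModificationTime;
        }

        /**
         * Directories have no tracked modification time in an object store,
         * so report "now" instead of 0 to avoid looking arbitrarily old.
         */
        long getModificationTime() {
            return directory ? System.currentTimeMillis() : storedModificationTime;
        }
    }

    public static void main(String[] args) {
        System.out.println(portOf("s3a://example-bucket/dir/file"));
        System.out.println(portOf("hdfs://namenode:8020/dir"));
        System.out.println(new Status(false, 1234L).getModificationTime());
    }
}
```

A default port of 0 would not match the -1 that a port-less S3A URI reports, 
which is presumably why the default has to be -1 for S3A.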

Added _TestS3AMiniYarnCluster.java_, which runs a simple _WordCount_ MapReduce 
job on a _YarnMiniCluster_ using S3A as the filesystem.

> Enable YARN to use S3A 
> -----------------------
>
>                 Key: HADOOP-11262
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11262
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Thomas Demoor
>            Assignee: Pieter Reuse
>              Labels: amazon, s3
>         Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
> HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, 
> HADOOP-11262.patch
>
>
> Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
