[
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lukas Waldmann updated HADOOP-14444:
------------------------------------
Status: Patch Available (was: In Progress)
solved:
We tend to use `setup()` and `teardown()` as the @Before/@After operations in
filesystem tests. Having standard names makes it more consistent when
subclassing, and having more than one before/after method puts you into
ambiguous ordering. Fix: change the names and subclass as appropriate, calling
the superclass method as desired.
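
A minimal sketch of that naming pattern, kept free of the JUnit dependency so it runs standalone. The class names BaseFsTest and FtpFsTest are hypothetical and not taken from the patch; in a real JUnit 4 test the two base methods would carry @Before and @After.

```java
import java.util.ArrayList;
import java.util.List;

public class SetupTeardownSketch {

  /** Records the call order so the sequence can be inspected. */
  static final List<String> CALLS = new ArrayList<>();

  /** Hypothetical base class for filesystem tests. */
  static class BaseFsTest {
    // In JUnit 4 this method would be annotated with @Before.
    public void setup() {
      CALLS.add("base setup");
    }

    // In JUnit 4 this method would be annotated with @After.
    public void teardown() {
      CALLS.add("base teardown");
    }
  }

  /** Hypothetical FTP-specific subclass reusing the same method names. */
  static class FtpFsTest extends BaseFsTest {
    @Override
    public void setup() {
      super.setup();              // shared state is prepared first
      CALLS.add("ftp setup");
    }

    @Override
    public void teardown() {
      CALLS.add("ftp teardown");  // release subclass resources first
      super.teardown();
    }
  }

  public static void main(String[] args) {
    BaseFsTest test = new FtpFsTest();
    test.setup();
    test.teardown();
    System.out.println(CALLS);
  }
}
```

Because the subclass overrides the single setup()/teardown() pair, the ordering is decided explicitly by where the super call sits rather than by the test runner.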
I like what you've done with the mixin to reuse all the tests, but I'd prefer a
name more specific to the FS than ContractTestBase. FTPContractTestMixin?
Docs readme should go into src/site/org/apache/hadoop/ftpextended/index.md
Need to rename AbstractFileSystem to a class name which isn't used elsewhere,
e.g. AbstractFTPFileSystem
Hadoop code prefers a space after // in comments; a search & replace should fix this
org/apache/hadoop/fs/ftpextended/ftp/package-info.java should declare code as
@Private+Unstable. Even if the FS is public, there's no API coming from this
module, nor stability guarantees.
Unless it's going to leak passwords, error messages should try to include the
filesystem URI. Why? It helps debugging when the job is working with more than
one FS and all you have is a log to go on.
When wrapping library exceptions (e.g. SFTP exceptions), always include the
toString() value of the wrapped exception. It'll be the string most likely to
make it to bug reports.
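
The two points above on error messages could be sketched together as follows. This is an illustration under assumptions, not code from the patch: LibraryException, wrap() and stripUserInfo() are hypothetical names, and the example URI is made up.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class ErrorMessageSketch {

  /** Hypothetical stand-in for a library exception such as an SFTP failure. */
  static class LibraryException extends Exception {
    LibraryException(String message) {
      super(message);
    }
  }

  /**
   * Wraps a library exception in an IOException whose message names the
   * operation, the filesystem URI and the toString() of the cause.
   */
  static IOException wrap(URI fsUri, String operation, Exception cause) {
    // String concatenation calls cause.toString(), so the text of the
    // wrapped exception ends up in the outer message.
    return new IOException(
        operation + " failed on " + stripUserInfo(fsUri) + ": " + cause, cause);
  }

  /** Drops any user:password part so credentials cannot reach the logs. */
  static URI stripUserInfo(URI uri) {
    try {
      return new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(),
          uri.getPath(), null, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot sanitize filesystem URI", e);
    }
  }

  public static void main(String[] args) {
    URI fs = URI.create("sftp://alice:secret@example.org:22/data");
    IOException wrapped =
        wrap(fs, "open", new LibraryException("Permission denied"));
    System.out.println(wrapped.getMessage());
  }
}
```

Passing the original exception as the cause as well keeps the full stack trace available, while the message alone is still useful when only one log line survives into a bug report.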
core-site.xml mentions s3
> New implementation of ftp and sftp filesystems
> ----------------------------------------------
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Affects Versions: 2.8.0
> Reporter: Lukas Waldmann
> Assignee: Lukas Waldmann
> Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch,
> HADOOP-14444.12.patch, HADOOP-14444.13.patch, HADOOP-14444.2.patch,
> HADOOP-14444.3.patch, HADOOP-14444.4.patch, HADOOP-14444.5.patch,
> HADOOP-14444.6.patch, HADOOP-14444.7.patch, HADOOP-14444.8.patch,
> HADOOP-14444.9.patch, HADOOP-14444.patch
>
>
> The current implementations of the FTP and SFTP filesystems have severe
> limitations and performance issues when dealing with a high number of files.
> My patch solves those issues and integrates both filesystems in such a way
> that most of the core functionality is common to both, which simplifies
> maintenance.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for explicit FTPS (SSL/TLS)
> * Support for connection pooling - a new connection is not created for every
> single command but is reused from the pool.
> For a huge number of files it shows an order-of-magnitude performance
> improvement over non-pooled connections.
> * Caching of directory trees. With FTP you always need to list the whole
> directory whenever you ask for information about a particular file.
> Again, for a huge number of files it shows an order-of-magnitude performance
> improvement over uncached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing
> particular files across a whole directory tree
> * Support for re-establishing broken FTP data transfers - these can happen
> surprisingly often