[
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272674#comment-16272674
]
Lukas Waldmann commented on HADOOP-14444:
-----------------------------------------
Overall comment: I don't like using this patch system - it's fine for small
changes but for whole module is PITA. It's hard to track changes, make diffs
etc. Wouldn't be possible to have a branch and push things directly to it?
See my comments:
Tests
why is the FTP test skipped on Windows?
{color:#8eb021}Hmm, good question - it actually comes from original
implementation of sftp filesystem
(hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/sftp/TestSFT)
and I have no idea why windows were excluded. I will try to find windows
machine and run it there
{color}
we tend to use `setup()` `teardown()` as the @Before/@after operations in
filesystems. Having standard names makes it more consistent when
subclassing...and having >1 before/after method puts you into ambiguous
ordering. Fix: change the names, subclass as appropriate, calling the
superclass method as desired.
{color:#8eb021}will try to do - there was some problem if I remember but will
see what I can do {color}
like what you've done with the mixin to reuse all the tests, but I'd prefer a
name more unique to the FS than ContractTestBase. FTPContractTestMixin?
{color:#8eb021}will do{color}
Never thought about having `AbstractFSContract createContract()` raise an IOE.
We could add that to its signature (best in a separate JIRA)
{color:#8eb021}not sure if I understand you here{color}
You are importing the distcp tests but not using them. What's your plan there?
Get this patch in and then add that as the next iteration?
{color:#8eb021}ITestContractDistCp is using ditscp test and runs fine - did I
miss something?{color}
Docs readme should go into src/site/org/apache/hadoop/ftpextended/index.md
{color:#8eb021}will do{color}
Misc minor points
need to rename AbstractFileSystem to a class which isn't used elsewhere, e.g
AbstractFTPFileSystem
{color:#8eb021}oki{color}
use try-with-resources areound channel logic and have the implicit
channel.close() do the disconnect
{color:#8eb021}I tend to use it when possible, please let me know if you see
some specific place{color}
There's lots of opportunities to use subclasses of IOE where it is useful to
provide more meaningful failures.
{color:#8eb021}sure, not so sure if i have time to investigate and replace
them{color}
the style guidelines have conventions on import ordering we strive to maintain,
especially on new code
{color:#8eb021}do you have a link?{color}
hadoop code prefers a space after // in comments; a search & replace should fix
{color:#8eb021}will do{color}
org/apache/hadoop/fs/ftpextended/ftp/package-info.java should declare code as
@Private+Unstable. Even if the FS is public, there's no API coming from this
module, nor stability guarantees.
{color:#8eb021}will do{color}
Unless it's going to leak passwords, error messages should try and include the
filesystem URI in them. Why? helps debugging when the job is working with >1 FS
and all you have is a log to go on
{color:#8eb021}will try to find a generic way for it but it may to wait for
later{color}
When wrapping library exceptions (e.g SFTP exceptions), always include the
toString() value of the wrapped exception. It'll be the string most likely to
make it to bug reports.
{color:#8eb021}oki{color}
core-site.xml mentions s3
{color:#8eb021}Just C&P - removed
{color}
Security
I'm moving to a world where we have to provide security audits of sensitive
patches, which this is.
What's the security mechanism here?
{color:#8eb021}Well, so far none :){color}
Is Configuration.getPassword() used to get secrets through JCEKS files?
{color:#8eb021}No{color}
I see that user:password is supported. I don't like this. I guess given its
only FTP it doesn't matter that much, but on SFTP it does
And on the topic of SFTP, what to do there?
{color:#8eb021}I don't like it much either - but distributing private keys
across cluster i like even less and that you need in case of distcp or some
other MR job.
I guess it is question for the deeper discussion - i can probably use some
mechanism used in other filesystems? Do you have some ideas
?{color}
> New implementation of ftp and sftp filesystems
> ----------------------------------------------
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Affects Versions: 2.8.0
> Reporter: Lukas Waldmann
> Assignee: Lukas Waldmann
> Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch,
> HADOOP-14444.12.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch,
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch,
> HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch,
> HADOOP-14444.patch
>
>
> Current implementation of FTP and SFTP filesystems have severe limitations
> and performance issues when dealing with high number of files. Mine patch
> solve those issues and integrate both filesystems such a way that most of the
> core functionality is common for both and therefore simplifying the
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for explicit FTPS (SSL/TLS)
> * Support of connection pooling - new connection is not created for every
> single command but reused from the pool.
> For huge number of files it shows order of magnitude performance improvement
> over not pooled connections.
> * Caching of directory trees. For ftp you always need to list whole directory
> whenever you ask information about particular file.
> Again for huge number of files it shows order of magnitude performance
> improvement over not cached connections.
> * Support of keep alive (NOOP) messages to avoid connection drops
> * Support for Unix style or regexp wildcard glob - useful for listing a
> particular files across whole directory tree
> * Support for reestablishing broken ftp data transfers - can happen
> surprisingly often
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]