[ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272674#comment-16272674
 ] 

Lukas Waldmann commented on HADOOP-14444:
-----------------------------------------

Overall comment: I don't like using this patch system - it's fine for small 
changes  but for whole module is PITA. It's hard to track changes, make diffs 
etc. Wouldn't be possible to have a branch and push things directly to it?

See my comments:
Tests
why is the FTP test skipped on Windows?
{color:#8eb021}Hmm, good question - it actually comes from original 
implementation of sftp filesystem 
(hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/sftp/TestSFT)
 and I have no idea why windows were excluded. I will try to find windows 
machine and run it there 
{color}

we tend to use `setup()` `teardown()` as the @Before/@after operations in 
filesystems. Having standard names makes it more consistent when 
subclassing...and having >1 before/after method puts you into ambiguous 
ordering. Fix: change the names, subclass as appropriate, calling the 
superclass method as desired.
{color:#8eb021}will try to do - there was some problem if I remember but will 
see what I can do {color}

like what you've done with the mixin to reuse all the tests, but I'd prefer a 
name more unique to the FS than ContractTestBase. FTPContractTestMixin?
{color:#8eb021}will do{color}

Never thought about having `AbstractFSContract createContract()` raise an IOE. 
We could add that to its signature (best in a separate JIRA)
{color:#8eb021}not sure if I understand you here{color}

You are importing the distcp tests but not using them. What's your plan there? 
Get this patch in and then add that as the next iteration?
{color:#8eb021}ITestContractDistCp is using ditscp test and runs fine - did I 
miss something?{color}

Docs readme should go into src/site/org/apache/hadoop/ftpextended/index.md
{color:#8eb021}will do{color}

Misc minor points
need to rename AbstractFileSystem to a class which isn't used elsewhere, e.g 
AbstractFTPFileSystem
{color:#8eb021}oki{color}

use try-with-resources areound channel logic and have the implicit 
channel.close() do the disconnect
{color:#8eb021}I tend to use it when possible, please let me know if you see 
some specific place{color}
 
There's lots of opportunities to use subclasses of IOE where it is useful to 
provide more meaningful failures.
{color:#8eb021}sure, not so sure if i have time to investigate and replace 
them{color}

the style guidelines have conventions on import ordering we strive to maintain, 
especially on new code
{color:#8eb021}do you have a link?{color}

hadoop code prefers a space after // in comments; a search & replace should fix
{color:#8eb021}will do{color}

org/apache/hadoop/fs/ftpextended/ftp/package-info.java should declare code as 
@Private+Unstable. Even if the FS is public, there's no API coming from this 
module, nor stability guarantees.
{color:#8eb021}will do{color}

Unless it's going to leak passwords, error messages should try and include the 
filesystem URI in them. Why? helps debugging when the job is working with >1 FS 
and all you have is a log to go on
{color:#8eb021}will try to find a generic way for it but it may to wait for 
later{color}
 
When wrapping library exceptions (e.g SFTP exceptions), always include the 
toString() value of the wrapped exception. It'll be the string most likely to 
make it to bug reports.
{color:#8eb021}oki{color}

core-site.xml mentions s3
{color:#8eb021}Just C&P - removed
{color}

Security
I'm moving to a world where we have to provide security audits of sensitive 
patches, which this is.
What's the security mechanism here?
{color:#8eb021}Well, so far none :){color}

Is Configuration.getPassword() used to get secrets through JCEKS files?
{color:#8eb021}No{color}

I see that user:password is supported. I don't like this. I guess given its 
only FTP it doesn't matter that much, but on SFTP it does
And on the topic of SFTP, what to do there?
{color:#8eb021}I don't like it much either - but distributing private keys 
across cluster i like even less and that you need in case of distcp or some 
other MR job.
I guess it is question for the deeper discussion - i can probably use some 
mechanism used in other filesystems? Do you have some ideas
?{color}

> New implementation of ftp and sftp filesystems
> ----------------------------------------------
>
>                 Key: HADOOP-14444
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14444
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 2.8.0
>            Reporter: Lukas Waldmann
>            Assignee: Lukas Waldmann
>         Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, 
> HADOOP-14444.12.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch, 
> HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, 
> HADOOP-14444.patch
>
>
> Current implementation of FTP and SFTP filesystems have severe limitations 
> and performance issues when dealing with high number of files. Mine patch 
> solve those issues and integrate both filesystems such a way that most of the 
> core functionality is common for both and therefore simplifying the 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for explicit FTPS (SSL/TLS)
> * Support of connection pooling - new connection is not created for every 
> single command but reused from the pool.
> For huge number of files it shows order of magnitude performance improvement 
> over not pooled connections.
> * Caching of directory trees. For ftp you always need to list whole directory 
> whenever you ask information about particular file.
> Again for huge number of files it shows order of magnitude performance 
> improvement over not cached connections.
> * Support of keep alive (NOOP) messages to avoid connection drops
> * Support for Unix style or regexp wildcard glob - useful for listing a 
> particular files across whole directory tree
> * Support for reestablishing broken ftp data transfers - can happen 
> surprisingly often



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to