GitHub user ivmaykov opened a pull request:

    https://github.com/apache/zookeeper/pull/627

     ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol [part 2]

    # Overview
    These are fixes and improvements to #184 that we've coded up at Facebook 
while testing that PR internally. This is meant to be in addition to #184, and 
should be stacked on top of it (and in fact includes the commits from that pull 
request). Most notable changes:
    
    ## Fixed networking issues/bugs in UnifiedServerSocket
    - don't crash the accept() thread if the client closes the connection 
without sending any data
    - don't corrupt the connection if the client sends smaller than 5 bytes for 
the initial read
    - delay the detection of TLS vs. plaintext mode until a socket stream is 
read from or written to, for example
    - prepending 5 bytes to `PrependableSocket` and then trying to read >5 
bytes would only return the first 5 bytes, even if more bytes were available. 
This is fixed.
    
    ## Added support for PEM formatted key stores and trust stores
    - key store and trust store files can now be in PEM format as well as JKS.
    - Added config properties to tell ZK what type of trust/key store to load:
    - `zookeeper.ssl.keyStore.type` and `zookeeper.ssl.trustStore.type` for 
ClientX509Util
    - `zookeeper.ssl.quorum.keyStore.type` and 
`zookeeper.ssl.quorum.trustStore.type` for QuorumX509Util
    - store type properties could have the values "JKS", "PEM", or not set
    - leaving the type properties unset will cause auto-detection of the store 
type based on the file extension (".jks" or ".pem")
    
    ## Added support for reloading key/trust stores when the file on disk 
changes
    - new property `sslQuorumReloadCertFiles` which controls the behavior for 
reloading the key and trust store files for `QuorumX509Util`. Reloading of key 
and trust store for `ClientX509Util` is not in this PR but could be added easily
    - this allows a ZK server to keep running on a machine that uses 
short-lived certs that refresh frequently without having to restart the ZK 
process.
    
    ## Added test utilities for easily creating X509 certs and using them in 
unit tests
    - added new class `X509TestContext` and its friend, `X509TestHelpers`
    - rewrote some existing unit tests to use these classes, and added new 
tests that use them
    - some existing tests (i.e. `QuorumSSLTest`) should probably be ported to 
use this as well, haven't got around to it yet
    
    ## Added more options for ssl settings to X509Util and encapsulate them 
better
    - previously, some SSL settings came from a ZKConfig and others came from 
global `System.getProperties()`. This made it hard to isolate certain settings 
in tests.
    - now all SSL-related settings come from the ZKConfig object used to create 
the context
    - new settings added:
    - `zookeeper.ssl(.quorum).enabledProtocols` - list of enabled protocols. If 
not set, defaults to a single-entry list with the value of 
`zookeeper.ssl(.quorum).protocol`.
    - `zookeeper.ssl(.quorum).clientAuth` - can be "NONE", "WANT", or "NEED". 
This controls whether the server doesn't want / allows / requires the client to 
present an X509 certificate.
    - `zookeeper.ssl(.quorum).handshakeDetectionTimeoutMillis` - timeout for 
the first read of 5 bytes to detect the transport mode (TLS or plaintext) of a 
client connection made to a `UnifiedServerSocket`
    
    ## Fixed odd plaintext perf regression caused by new dependency on 
`org.apache.httpcomponents`
    - internal regression testing showed a >10% perf degradation in plaintext 
mode that was tracked down to the addition of the `org.apache.httpcomponents` 
dependency.
    - Removing the dependency and implementing the hostname verification by 
mostly copy-pasting the hostname verification code from `httpcomponents` fixed 
the regression.
    
    There may be some other changes that I'm forgetting, but those are the main 
ones.
    
    We believe that #184 should not be accepted without this PR coming close 
behind, mainly because of the issues with UnifiedServerSocket. We were able to 
put clusters into a bad state pretty easily simply by having a participant 
disconnect immediately after connecting to the quorum port. This crashed the 
accept thread in the Leader and prevented any other participants from 
connecting. This could happen due to network hick-ups, etc. Ideally, both PRs 
get +1s around the same time and can land with a small delay one after another.
    
    We have started testing internal builds with #184 and the code in this PR. 
Quorums with TLS disabled have shown no perf or stability regressions, so we 
believe it's safe to merge this code to master as it's purely opt-in at this 
point, gated behind config options. We've done correctness testing of quorums 
with TLS enabled, but have not done perf testing to measure the impact of 
enabling quorum TLS yet. That should happen in the near future, and I will 
update this PR with our perf test results once we have them.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ivmaykov/zookeeper fb-tls

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/627.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #627
    
----
commit 88b61716d26f6b3c11325367a075bb062dc4147c
Author: Andor Molnar <andor@...>
Date:   2018-04-03T19:24:49Z

    ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol

commit c452d1b036bb3c9cd53478254129474ad4e387ea
Author: Andor Molnar <andor@...>
Date:   2018-05-18T00:09:38Z

    ZOOKEEPER-236. Fixed unit test + added some extra debug logging

commit ed10e88de70aa97f12f586d0c52b63a7d9a4cbae
Author: Andor Molnar <andor@...>
Date:   2018-05-18T16:58:40Z

    ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in 
constant value. Null checks

commit 9ab476a7a4d345a94f174bcd831a350ec92c94f9
Author: Andor Molnar <andor@...>
Date:   2018-05-20T20:09:09Z

    ZOOKEEPER-236. Trying to fix cipher suites test by changing the default 
protocol to TLSv1.2 and filter suitable cipher suites

commit d64eb26f75574f118d29cffec11d4116b469c36a
Author: Andor Molnar <andor@...>
Date:   2018-06-11T15:09:58Z

    ZOOKEEPER-236. Code review related changes:
    - server & client hostname verification can be set independently,
    - refactor defaultSSLContext to use AtomicReference,
    - some minor nitpicks

commit e8a1729794c8044f8fbde5c8f304cd1bd3648836
Author: Andor Molnar <andor@...>
Date:   2018-06-11T15:58:12Z

    ZOOKEEPER-236. Reverted to use single property for hostname verification

commit 777f31ac2dba38f2315dfdbfa47cdcb3832e8a6f
Author: Andor Molnar <andor@...>
Date:   2018-06-14T15:37:58Z

    ZOOKEEPER-236. Added Java8/Java9 default cipher suites

commit a9fa698131d36670b11973b8823efd11afa1acb5
Author: Andor Molnar <andor@...>
Date:   2018-06-15T11:57:46Z

    ZOOKEEPER-236. Code review fixes:
    
    - Added comment to clarify default enabled cipher suites,
    - Use java.specification.version to determine JDK version,
    - Use port unification only when SSL quorum is enabled,
    - Unit test fixes.

commit 1f8aab05bf45a5c6c40eb30c84c8a7f95a7fa27e
Author: Andor Molnar <andor@...>
Date:   2018-06-15T13:04:31Z

    ZOOKEEPER-236. Revert portUnification/sslQuorum logic

commit 209fbca78c083a1c06bee8d463ba77a86a3ee1ee
Author: Andor Molnar <andor@...>
Date:   2018-06-22T14:38:17Z

    ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related 
settings

commit d78e1146a8a35f6d2f9d75fd9b769ff8a423540c
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:23Z

    [rebase] [Quorum TLS] Fix bugs in PrependableSocket and UnifiedServerSocket
    
    Summary:
    Fixed some bugs in PrependableSocket and UnifiedServerSocket
    - using SequenceInputStream in PrependableSocket broke the read() behavior 
when trying to read > 5 bytes
    - changed it to use PushbackInputStream which behaves as expected
    - changed UnifiedSocketTest to read more than 5 bytes from the unified 
socket to make sure the fix works
    - call setNeedClientAuth(true) in UnifiedServerSocket.accept() when 
connection is TLS. This is to have parity with 
X509Util.configureSSLServerSocket() which sets the same behavior for strict 
server-side SSL sockets.
    - Use the Java8 API for creating a secure socket when some bytes have 
already been read. I think this allows us to avoid the overhead of 
PushbackInputStream in the SSL case.
    - added "localhost" to the connect address of client socket in 
UnifiedServerSocket test. The test didn't work for me otherwise.
    
    Test Plan: `ant -Dtestcase=UnifiedServerSocketTest test-core-java`
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, dongw, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D9234843
    
    Tasks: T31626216
    
    Signature: 9234843:1533776414:92d829ba49f997931345e7d90b4a2183b666f294

commit ef2db825516bf0a20d2a62a6f1c3449f543a15ca
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:24Z

    [rebase] [Quorum TLS] get SSL protocol from ZKConfig instead of System 
properties
    
    Summary: See title. This makes this property consistent with all the others 
that we already get through the ZKConfig.
    
    Test Plan: `ant -Dtestcase=X509UtilTest test-core-java`
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, dongw, zeus-di...@fb.com, ssl-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D9236333
    
    Tasks: T31626216
    
    Signature: 9236333:1533836898:5d1c734d96b807a5f6a39f863661669abefe11aa

commit 8682c3b48880831f70e918ddf53ea4275fa08cb2
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:25Z

    [rebase] [Quorum TLS] added X509TestContext and supporting code
    
    Summary:
    Added a few new classes that will make testing security related code much 
easier.
    - X509TestContext: easy way to manage certs and private keys in unit tests, 
to be used with SSL sockets
    - X509TestHelpers: static functions for creating certs and key pairs, and 
saving them to JKS or PEM files on disk
    - X509KeyType: enum for asymmetric key types
    
    Test Plan: Bunch of tests in future diffs will rely on it
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, dongw, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D9174703
    
    Tasks: T31626216
    
    Signature: 9174703:1533597616:330f1ea9db736c50825d6eb084e20af91b8fd782

commit fa08a806c59f7cc6e0f46686b6b388bfe07884e9
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:26Z

    [rebase] [Quorum TLS] update X509UtilTest to use X509TestContext and 
parameterize the test
    
    Summary: See title. Now X509UtilTest has better coverage of different 
combinations of key types (RSA, EC) and password protected / unprotected key 
store. Also added some new test cases and removed redundant ones.
    
    Test Plan: ran X509UtilTest from IntelliJ
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, dongw, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D9174711
    
    Tasks: T31626216
    
    Signature: 9174711:1533766713:efa901b625d079cf6f2fab26270faf9a39591d2e

commit ec1b1467f6934c36b3fd03b01928c1654c57b4e4
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:27Z

    [rebase] [Quorum TLS] clean up UnifiedServerSocketTest and make it use 
X509TestContext
    
    Summary: Use X509TestContext to create the cert and key files in 
UnifiedSocketTest, plus other minor cleanup.
    
    Test Plan: ran UnifiedSocketTest in IntelliJ
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, dongw, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D9174713
    
    Tasks: T31626216
    
    Signature: 9174713:1534362831:b526048e672c9893a4046d0ff5e9df1d8cde4adc

commit d11d85cf2d3b883e5ade7029336521c77832650c
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:28Z

    [rebase] [Quorum TLS] Add support for parsing PEM encoded keys/certs
    
    Summary: See title.
    
    Test Plan: Tested it manually. Will add automated tests later.
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, bunn, dain, dongw, nedelchev, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D8906740
    
    Tasks: T31626216
    
    Signature: 8906740:1534362912:b52885e9c0f9957b0a911697e976bf153adf6197

commit 210bfe7b75429668331fc14c383f0d1c9ae151b4
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:29Z

    [rebase] [Quorum TLS] Added FileChangeWatcher class and unit test
    
    Summary: The FileChangeWatcher class provides a simple API to watch a local 
directory for file changes, backed by WatchService from the Java standard 
library. This will allow us to implement cert store / trust store reloading on 
changes in a later diff.
    
    Test Plan: added a new unit test, made sure it passes
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, bunn, dongw, nedelchev, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D8906830
    
    Tasks: T31626216
    
    Signature: 8906830:1533162316:679eceab34d30955a582ef2dde8c23822af9555d

commit b2c8593b3c9a7adca79600b1c6f9bb8946f862a0
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:30Z

    [rebase] [Quorum TLS] enable cert store / trust store reloading
    
    Summary: Enable the reloading of cert store / trust store files.
    
    Test Plan: TBD.
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, bunn, dongw, nedelchev, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D8906914
    
    Tasks: T31626216
    
    Signature: 8906914:1533600372:b254376732d94e74555160d25054a563a5378bc2

commit 9c70cc93c24d2bc7ed6af9ebd8464262aab625e3
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:31Z

    [rebase] [Quorum TLS] use UnifiedServerSocket even w/o port unification to 
allow cert reloading
    
    Summary: Added an option to UnifiedServerSocket to block insecure 
connections. This allows us to implement cert store / trust store reloading 
even when port unification is off, because the conversion to SSL socket is 
delayed until after the connection is made. This lets us use an updated SSL 
context if the file on disk changed.
    
    Test Plan: Modified UnifiedServerSocketTest
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, bunn, dongw, nedelchev, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D8906965
    
    Tasks: T31626216
    
    Signature: 8906965:1533770631:379211c3714ef8fe958012ed9584b64ecf0c4355

commit f5f54afce39bcc0f4df346bc64e4fca0916b53dd
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:33Z

    [rebase] [Quorum TLS] WIP: add denial-of-service-attack resistance to 
UnifiedServerSocket
    
    Summary:
    Added resistance to simple denial-of-service attacks to UnifiedServerSocket.
    The attack in question is simple: a client connects to the 
UnifiedServerSocket and does nothing.
    Because the old code did a blocking read() in the accept() method, this 
would hang the accept thread
    and prevent any other clients from connecting.
    
    The solution is to write a new socket type 
(`UnifiedServerSocket.UnifiedSocket`), which doesn't know
    if it's a SSLSocket or plaintext socket at the time of creation. Trying to 
read or write any data on
    the socket (by calling `getInputStream()`, `getOutputStream()`, or 
`sendUrgentData()`) will cause
    the socket to peek at the first 5 bytes of the input stream to try and 
determine if it
    should switch to TLS or stay as plaintext, using the previous logic. 
Changed `UnifiedServerSocket.accept()`
    to return the new socket type.
    
    Added new test cases to `UnifiedServerSocketTest` which check for DoS 
resistance. Had to make changes
    to the test to make `UnifiedServerThread` more closely resemble the 
behavior in Leader.java (`accept()` on one
    thread, do all reads and writes on another thread). I verified that the old 
version of the code (which blocked in
    `UnifiedServerSocket.accept()`) fails the new tests.
    
    Test Plan: `ant -Dtestcase=UnifiedServerSocketTest test-core-java`
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, dongw, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D9327931
    
    Tasks: T31626216
    
    Signature: 9327931:1534364537:832d8f589a70f0ceb8f4ae8faa1c3cbb4b42fd40

commit 359546a321cff748c5bd40887b45baa6ecc3e9ed
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-29T16:02:01Z

    [Quorum TLS] allow keystore/truststore password properties to be omitted if 
the password is empty
    
    Summary: Currently, the keystore password property must be specified if the 
keystore location is. This is clunky, and I bet the common case will be key 
stores not protected by a password anyway (if your key password has to be in 
plaintext in a zoo.cfg file, it's not adding any security to your key store 
file anyway). This change allows the keystore password property to be omitted 
when there is no password on the keystore.
    
    Test Plan: `ant -Dtestcase=X509UtilTest test-core-java`
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, dongw, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D9374038
    
    Tasks: T31626216
    
    Signature: 9374038:1534532478:12253e0252ef69b95b2e8d7e86e96f7f988771a3

commit 33e9538f14e3264612c27e1e39b13ce748d63981
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-29T16:02:01Z

    [Quorum TLS] add config options for more TLS settings and preserve them 
alongside SSLContext in X509Util
    
    Summary:
    Added a wrapper for the SSL Context + options that have to be set on the 
socket and not on the context, and made all such options configurable via 
properties. The options are:
    - default TLS protocol to use
    - enabled protocols to potentially downgrade to if a client doesn't support 
our protocol of choice
    - enabled ciper suites
    - client auth mode (NONE / WANT / NEED)
    
    Test Plan: sandcastle
    
    Reviewers: nixon, bunn, nedelchev
    
    Reviewed By: nixon
    
    Subscribers: nixon, dongw, zeus-di...@fb.com, ssl-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D9526137
    
    Tasks: T31626216
    
    Signature: 9526137:1535491578:4624f336d24c0cd737697e95b08d5a157e4abf5a

commit f2cb5c39ffdff91791e0f22680bc7bb9b889bfbd
Author: Brian Nixon <nixon@...>
Date:   2018-09-01T02:57:53Z

    remove the dependency on httpclient of org.apache.httpcomponents
    
    Summary:
    This dependency is causing a performance regression when the 
perf_regression_version_write test is run against zeus.regional.atn.10001.
    
    https://fb.facebook.com/groups/1837468356490057/permalink/2184210251815864/ 
has more context.
    
    Test Plan: Built the jar and am running perf_regression_version_write 
against it on zeus.regional.atn.10001
    
    Reviewers: ilyam, allenlyu
    
    Reviewed By: ilyam, allenlyu
    
    Subscribers: nixon, dongw, #zeus, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D9622408
    
    Tasks: T31626216, T33493338
    
    Signature: 9622408:1535761773:ae82b939c8cdbf9b4ab0de3eb6e3b7a598a0a6ab

commit 956639dfceb2cb4ce67cac7b93abf668da4b89ce
Author: Ilya Maykov <ilyam@...>
Date:   2018-09-08T01:04:23Z

    [Quorum TLS] fix bugs in UnifiedServerSocket accept() flow
    
    Summary:
    There were a few issues in UnifiedServerSocket accept() flow that were 
exposed during testing:
    - if a client closed the connection without sending any data, the accept 
thread would crash because a negative return from read() was not properly 
detected
    - a client connects and doesn't send any data but keeps the connection 
open, the accept thread will not make progress. Fixed by adding a new timeout 
property for the initial read that determines the client mode, defaults to 
5000ms.
    - Exceptions in Leader.LearnerCnxAcceptor.run() could leak an open socket 
if the error happened before the socket was given to the LearnerHandler.
    
    Test Plan: compiles, passes a couple unit tests, will let sandcastle run 
the full test suite
    
    Reviewers: nixon, bunn, allenlyu
    
    Reviewed By: bunn
    
    Subscribers: nixon, dongw, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D9694977
    
    Tasks: T31626216
    
    Signature: 9694977:1536281695:fe2ece2be1277513de727fb3130e9d952e33f2f4

commit 0ed15965f03794c95f04d30e7d93b0a531ee3ebe
Author: Ilya Maykov <ilyam@...>
Date:   2018-09-08T01:04:24Z

    [Quorum TLS] delay unified socket mode detection to the first 
read()/write() operation
    
    Summary: Instead of reading from the underlying socket and detecting the 
mode as soon as an input/output stream is requested from a unified socket, 
delay the blocking mode detection until the first actual read()/write() on the 
requested stream. This means an accept() thread that gets the input stream but 
doesn't actually read from it and passes the stream to a worker thread will not 
get blocked.
    
    Test Plan: Modified UnifiedServerSocketTest to grab the input stream in the 
main accept() thread, make sure it passes with the changes. The modified tests 
fail as expected if the logic in UnifiedServerSocket.java is reverted.
    
    Reviewers: nixon, bunn, allenlyu
    
    Reviewed By: nixon
    
    Subscribers: nixon, dongw, zeus-di...@fb.com
    
    Differential Revision: https://phabricator.intern.facebook.com/D9696594
    
    Tasks: T31626216
    
    Signature: 9696594:1536343729:00605b4393e61f340d2d079ce00027e2e11489d0

----


---

Reply via email to