Please note that Hadoop is essentially an on-prem data infrastructure these days. To fully leverage Hadoop you typically run other compute engines such as Hive, Spark, HBase and so on. So YMMV.
Folks on the mailing list, please feel free to add/correct my comments.

• What type of secure connection will there be between Apache Hadoop and VA systems in terms of secure protocols implemented?

Hadoop implements SSL/TLS up to TLS 1.2 as of Hadoop 3.2 for HTTP-based connections. SPNEGO is supported too. Separately, Hadoop's RPC protocol leverages Java SASL APIs to authenticate/encrypt. Kerberos is supported via Java SASL. Hadoop's data transfer leverages OpenSSL for data encryption. There are additional custom protocols (Hadoop Delegation Token) that authenticate users in a cluster.

• To what extent does Apache Hadoop use a FIPS 140-2 validated cryptographic module, and what is the certification number?

Hadoop KMS is the cryptographic module in Hadoop. As far as I know it is not FIPS 140-2 certified. You may need a commercial vendor for that purpose.

• What is the most recent version of Apache Hadoop and its release date?

We maintain several branches. https://hadoop.apache.org/releases.html

• Is there a Voluntary Product Accessibility Template (VPAT) program in place to assess Section 508 compliance?

I looked it up on Wikipedia and I still don't know what this is. You may need to hire a consultant to help. I suspect we don't, otherwise we wouldn't have used light green as the header of the HDFS NameNode web UI.

• What are the main features of Apache Hadoop?

https://hadoop.apache.org/
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

• What Cloud Service Provider (CSP) agreements have been set for Apache Hadoop to be used securely through the cloud?

Apache Hadoop is an open source project. You need a commercial vendor to answer this question for you.

• Does Apache Hadoop offer an Application Program Interface (API)?

Yes -- we offer a Java API, a C API and a RESTful API.

• What other apps does Apache Hadoop integrate with?

Many -- HBase, Hive, Spark, Presto and many others. Search for "Hadoop ecosystem".

• What level of support does Apache Hadoop offer?

The project itself provides community support. Commercial support is available via commercial vendors. Cloudera, for example, offers several levels of technical support and services.

• Does Apache Hadoop leverage other database products?

No. However, if you use Hive on top of Hadoop, Hive requires a metastore that runs on a database.

• Is Apache Hadoop available for on-premise deployment?

Yes.

• Does Apache Hadoop reside on user network?

Not sure what this means.

Thank you for your willingness to help by answering these questions. Please note that I am working on a tight deadline, and must have my research completed within three business days. If you would please acknowledge your initial receipt of my email, it would be greatly appreciated.

Please contact me if you have any questions or concerns.

Best Regards,

Foday B. Fofanah (Contractor)
Senior Security Analyst (Prosphere)
Solution Delivery (Station 116) (005OPB14)
Office of Information and Technology, IT Operations and Services (ITOPS)
Office: (202) 461-4424

On Wed, Nov 13, 2019 at 2:07 PM Fofanah, Foday B.
(Prosphere) <foday.fofa...@va.gov.invalid> wrote:

> Hello,
>
> I am reaching out to you from the Department of Veterans Affairs (VA)
> where I am part of the team that reviews various information-based products
> from an information security perspective for use within VA. I am reviewing
> information regarding Apache Hadoop and have several questions listed
> below; please respond to the best of your ability so that I may use your
> answers to reach a final determination.
>
> • What type of secure connection will there be between Apache
> Hadoop and VA systems in terms of secure protocols implemented?
>
> • To what extent does Apache Hadoop use a FIPS 140-2 validated
> cryptographic module, and what is the certification number?
>
> • What is the most recent version of Apache Hadoop and its
> release date?
>
> • Is there a Voluntary Product Accessibility Template (VPAT)
> program in place to assess Section 508 compliance?
>
> • What are the main features of Apache Hadoop?
>
> • What Cloud Service Provider (CSP) agreements have been set
> for Apache Hadoop to be used securely through the cloud?
>
> • Does Apache Hadoop offer an Application Program Interface
> (API)?
>
> • What other apps does Apache Hadoop integrate with?
>
> • What level of support does Apache Hadoop offer?
>
> • Does Apache Hadoop leverage other database products?
>
> • Is Apache Hadoop available for on-premise deployment?
>
> • Does Apache Hadoop reside on user network?
>
> Thank you for your willingness to help by answering these questions.
> Please note that I am working on a tight deadline, and must have my
> research completed within three business days. If you would please
> acknowledge your initial receipt of my email, it would be greatly
> appreciated.
>
> Please contact me if you have any questions or concerns.
>
> Best Regards,
>
> Foday B. Fofanah (Contractor)
> Senior Security Analyst (Prosphere)
> Solution Delivery (Station 116) (005OPB14)
> Office of Information and Technology, IT Operations and Services (ITOPS)
> Office: (202) 461-4424
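
P.S. On the API question above: in case it helps to see what the REST flavor looks like, here is a rough sketch of how a WebHDFS request URL is put together. It only builds the URL and does not talk to a cluster. The helper name webhdfs_url and the host namenode.example.com are made up for illustration; 9870 is the default NameNode HTTP port in Hadoop 3.x, and a secured cluster would normally use https plus SPNEGO or a delegation token instead, so treat this as an outline rather than something to deploy.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str,
                scheme: str = "http", **params: str) -> str:
    """Build a WebHDFS URL: <scheme>://<host>:<port>/webhdfs/v1/<path>?op=<OP>.

    webhdfs_url is a hypothetical helper, not part of any Hadoop client library.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # The operation goes first in the query string, followed by any
    # operation-specific parameters (for example offset/length for OPEN).
    query = urlencode({"op": op, **params})
    # quote() percent-encodes unsafe characters but leaves "/" intact.
    return f"{scheme}://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # namenode.example.com is a placeholder host name.
    print(webhdfs_url("namenode.example.com", 9870, "/user/alice", "LISTSTATUS"))
```

An HTTP GET on the printed URL would list the directory on a cluster with WebHDFS enabled and no authentication; the Java and C APIs expose the same file system operations without going through HTTP.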