I would like to propose DLab as an Apache Incubator project.

The text of the proposal can be found below as well as on the Incubator wiki:

https://wiki.apache.org/incubator/DLabProposal

We are seeking additional mentors and would welcome anyone who would like to 
volunteer.

-Taylor


= DLab Proposal =

== Abstract ==
DLab is a platform for creating self-service, exploratory data science 
environments in the cloud using best-of-breed data science tools.

DLab includes a self-service web console, used to create and manage exploratory 
environments. It allows teams to spin up analytical environments with just a 
single click of a mouse. Once established, an environment can be managed by the
analytical team itself through a simple, easy-to-use web-based interface.

== Proposal ==
In order to work effectively, data scientists rely on a varying suite of 
analytics tools that are readily available. However, many of those tools are 
non-trivial to set up in terms of hardware provisioning, software installation, 
configuration, and deployment. Setting up a collaborative, multi-tenant 
development environment for data scientists consumes substantial IT and DevOps 
resources, as well as time. These factors often combine to hinder the agility 
and effectiveness of data science teams within an organization. Current 
solutions are largely closed source and/or proprietary, and committing to a 
given solution introduces the potential for vendor lock-in.

EPAM Systems developed DLab in response to the lack of open source, permissively
licensed solutions to better enable data science workflows. The ALv2 was 
selected to encourage open development and user adoption. DLab was open sourced 
on Dec 29, 2016 and is under active development with support from EPAM Systems.

We believe DLab is a unique solution with no current open source equivalent. 
Our primary goals of incubation are to grow and diversify the DLab community to 
ensure its long-term sustainability.

== Rationale ==
DLab is a platform that provides data scientists with the ability to 
self-provision, without IT support, exploratory and production environments 
with their preferred set of tools installed and pre-configured. Tool options 
include, but are not limited to:

 * Apache Spark
 * Apache Flink (planned)
 * Apache Zeppelin
 * Jupyter
 * TensorFlow + Jupyter
 * Deep Learning + Jupyter

DLab leverages cloud computing providers for virtual hardware provisioning and 
currently supports the following:

 * Amazon Web Services (AWS)
 * Microsoft Azure
 * Google Cloud Platform (GCP) (under development)

DLab offers git-based collaboration tools for data scientists and developers 
and integrates with the following git service providers:

 * GitHub
 * GitLab
 * Bitbucket

Additionally, DLab includes the option to configure the UnGit tool in an
environment to facilitate collaboration.

Finally, DLab integrates closely with many security and SSO offerings,
including:

 * LDAP
 * Microsoft Active Directory
 * AWS Identity and Access Management (IAM) service

DLab was designed from the ground up to be a highly configurable, flexible, and
extensible platform. We believe these qualities will encourage community growth 
by enabling contributors to easily add new integrations and extensions.

== Initial Goals ==
The initial goal will be to move the existing codebase to Apache and integrate 
with the Apache development process and infrastructure. A primary goal of 
incubation will be to grow and diversify the DLab PPMC. We are well aware that 
the project community currently consists of individuals from a single company. We aim
to change that during incubation.

== Current Status ==
As previously mentioned, DLab is under active development at EPAM Systems, and 
is being used in a number of production deployments:

 * [An investment company] is using DLab as an AWS-based analytics platform for 
their data scientists to provide a convenient way to perform multi-tenant data 
analytics. This enables data scientists to easily provision work environments 
with integrated data sources based on Elasticsearch, Apache HBase, and Neo4j, 
and utilizing Apache Spark. This enables a “one-click”, self-service option for
users to provision an environment with the necessary tools and data.

 * [An electronics manufacturing company] leverages DLab for data quality, data 
exploration, and analytics. The company’s data scientists leverage DLab to work 
with data sources that have been transferred to the cloud in order to find new 
insights in the data and help the implementation team define requirements for
data engineering. The main goal is to increase the utilization of various tools 
by decreasing time to deployment.

 * [A retail company] is using DLab as an image recognition framework, to 
enable automated restocking of inventory.

 * [A travel company] is using DLab to create a recommendation engine that will
allow end users to find more relevant accommodations faster and at a lower cost.

=== Meritocracy ===
We value meritocracy and we understand that it is the basis for an open 
community that encourages multiple companies and individuals to contribute and 
be invested in the project’s future. We will encourage and monitor 
participation and make sure to extend privileges and responsibilities to all 
contributors.

=== Community ===
DLab is currently being used by developers at EPAM, and a growing number of
customers are actively using it in production environments. By bringing DLab to
Apache we hope to broaden and diversify the user and developer community
through open collaboration.

=== Core Developers ===
DLab was initially developed at EPAM Systems and is under active development. 
We believe DLab will be of interest to a broad range of users and developers and
that incubating the project at the ASF will help us build a diverse, 
sustainable community.

=== Alignment ===
DLab utilizes other Apache projects such as Apache Spark, Apache Toree 
(incubating), and Apache Zeppelin, along with a number of other Apache 
libraries. We anticipate integration with additional Apache projects as the 
DLab community and interest in the project grows.

== Known Risks ==

=== Orphaned products ===
EPAM Systems is committed to the future development of DLab and understands 
that graduation to a TLP, while preferable, is not the only positive outcome of 
incubation.

Should the DLab project be accepted by the Incubator, the prospective PPMC 
would be willing to agree to a target incubation period of 2 years or less, 
knowing that every Incubator project incurs a certain cost in terms of ASF 
infrastructure and volunteer time.

=== Inexperience with Open Source ===
Many DLab contributors are already familiar with open source processes and 
several of them are committers on other Apache projects. We will be actively 
working with experienced Apache community members to improve our project.

=== Homogenous Developers ===
The initial committers of DLab all come from EPAM Systems, though we are
committed to recruiting and developing additional committers from a wide 
spectrum of industries and backgrounds.

=== Reliance on Salaried Developers ===
It is expected that DLab development will occur on both salaried time and on 
volunteer time, after hours. All of the initial committers are paid by EPAM 
Systems to contribute to this project. However, they are all passionate about 
the project, and we are both confident and hopeful that the project will 
continue even if no salaried developers contribute to the project.

=== Relationships with Other Apache Products ===
As mentioned in the Rationale section, DLab utilizes a number of existing 
Apache projects (Spark, Toree, Zeppelin, et al.), and we expect that list to
expand as the community grows and diversifies. Any Apache project in the big 
data, data science, and/or analytics space would be potentially relevant.

=== An Excessive Fascination with the Apache Brand ===
We are applying to the Incubator process because we think it is the next 
logical step for the DLab project after open-sourcing the code. This proposal 
is not for the purpose of generating publicity. Rather, we want to make sure to 
create a very inclusive and meritocratic community, outside the umbrella of a 
single company. EPAM has a long history of contributing to Apache projects and 
the DLab developers and contributors understand the implications of making it an
Apache project.

== Required Resources ==

=== Mailing lists ===
 * d...@dlab.incubator.apache.org
 * comm...@dlab.incubator.apache.org
 * priv...@dlab.incubator.apache.org

=== Source control ===
 * https://git-wip-us.apache.org/repos/asf/incubator-dlab

=== Issue tracking ===
 * JIRA DLab (DLAB)

== Documentation ==
 * DLab Website: http://dlab.opensource.epam.com
 * DLab code base: https://github.com/epam/DLab
 * DLab Overview: https://github.com/epam/DLab/blob/master/README.md
 * DLab User Guide: https://github.com/epam/DLab/blob/master/USER_GUIDE.md

== Initial Source ==
The DLab codebase is currently hosted on GitHub: https://github.com/epam/DLab

== Source and Intellectual Property Submission Plan ==
The DLab source code on GitHub is currently licensed under Apache License v2.0
and the copyright is assigned to EPAM Systems. If DLab becomes an Incubator 
project at the ASF, EPAM Systems will transfer the source code and trademark 
ownership to the Apache Software Foundation via a Software Grant Agreement.

== External Dependencies ==
To the best of our knowledge, all of DLab's dependencies are distributed under
Apache-compatible licenses.

DLab was designed to be highly extensible, and we expect and encourage the 
development of third-party extensions and plug-ins. We also understand that any 
such component, if it requires a dependency forbidden by Apache license policy, 
would not be eligible for inclusion in an Apache release, and would have to be 
hosted, supported, etc. outside of ASF infrastructure and labeled appropriately.

=== External dependencies licensed under Apache License 2.0: ===
MongoDB Java Driver - org.mongodb:mongo-java-driver 
(http://mongodb.github.io/mongo-java-driver/3.2/driver)

Dropwizard (https://github.com/dropwizard/dropwizard)

Dropwizard Template Config 
(https://github.com/tkrille/dropwizard-template-config)

Apache Directory Server (https://github.com/apache/directory-server)

Jackson (https://github.com/FasterXML/jackson)

AWS Java SDK (https://github.com/aws/aws-sdk-java)

Boto3 (https://github.com/boto/boto3)

=== External dependencies licensed under the MIT License: ===
angular2-app (https://www.npmjs.com/package/angular2-app)

angular2-seed (https://www.npmjs.com/package/angular2-seed)

angular2-seed-advanced (https://www.npmjs.org/package/angular2-seed-advanced)

angular2-seed-n3UX (https://www.npmjs.com/package/angular2-seed-n3UX)

http-status-enum (https://www.npmjs.com/package/http-status-enum)

Mockito (https://github.com/mockito/mockito)

ng2-translate (https://www.npmjs.com/package/ng2-translate)

SLF4J (http://www.slf4j.org/)

=== External dependencies licensed under the CDDL License: ===
Jersey (https://github.com/jersey/jersey)

=== External dependencies licensed under the Python Software License Version 2: ===
jython (https://github.com/jythontools/jython)

=== ASF Projects: ===
Apache Spark, Apache Toree (incubating), Apache Zeppelin

== Cryptography ==
Not applicable.

== Initial Committers ==
 * Dmytro Liaskovskyi dmytro_liaskovs...@epam.com
 * Volodymyr Veres volodymyr_ve...@epam.com
 * Oleh Hrynets oleh_hryn...@epam.com
 * Oleh Hrynyk oleh_hry...@epam.com
 * Oleh Martushevskyi oleh_martushevs...@epam.com
 * Oleh Moskovych oleh_moskov...@epam.com
 * Vadym Kuznetsov vadym_kuznet...@epam.com
 * Usein Faradzhev usein_faradz...@epam.com
 * Bohdan Hliva bohdan_hl...@epam.com
 * Oleksandr Melnychuk oleksandr_melnych...@epam.com
 * Mikhail Teplitskiy mikhail_teplits...@epam.com
 * Vira Vitanska vira_vitan...@epam.com
 * Andriana Kovalyshyn andriana_kovalys...@epam.com
 * Oleksandr Chaparin oleksandr_chapa...@epam.com
 * Denys Shliakhov denys_shliak...@epam.com
 * Nazar Barabash nazar_barab...@epam.com
 * Yuriy Holinko yuriy_holi...@epam.com
 * Petro Kotsiuba petro_kotsi...@epam.com
 * Bogdan Rudyi bogdan_ru...@epam.com
 * Mikhail Teplitskyi mikhail_teplits...@epam.com

== Sponsors ==

=== Champion ===
 * P. Taylor Goetz ptgo...@apache.org

=== Nominated Mentors ===
 * P. Taylor Goetz ptgo...@apache.org

=== Sponsoring Entity ===
 * The Apache Incubator
