+1 good luck
> On Oct 17, 2017, at 2:14 PM, Tom Barber <t...@spicule.co.uk> wrote:
>
> +1
>
> Happy to help out if the vote passes.
>
> On Tue, Oct 17, 2017 at 10:07 PM, Madhawa Kasun Gunasekara <madhaw...@gmail.com> wrote:
>
>> Here is my +1
>>
>> Thanks,
>> Madhawa
>>
>> Madhawa
>>
>> On Tue, Oct 17, 2017 at 4:04 PM, lewis john mcgibbney <lewi...@apache.org> wrote:
>>
>>> Hi Folks,
>>> Having secured a mentorship team consisting of the following IPMC Members,
>>> I am happy to open a formal VOTE thread on accepting the Science Data
>>> Analytics Platform (SDAP) into Apache Incubator.
>>>
>>> - Lewis John McGibbney (lewi...@apache.org)
>>> - Raphael Bircher (bircher at apace dot org)
>>> - Suneel Marthi (smarthi at apache dot org)
>>>
>>> Thank you to both Raphael and Suneel for coming forward. :)
>>> The VOTE will be open for at least 72 hours.
>>>
>>> [ ] +1 Accept Science Data Analytics Platform (SDAP) into Apache Incubator
>>> [ ] +/-0 ... just because
>>> [ ] -1 Do NOT Accept Science Data Analytics Platform (SDAP) into Apache
>>> Incubator... because
>>>
>>> Thanks in advance to all participants.
>>> Lewis
>>>
>>> P.S. Here is a binding +1 from me
>>>
>>> On Wed, Oct 11, 2017 at 11:22 AM, lewis john mcgibbney <lewi...@apache.org> wrote:
>>>
>>>> Hi Folks,
>>>> I would like to open a DISCUSS thread on the topic of accepting the
>>>> Science Data Analytics Platform (SDAP)
>>>> <https://wiki.apache.org/incubator/SDAPProposal> Project into the Incubator.
>>>> I am CC'ing Thomas Huang from NASA JPL who I have been working with to
>>>> build community around a kick-ass set of software projects under the SDAP
>>>> umbrella.
>>>> At this stage we would very much appreciate critical feedback from the
>>>> general@ community. We are also open to mentors who may have an interest
>>>> in the project proposal.
>>>> The proposal is pasted below.
>>>> Thanks in advance,
>>>> Lewis
>>>>
>>>> = Abstract =
>>>> The Science Data Analytics Platform (SDAP) establishes an integrated data
>>>> analytic center for Big Science problems. It focuses on technology
>>>> integration, advancement and maturity.
>>>>
>>>> = Proposal =
>>>> SDAP currently represents a collaboration between NASA Jet Propulsion
>>>> Laboratory (JPL), Florida State University (FSU), the National Center for
>>>> Atmospheric Research (NCAR), and George Mason University (GMU). SDAP brings
>>>> together a number of big data technologies under a single umbrella,
>>>> including the NASA-funded OceanXtremes (Anomaly detection and ocean
>>>> science), NEXUS (Deep data analytic platform), DOMS (Distributed in-situ
>>>> to satellite matchup), MUDROD (Search relevancy and discovery) and VQSS
>>>> (Virtualized Quality Screening Service). VQSS will not be included in the
>>>> original Incubator proposal; however, it is anticipated that a future
>>>> source code donation will cover VQSS.
>>>>
>>>> = Background and Rationale =
>>>> SDAP is a software solution currently geared toward better enabling
>>>> scientists who are advancing the study of the Earth's physical
>>>> oceanography. With increasing global temperature, warming of the ocean, and
>>>> melting ice sheets and glaciers, the impacts can be observed from changes
>>>> in anomalous ocean temperature and circulation patterns, to increasing
>>>> extreme weather events and stronger/more frequent hurricanes, sea level
>>>> rise and storm surges affecting coastlines, and may involve drastic changes
>>>> and shifts in marine ecosystems. Ocean science communities are relying on
>>>> data distributed through data centers such as JPL's Physical Oceanography
>>>> Distributed Active Archive Center (PO.DAAC) to conduct their research.
>>>> In typical investigations, oceanographers follow a traditional workflow
>>>> for using datasets: search, evaluate, download, and apply tools and
>>>> algorithms to look for trends. While this workflow has worked very well
>>>> historically for the oceanographic community, it cannot scale if the
>>>> research involves massive amounts of data. NASA's Surface Water and Ocean
>>>> Topography (SWOT) mission, scheduled to launch in April of 2021, is
>>>> expected to generate over 20 PB of data for a nominal 3-year mission. This
>>>> will challenge all existing NASA Earth Science data archival/distribution
>>>> paradigms. It will no longer be feasible for Earth scientists to download
>>>> and analyze such volumes of data. SDAP was therefore developed primarily as
>>>> a Web-service platform for big ocean data science at the PO.DAAC, with open
>>>> source solutions used to enable fast analysis of oceanographic data. SDAP
>>>> has been developed collaboratively between JPL, FSU, NCAR, and GMU and is
>>>> rapidly maturing to become the generic platform for the next generation of
>>>> big science data solutions. The platform is an orchestration of several
>>>> previously funded NASA big ocean data solutions using cloud technology,
>>>> which include data analysis (NEXUS), anomaly detection (OceanXtremes),
>>>> matchup (DOMS), subsetting, discovery (MUDROD), and visualization (VQSS).
>>>> SDAP will enable web-accessible, fast data analysis directly on huge
>>>> scientific data archives to minimize data movement and provide access,
>>>> including subsetting, only to the relevant data.
>>>>
>>>> = Science Data Analytics Platform Project Overview =
>>>> SDAP consists of several loosely coupled, independently functioning
>>>> sub-projects. The graphic below displays an overview of how these
>>>> sub-projects fuse together. N.B., although the graphic uses terminology
>>>> relating to OceanWorks, the SDAP architecture is essentially identical.
>>>>
>>>> {{attachment:sdap.png}}
>>>>
>>>> == OceanXtremes ==
>>>> Oceanographic Data-Intensive Anomaly Detection and Analysis Portal. An
>>>> application that allows you to view imagery and perform analysis on sea
>>>> level rise data.
>>>>
>>>> '''Objective'''
>>>> Develop an anomaly detection system which identifies items, events or
>>>> observations which do not conform to an expected pattern.
>>>> * Mature and test domain-specific, multi-scale anomaly and feature
>>>> detection algorithms.
>>>> * Identify unexpected correlations between key measured variables.
>>>>
>>>> Demonstrate value of technologies in this service:
>>>> * Adapted Map-Reduce data mining.
>>>> * Algorithm profiling service.
>>>> * Shared discovery and exploration search tools.
>>>> * Automatic notification of events of interest.
>>>>
>>>> == NEXUS ==
>>>> NEXUS is an emerging technology developed at JPL
>>>> * A Cloud-based/Cluster-based data platform that performs scalable
>>>> handling of observational parameters analysis designed to scale horizontally
>>>> * Leveraging high-performance indexed, temporal, and geospatial search
>>>> solution
>>>> * Breaks data products into small chunks and stores them in a Cloud-based
>>>> data store
>>>>
>>>> ''Data Volumes Exploding''
>>>> * SWOT mission is coming
>>>> * File I/O is slow
>>>>
>>>> ''Scalable Store & Compute is Available''
>>>> * NoSQL cluster databases
>>>> * Parallel compute, in-memory map-reduce
>>>> * Bring Compute to Highly-Accessible Data (using Hybrid Cloud)
>>>>
>>>> ''Pre-Chunk and Summarize Key Variables''
>>>> * Easy statistics instantly (milliseconds)
>>>> * Harder statistics on-demand (in seconds)
>>>> * Visualize original data (layers) on a map quickly
>>>>
>>>> == DOMS ==
>>>> The Distributed Oceanographic Match-Up Service
>>>> DOMS is designed to reconcile satellite and in situ datasets in support of
>>>> NASA's Earth Science mission.
>>>> The service will provide a mechanism for users to input a series of
>>>> geospatial references for satellite observations and receive the in situ
>>>> observations that are matched to the satellite data within a selectable
>>>> temporal and spatial domain. DOMS includes several characteristic in situ
>>>> and satellite observation datasets - with an initial focus on salinity,
>>>> sea temperature, and winds. DOMS will be used by the marine and satellite
>>>> research communities to support a range of activities and several use
>>>> cases will be described. The service is designed to provide a
>>>> community-accessible tool that dynamically delivers matched data and
>>>> allows the scientist to only work with the subset of data where the
>>>> matches exist.
>>>>
>>>> == MUDROD ==
>>>> Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to
>>>> Improve Data Discovery and Access
>>>> Data discovery accuracy is a challenging topic for both Earth science and
>>>> other domains. It is especially true for scientific data sets that are not
>>>> as popular as Amazon or Google data. MUDROD is focused on mining oceanic
>>>> knowledge from the PO.DAAC user log files to improve the end user data
>>>> discovery experience at PO.DAAC. There are three steps in the research:
>>>> a) extracting the oceanographic semantics from three sources: SWEET, the
>>>> GCMD ontology, and the keywords used by end users for searching PO.DAAC
>>>> datasets, b) mining the linkage among different vocabularies based on user
>>>> data discovery sessions, and c) building the linkage among vocabularies
>>>> based on a comprehensive approach that considers domain de facto
>>>> standards, e.g., SWEET and GCMD, and the knowledge mined from the log
>>>> files. The semantics are used to improve data discovery for ranking
>>>> results, navigating among vocabularies, and recommending data based on
>>>> user searches.
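
To make the DOMS match-up idea above concrete, here is a minimal Python sketch of pairing satellite observations with the in situ observations that fall inside a selectable time tolerance and spatial radius. It is an illustration only, not the DOMS implementation: the `Observation` record, the `match_up` function, and the `max_km` / `max_dt` parameters are hypothetical names, and a brute-force scan stands in for whatever indexing the real service uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Observation:
    # Hypothetical record layout; not taken from the DOMS code base.
    time: datetime
    lat: float
    lon: float
    value: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def match_up(satellite, in_situ, max_km, max_dt):
    """Map each satellite observation to the in situ observations that lie
    within max_km kilometres and within max_dt of its timestamp.
    Satellite observations with no match are omitted from the result."""
    matches = {}
    for sat in satellite:
        hits = [
            obs for obs in in_situ
            if abs(obs.time - sat.time) <= max_dt
            and haversine_km(sat.lat, sat.lon, obs.lat, obs.lon) <= max_km
        ]
        if hits:
            matches[sat] = hits
    return matches
```

Calling `match_up(sat_points, buoy_points, max_km=25.0, max_dt=timedelta(hours=6))` would return only the satellite points that have at least one nearby in situ observation, which mirrors the "work only with the subset where matches exist" goal. The scan is quadratic, so anything at real archive scale would need a spatial and temporal index instead.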
>>>>
>>>> = Current Status =
>>>> All components of SDAP were originally designed and developed under grants
>>>> from the NASA-funded Advanced Information Systems and Technologies (AIST)
>>>> program. The initiative to bring the components together under the SDAP
>>>> umbrella was granted through an AIST-funded follow-on grant which will
>>>> run for another ~18 months.
>>>> Currently no projects have made official releases so, outside of community
>>>> building, this will be our primary Incubating goal. All SDAP source code is
>>>> currently publicly available and licensed under the ALv2.0.
>>>>
>>>> = Meritocracy =
>>>> The current developers are familiar with meritocratic open source
>>>> development at Apache. The SDAP team consumes Apache products heavily, with
>>>> members being part of several Apache user communities. SDAP itself has
>>>> critical dependencies upon Apache products. Lewis McGibbney (JPL employee),
>>>> a Member of the ASF and V.P. of Apache Any23, Gora PMC Nutch, Tika, OODT,
>>>> OCW, etc., is championing the effort to bring SDAP into and through the
>>>> Apache Incubator and has been evangelizing the Apache Way to the current
>>>> SDAP contributors such that the meritocratic process is well understood and
>>>> followed. Apache was chosen specifically because we want to encourage this
>>>> style of community development for the project and for it to sustain SDAP
>>>> going forward to become the generic platform for the next generation of big
>>>> science data solutions.
>>>>
>>>> = Community =
>>>> The SDAP project is a fairly new effort and our community is not yet
>>>> fully/firmly established. Initial committers comprising the SDAP roster
>>>> have only recently fully come together as a unified team; however, there is
>>>> a large degree of synergy between constituent members at JPL, FSU, NCAR,
>>>> and GMU.
>>>> Therefore, community building and publicity continue to be a major
>>>> thrust. With the activity and exposure regularly attained by several
>>>> community members, we hope to grow the SDAP presence in and across several
>>>> (scientific) forums. The SDAP technology is generating interest within
>>>> communities such as the Earth Science Information Partnership (ESIP), the
>>>> American Geophysical Union (AGU) and a plethora of science meetings around
>>>> the globe. This, we hope, will further contribute towards the
>>>> possibility of SDAP being used across Government Agencies such as NASA,
>>>> NOAA, USGS, EPA, DOI, etc. as well as by researchers and students in
>>>> academic institutions around the globe.
>>>> During incubation, we will explicitly seek to increase our adoption, with
>>>> SDAP already being featured on the agenda for several high profile globally
>>>> significant scientific conferences and meetings.
>>>>
>>>> = Core Developers =
>>>> The current set of core developers is relatively small, including
>>>> full-time staff and students from across JPL, FSU, NCAR, and GMU. Initial
>>>> community management and participation will be distributed across the
>>>> entire team, most of whom have been involved with the constituent projects
>>>> for <2 years.
>>>>
>>>> = Alignment =
>>>> All SDAP code is licensed under Apache v2.0.
>>>>
>>>> = Known Risks =
>>>>
>>>> == Orphaned products ==
>>>> There are currently no orphaned products. Each component of SDAP has
>>>> dedicated personnel leading and participating in its ongoing development.
>>>> Additionally, there is substantial collaboration between projects
>>>> facilitated by regular project meetings which are specific to the initial
>>>> member entities and focused on advancing physical oceanographic science.
>>>>
>>>> == Inexperience with Open Source ==
>>>> JPL (in particular Lewis McGibbney) has been part of several efforts to
>>>> transition to and grow project communities at Apache e.g. Apache OODT,
>>>> Apache Open Climate Workbench, Apache Joshua (Incubating), Apache SensSoft
>>>> (Incubating), Apache DRAT (Incubating). Most of the code developed under
>>>> the SDAP umbrella was and is open source prior to the Incubator effort, so
>>>> we are well familiarized with the nuances of open source software.
>>>>
>>>> = Relationships with Other Apache Products =
>>>> SDAP has a strong dependency upon a number of high profile and smaller
>>>> profile Apache products. Examples can be seen in the breakdown of External
>>>> Dependencies. As we continue to grow SDAP within the Incubator, we will
>>>> make efforts to share community stories, software advancements and possible
>>>> improvements in our use of our Apache dependencies back to those project
>>>> communities.
>>>>
>>>> = Developers =
>>>> The SDAP project, and hence its developers, is currently funded through a
>>>> NASA AIST follow-on grant with funding secured for the next ~18 months.
>>>> There are currently no developers dedicated 100% of their time; however,
>>>> the same core team that does the work currently will continue to work on
>>>> the project throughout the current funding period and after. There is
>>>> currently no business strategy aligned with SDAP; however, it is perceived
>>>> that future, as yet unsecured, funding may be directed to further feature
>>>> advancement and project evangelism.
>>>>
>>>> = Documentation =
>>>> Documentation is currently available in a number of locations e.g. Github
>>>> wiki, Github pages, etc., with each repository under the oceanworks-aist
>>>> Github Org maintaining documentation available through wikis attached to
>>>> the repositories.
>>>> Additionally, most of the SDAP sub-projects have been extensively
>>>> documented within a plethora of formal academic publications across
>>>> several academic communities. It would be our intention, at the very
>>>> least, to unify the Github wiki and Github pages documentation, which
>>>> would most likely make up the sdap.apache.org Website content.
>>>>
>>>> = Initial Source =
>>>> Current source resides in several locations on Github and Bitbucket:
>>>> * https://github.com/dataplumber/nexus (NEXUS, OceanXtremes, DOMS)
>>>> * https://github.com/dataplumber/edge (EDGE)
>>>> * https://github.com/aist-oceanworks/mudrod (MUDROD)
>>>> * https://bitbucket.org/coaps_mdc/doms/src (DOMS)
>>>>
>>>> = External Dependencies =
>>>> Each component of the Science Data Analytics Platform has its own
>>>> dependencies. Documentation will be available for integrating them.
>>>>
>>>> == MUDROD ==
>>>> '''Core'''
>>>> com.google.code.gson gson 2.5 compile jar false
>>>> org.jdom jdom 2.0.2 compile jar false
>>>> org.elasticsearch elasticsearch 5.2.0 compile jar false
>>>> org.elasticsearch elasticsearch-spark-20_2.11 5.2.0 compile jar false
>>>> joda-time joda-time 2.9.4 compile jar false
>>>> com.carrotsearch hppc 0.7.1 compile jar false
>>>> org.apache.spark spark-core_2.11 2.1.0 compile jar false
>>>> org.apache.spark spark-sql_2.11 2.1.0 compile jar false
>>>> org.apache.spark spark-mllib_2.11 2.1.0 compile jar false
>>>> org.scala-lang scala-library 2.11.8 compile jar false
>>>> org.codehaus.jettison jettison 1.3.8 compile jar false
>>>> commons-cli commons-cli 1.2 compile jar false
>>>> net.sf.opencsv opencsv 2.3 compile jar false
>>>> org.apache.jena jena-core 3.3.0 compile jar false
>>>> junit junit 4.12 test jar false
>>>>
>>>> '''Service'''
>>>> gov.nasa.jpl.mudrod mudrod-core 0.0.1-SNAPSHOT compile jar false
>>>> javax.servlet javax.servlet-api 3.1.0 provided jar false
>>>> com.google.code.gson gson 2.5
>>>> compile jar false
>>>>
>>>> '''Web'''
>>>> * AngularJS - MIT License
>>>> * BootstrapJS - MIT License
>>>> * jQueryJS - MIT License
>>>> * Underscore JS - MIT License
>>>>
>>>> == DOMS ==
>>>> * Apache Solr version 5.5.1 http://lucene.apache.org/solr/
>>>> * EDGE https://github.com/dataplumber/edge
>>>> * NetCDF4 http://unidata.github.io/netcdf4-python/
>>>> * Python 3.5 (NOTE: only partial support for py2.7)
>>>>
>>>> Non stdlib Python dependencies:
>>>> * Jinja2==2.9.5
>>>> * python-dateutil==2.6.0
>>>> * cython==0.25.2
>>>> * numpy==1.12.0
>>>> * scipy==0.18.1
>>>> * netCDF4==1.2.7
>>>> * solrpy3
>>>> * siphon==0.4.0
>>>> * neo4j-driver==1.1.0
>>>> * matplotlib==2.0.0
>>>> * requests==2.13.0
>>>> * shapely==1.5.17
>>>> * flask==0.12
>>>> * networkx==1.11
>>>> * pyproj==1.9.5.1
>>>> * blist==1.3.6
>>>>
>>>> == NEXUS ==
>>>> '''Analysis'''
>>>> * https://github.com/dataplumber/nexus/blob/master/analysis/package-list.txt
>>>> * https://github.com/dataplumber/nexus/blob/master/analysis/requirements.txt
>>>>
>>>> '''Client'''
>>>> * https://github.com/dataplumber/nexus/blob/master/client/requirements.txt
>>>>
>>>> '''Climatology'''
>>>> * matplotlib
>>>> * numpy
>>>> * netCDF4
>>>> * pathos (https://pypi.python.org/pypi/pathos)
>>>>
>>>> '''Data-access'''
>>>> * https://github.com/dataplumber/nexus/blob/master/data-access/requirements.txt
>>>>
>>>> '''Nexus-ingest'''
>>>> ''Dataset-tiler''
>>>> * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/dataset-tiler/build/reports
>>>>
>>>> ''developer-box''
>>>> * Just a collection of scripts/vagrant file used to stand up a developer
>>>> instance of nexus ingestion. No dependencies to report
>>>>
>>>> ''Groovy-scripts''
>>>> * Collection of Groovy scripts that can be used as part of data
>>>> ingestion.
>>>> They only rely on the standard Groovy library and the
>>>> ‘nexus-messages’ project
>>>>
>>>> ''Nexus-messages''
>>>> * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-messages/build/reports
>>>>
>>>> ''nexus-sink''
>>>> * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-sink/build/reports
>>>>
>>>> ''nexus-xd-python-modules''
>>>> * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/package-list.txt
>>>> * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/requirements.txt
>>>>
>>>> ''spring-xd-python''
>>>> * only python standard libraries are used
>>>>
>>>> ''tcp-shell''
>>>> * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/tcp-shell/build/reports
>>>>
>>>> '''tools/deletebyquery'''
>>>> * https://github.com/dataplumber/nexus/blob/master/tools/deletebyquery/requirements.txt
>>>>
>>>> = Required Resources =
>>>> Mailing Lists
>>>> * priv...@sdap.incubator.apache.org
>>>> * d...@sdap.incubator.apache.org
>>>> * comm...@sdap.incubator.apache.org
>>>>
>>>> Git Repos
>>>> * https://git-wip-us.apache.org/repos/asf/incubator-nexus.git
>>>> * https://git-wip-us.apache.org/repos/asf/incubator-doms.git
>>>> * https://git-wip-us.apache.org/repos/asf/incubator-mudrod.git
>>>>
>>>> Issue Tracking
>>>> * JIRA Science Data Analytics Platform (SDAP)
>>>>
>>>> Continuous Integration
>>>> * Jenkins builds on https://builds.apache.org/
>>>>
>>>> Web
>>>> * http://sdap.incubator.apache.org/
>>>> * wiki at http://cwiki.apache.org
>>>>
>>>> = Initial Committers =
>>>> The following is a list of the planned initial Apache committers (the
>>>> active subset of the committers for the current repository on Github).
>>>> * Lewis John McGibbney (lewi...@apache.org)
>>>> * Vardis M. Tsontos (vardis.m.tson...@jpl.nasa.gov)
>>>> * Joseph C.
>>>> Jacob (joseph.c.ja...@jpl.nasa.gov)
>>>> * Ed Armstrong (edward.m.armstr...@jpl.nasa.gov)
>>>> * Frank Greguska (gregu...@jpl.nasa.gov)
>>>> * Brian Wilson (brian.wil...@jpl.nasa.gov)
>>>> * Chaowe Phil Yang (cya...@gmu.edu)
>>>> * Yongyao Jiang (yjia...@gmu.edu)
>>>> * Yun Li (yl...@gmu.edu)
>>>> * Shawn R. Smith (sm...@coaps.fsu.edu)
>>>> * Jocelyn Elya (je...@coaps.fsu.edu)
>>>> * Mark Bourassa (boura...@coaps.fsu.edu)
>>>> * Thomas Cram (tc...@ucar.edu)
>>>> * Thomas Huang (thomas.hu...@jpl.nasa.gov)
>>>> * Steven Worley (wor...@ucar.edu)
>>>> * Zaihua Ji (z...@ucar.edu)
>>>>
>>>> = Affiliations =
>>>> NASA JPL
>>>> * Lewis John McGibbney (lewi...@apache.org)
>>>> * Vardis M. Tsontos (vardis.m.tson...@jpl.nasa.gov)
>>>> * Joseph C. Jacob (joseph.c.ja...@jpl.nasa.gov)
>>>> * Ed Armstrong (edward.m.armstr...@jpl.nasa.gov)
>>>> * Frank Greguska (gregu...@jpl.nasa.gov)
>>>> * Thomas Huang (thomas.hu...@jpl.nasa.gov)
>>>> * Brian Wilson (brian.wil...@jpl.nasa.gov)
>>>>
>>>> George Mason University
>>>> * Chaowe Phil Yang (cya...@gmu.edu)
>>>> * Yongyao Jiang (yjia...@gmu.edu)
>>>> * Yun Li (yl...@gmu.edu)
>>>>
>>>> Center for Ocean-Atmospheric Prediction Studies, Florida State University
>>>> * Shawn R.
>>>> Smith (sm...@coaps.fsu.edu)
>>>> * Jocelyn Elya (je...@coaps.fsu.edu)
>>>> * Mark Bourassa (boura...@coaps.fsu.edu)
>>>>
>>>> Computational Information Systems Laboratory (CISL) / National Center for
>>>> Atmospheric Research (NCAR)
>>>> * Thomas Cram (tc...@ucar.edu)
>>>> * Zaihua Ji (z...@ucar.edu)
>>>> * Steven Worley (wor...@ucar.edu)
>>>>
>>>> = Sponsors =
>>>>
>>>> = Champion =
>>>> * Lewis McGibbney (NASA/JPL)
>>>>
>>>> = Nominated Mentors =
>>>> * TBD
>>>> * TBD
>>>> * TBD
>>>>
>>>> = Sponsoring Entity =
>>>> The Apache Incubator
>>>>
>>>>
>>>> --
>>>> http://home.apache.org/~lewismc/
>>>> @hectorMcSpector
>>>> http://www.linkedin.com/in/lmcgibbney
>>>>
>>>
>>> --
>>> http://home.apache.org/~lewismc/
>>> @hectorMcSpector
>>> http://www.linkedin.com/in/lmcgibbney
>>>