Hi All,

I’m opening up a DISCUSS thread to propose a new direction for our DISTILL 
product, which is presently deprecated and unmaintained. Please feel free to 
offer different proposals, constructive criticism, and opinions.

About Apache DISTILL: http://flagon.incubator.apache.org/distill/

Context: 

UserALE.js is and will likely always be our ‘flagship’ product. It’s the 
enabling technology that makes behavioral logging easy to deploy and control, 
and useful for business and analytical use-cases. Distill, however, was 
conceived as the analytical framework in which we can really show off what 
UserALE can do. In Distill, we thought to add the high-value content that would 
really differentiate us in the market and drive both demand and community growth. 

Distill 0.1.0 was envisioned as a stack component that would house analytical 
libraries for users to call from their own Python environments, or from a 
front-end visualization client (e.g., TAP).

Due to development priorities in the original SensSoft (now Flagon) team, 
Distill took a backseat to instrumentation work and front-end work. In its 
current state, it is strongly tied to TAP, which is also deprecated. 

Changes in the original SensSoft team make it unlikely that we will be able to 
revive Distill in its original product vision (although we have very good 
requirements for Distill v0.2.0: 
https://cwiki.apache.org/confluence/display/FLAGON/Distill+0.2.0). However, 
were we to revive Distill as is, we might be repeating the same mistakes: 
rather than focusing on the analytical content that will drive adoption and 
community growth, we would be distracted by work on the infrastructure needed 
to pull that content into a multi-purpose stack component.

Proposal:

My proposal is to refactor Distill to expedite analytical content development. 
This involves pulling back the product focus from a server-side stack component 
to a Python package that users can pull into their own Python and Conda 
environments. We can then provide some of the basic functions Distill offered 
(e.g., making Elasticsearch queries and aggregations) through third-party 
dependencies (e.g., elasticsearch-dsl), and quickly begin generating analytical 
libraries to distribute through the package. 
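
To make the idea a bit more concrete, here is a rough sketch of the kind of 
helper such a package might expose. It is only an illustration under stated 
assumptions: the function names are hypothetical, the field names ("type", 
"sessionID") are assumed to match UserALE.js log fields, and the request body 
is built as a plain dictionary so the sketch runs without any dependency (in 
practice, elasticsearch-dsl could build the same request).

```python
from collections import Counter
from typing import Any, Dict, Iterable, Optional


def build_event_count_query(
    field: str = "type",               # assumed UserALE.js log field
    session_id: Optional[str] = None,  # assumed field name: "sessionID"
    size: int = 10,
) -> Dict[str, Any]:
    """Build an Elasticsearch search body that counts logs per value of `field`.

    Hypothetical helper, not part of any existing Distill release.
    """
    if session_id is None:
        query: Dict[str, Any] = {"match_all": {}}
    else:
        query = {"term": {"sessionID": session_id}}
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "query": query,
        "aggs": {"by_field": {"terms": {"field": field, "size": size}}},
    }


def count_events(logs: Iterable[Dict[str, Any]], field: str = "type") -> Dict[str, int]:
    """Same count, computed locally on logs already loaded into memory."""
    return dict(Counter(log[field] for log in logs if field in log))
```

The first function only describes the request; sending it would still go 
through whichever Elasticsearch client we settle on. The second shows that 
analytical functions can be written and tested against plain Python data, 
without standing up any server-side component.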

Depending on demand, we can revisit the concept of a server-side stack 
component serving various front-ends, but capitalize on existing streamlined 
visualization packages like Plotly for visualization.

This will require major repository restructuring. However, it lessens the 
work needed to get to analytical content generation and dissemination, and 
expedites community growth. 


Please share your thoughts. As we reach consensus, we can move to a VOTE to 
steer toward this proposal or others that arise through Discussion.

Thanks,

Josh
