Hi Chathura,

> Can there be a requirement to maintain subsets of the initial dataset under
> a project?
> For example, certain methods of preprocessing or slicing the dataset in
> different dimensions  could produce multiple subsets of data. Subsequently,
> we may want to apply various mining algorithms on these resulting subsets
> of data. As computing such subsets can be costly, it may be beneficial to
> store precomputed subsets of data under a project.


IMHO, I don't see any requirement for keeping subsets of data. If you look
at the ER diagram, having multiple "Processes" and "Executions" for a
single project already satisfies this requirement. In a process, users can
configure pre-processing, such as dimensionality reduction, and select a
training set. This means a project can have multiple training sets. (The
actual pre-processing will NOT be done at this point, because otherwise we
would have to iterate through a fairly large dataset a few times, which is
an overhead. Hence only the configurations will be taken as user inputs, and
the actual pre-processing will be done later as MapReduce jobs.)

Then in an "Execution", the user can reuse a previous pre-processing
configuration and run the model building with a desired algorithm. This way,
multiple algorithms can be applied to the same training set.
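
To make the relationship concrete, here is a rough sketch of how a project,
its processes and its executions could relate. It is written in Python purely
for illustration; the class names, field names, dataset URI and algorithm
names are my own assumptions and are not taken from the ER diagram or the
actual WSO2 ML code. The point it tries to show is that a process stores only
configuration, and that several executions can share one process.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Process:
    """Pre-processing configuration only; no data is touched here."""
    name: str
    selected_features: Tuple[str, ...]
    train_fraction: float


@dataclass(frozen=True)
class Execution:
    """One model-building run: a stored process plus an algorithm."""
    process: Process
    algorithm: str
    hyperparameters: Tuple[Tuple[str, object], ...] = ()


@dataclass
class Project:
    """A project owns exactly one data-set and many processes/executions."""
    name: str
    dataset_uri: str  # hypothetical location of the single data-set
    processes: List[Process] = field(default_factory=list)
    executions: List[Execution] = field(default_factory=list)

    def add_process(self, process: Process) -> Process:
        # Only the configuration is recorded; the real pre-processing
        # would run later, as part of the model-building job.
        self.processes.append(process)
        return process

    def add_execution(self, process: Process, algorithm: str,
                      **hyperparameters) -> Execution:
        if process not in self.processes:
            raise ValueError("process does not belong to this project")
        execution = Execution(process, algorithm,
                              tuple(sorted(hyperparameters.items())))
        self.executions.append(execution)
        return execution


# Hypothetical usage: one training-set configuration, two algorithms.
project = Project("churn", "hdfs://example/churn.csv")
reduced = project.add_process(
    Process("reduced-split", ("age", "usage"), train_fraction=0.7))
project.add_execution(reduced, "logistic_regression", iterations=100)
project.add_execution(reduced, "decision_tree", max_depth=5)
```

With this shape, no subset of the data is ever stored under the project;
only the small configuration objects are kept.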

Apart from that, we may be talking about gigabytes or even terabytes of
data here. Thus keeping subsets is not the best option in any case.

> Another example is that once a clustering algorithm is applied, users may
> want to preserve data in certain clusters for further processing. In that
> case, users can select one or more clusters and generate subsets of data.


WSO2 ML will only produce the models, and would NOT facilitate prediction
using built models. A built model will be published and will be applied by
CEP/BAM/ESB.

> Are we planning to add visualizations to the results produced by mining
> algorithms. For example, we can mark data points belonging to different
> clusters by different colors/shapes. Such visualizations can also assist in
> selecting better algorithms or tuning parameters.


+1 for this. Even though ML is not used for prediction, this can still be
applied to visualize the model validation results. :)

Regards,
Supun

On Wed, Oct 29, 2014 at 6:54 PM, Chathura Ekanayake <[email protected]>
wrote:

> Can there be a requirement to maintain subsets of the initial dataset
> under a project?
> For example, certain methods of preprocessing or slicing the dataset in
> different dimensions  could produce multiple subsets of data. Subsequently,
> we may want to apply various mining algorithms on these resulting subsets
> of data. As computing such subsets can be costly, it may be beneficial to
> store precomputed subsets of data under a project. Another example is that
> once a clustering algorithm is applied, users may want to preserve data in
> certain clusters for further processing. In that case, users can select one
> or more clusters and generate subsets of data.
>
> Are we planning to add visualizations to the results produced by mining
> algorithms. For example, we can mark data points belonging to different
> clusters by different colors/shapes. Such visualizations can also assist in
> selecting better algorithms or tuning parameters.
>
> Regards,
> Chathura
>
>
> On Wed, Oct 29, 2014 at 11:54 AM, Supun Sethunga <[email protected]> wrote:
>
>> Hi,
>>
>> Following are the features of the "project" concept of WSO2 ML.
>>
>>    - Users start working on a data-set by creating a project, and
>>    importing a data-set to the project.
>>    - A project can have only one data-set.
>>    - Users can build several models using different configurations
>>    (different algorithms, hyperparameter values, different pre-processing
>>    options, etc.) for a single data-set. This means that a single project
>>    may contain several models.
>>    - Multiple users can work on the same project, and build models on
>>    the same data-set.
>>    - Models built within a project can be compared in terms of
>>    performance/accuracy. (*Yet to be implemented*)
>>    - Users can be added/removed to/from projects.
>>
>> Once users log in, they can see the list of projects they are working
>> on or assigned to. Projects can be opened or deleted. Upon deletion, the
>> data-set associated with the project and all the models that have been
>> created under that project will be removed.
>>
>> Appreciate any comments/feedback. Please find the ER-Diagram and
>> screen-shots in the attachments.
>>
>> Regards,
>> Supun
>>
>> --
>> *Supun Sethunga*
>> Software Engineer
>> WSO2, Inc.
>> lean | enterprise | middleware
>> Mobile : +94 716546324
>>
>> _______________________________________________
>> Architecture mailing list
>> [email protected]
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>


-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
lean | enterprise | middleware
Mobile : +94 716546324
