Hi Chathura,

> Can there be a requirement to maintain subsets of the initial dataset under
> a project?
> For example, certain methods of preprocessing or slicing the dataset in
> different dimensions could produce multiple subsets of data. Subsequently,
> we may want to apply various mining algorithms on these resulting subsets
> of data. As computing such subsets can be costly, it may be beneficial to
> store precomputed subsets of data under a project.

IMHO, I don't see any requirement for keeping subsets of data. If you look at the ER diagram, this requirement is satisfied by having multiple "Processes" and "Executions" for a single project. In a process, users can do pre-processing, such as dimensionality reduction, and select a training set. This means a project can have multiple training sets. (The actual pre-processing will NOT be done at this point, because otherwise we would have to iterate through a fairly large dataset a few times, which is an overhead. Hence only the configurations will be taken as user inputs, and the actual pre-processing will be done later as MapReduce jobs.)

Then, in an "Execution", the user can reuse a previous pre-processing configuration and run model building with the desired algorithm. This way, multiple algorithms can be applied to the same training set. Apart from that, we may be talking about gigabytes or even terabytes of data here, so keeping subsets is not a good idea in any case.

> Another example is that once a clustering algorithm is applied, users may
> want to preserve data in certain clusters for further processing. In that
> case, users can select one or more clusters and generate subsets of data.

WSO2 ML will only produce the models and will NOT facilitate prediction using the built models. A built model will be published and then applied by CEP/BAM/ESB.

> Are we planning to add visualizations to the results produced by mining
> algorithms? For example, we can mark data points belonging to different
> clusters by different colors/shapes. Such visualizations can also assist in
> selecting better algorithms or tuning parameters.

+1 for this. Even though ML is not used for prediction, this can still be applied to visualize the model validation results. :)

Regards,
Supun

On Wed, Oct 29, 2014 at 6:54 PM, Chathura Ekanayake <[email protected]> wrote:

> Can there be a requirement to maintain subsets of the initial dataset
> under a project?
> For example, certain methods of preprocessing or slicing the dataset in
> different dimensions could produce multiple subsets of data. Subsequently,
> we may want to apply various mining algorithms on these resulting subsets
> of data. As computing such subsets can be costly, it may be beneficial to
> store precomputed subsets of data under a project. Another example is that
> once a clustering algorithm is applied, users may want to preserve data in
> certain clusters for further processing. In that case, users can select one
> or more clusters and generate subsets of data.
>
> Are we planning to add visualizations to the results produced by mining
> algorithms? For example, we can mark data points belonging to different
> clusters by different colors/shapes. Such visualizations can also assist in
> selecting better algorithms or tuning parameters.
>
> Regards,
> Chathura
>
>
> On Wed, Oct 29, 2014 at 11:54 AM, Supun Sethunga <[email protected]> wrote:
>
>> Hi,
>>
>> The following are the features of the "project" concept of WSO2 ML.
>>
>>    - Users start working on a data-set by creating a project and
>>    importing a data-set into the project.
>>    - A project can have only one data-set.
>>    - Users can build several models using different configurations
>>    (different algorithms, hyperparameter values, different pre-processing
>>    options, etc.) for a single data-set. This means that a single project
>>    may contain several models.
>>    - Multiple users can work on the same project and build models on
>>    the same data-set.
>>    - Models built within a project can be compared in terms of
>>    performance/accuracy. (*Yet to be implemented*)
>>    - Users can be added to or removed from projects.
>>
>> Once users log in, they can see the list of projects they are working
>> on or assigned to. Projects can be opened or deleted. Upon deletion, the
>> data-set associated with the project and all the models that have been
>> created under that project will be removed.
>>
>> I'd appreciate any comments/feedback. Please find the ER diagram and
>> screenshots attached.
>>
>> Regards,
>> Supun
>>
>> --
>> *Supun Sethunga*
>> Software Engineer
>> WSO2, Inc.
>> lean | enterprise | middleware
>> Mobile : +94 716546324
>>
>> _______________________________________________
>> Architecture mailing list
>> [email protected]
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
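
To make the project model discussed in this thread easier to picture (one data-set per project, "Processes" that store pre-processing configurations without touching the data, and "Executions" that apply an algorithm to a stored process), here is a minimal Python sketch. All class, field and function names, the data-set URI, and the in-memory apply_process function (a stand-in for the later MapReduce pre-processing jobs) are hypothetical assumptions for illustration only, not the actual WSO2 ML schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical sketch: names below are illustrative, not the WSO2 ML schema/API.
Row = Dict[str, object]


@dataclass
class Process:
    """Pre-processing choices stored as configuration only; no data is read here."""
    name: str
    selected_features: List[str]  # e.g. the outcome of a dimensionality-reduction step
    train_fraction: float = 0.7   # share of rows that form the training set


@dataclass
class Execution:
    """Pairs one stored Process with one algorithm and its hyperparameters."""
    process: Process
    algorithm: str
    hyperparameters: Dict[str, float] = field(default_factory=dict)


@dataclass
class Project:
    """A project owns exactly one data-set plus many processes and executions."""
    name: str
    dataset_uri: str
    users: List[str] = field(default_factory=list)
    processes: List[Process] = field(default_factory=list)
    executions: List[Execution] = field(default_factory=list)

    def add_process(self, process: Process) -> Process:
        self.processes.append(process)
        return process

    def add_execution(self, process: Process, algorithm: str, **hyper: float) -> Execution:
        # An execution may only reuse a process configuration from this project.
        if process not in self.processes:
            raise ValueError("process does not belong to this project")
        execution = Execution(process, algorithm, dict(hyper))
        self.executions.append(execution)
        return execution


def apply_process(rows: List[Row], process: Process) -> Tuple[List[Row], List[Row]]:
    """Deferred pre-processing: the only place where data is actually touched.

    An in-memory stand-in for the later MapReduce job; the Process objects
    above hold configuration only, so no subsets are stored in the project.
    """
    projected = [{k: row[k] for k in process.selected_features} for row in rows]
    cut = int(len(projected) * process.train_fraction)
    return projected[:cut], projected[cut:]


# Usage: one project, one stored process, two algorithms on the same training set.
project = Project("demo", "hypothetical://datasets/demo.csv", users=["supun", "chathura"])
p = project.add_process(Process("two-features", ["age", "tenure"], train_fraction=0.5))
project.add_execution(p, "k-means", k=3)
project.add_execution(p, "logistic-regression", learning_rate=0.01)

rows = [{"age": 30, "tenure": 2, "plan": "a"}, {"age": 40, "tenure": 5, "plan": "b"}]
train_rows, test_rows = apply_process(rows, p)
```

The design choice the sketch mirrors is the one argued above: a process is cheap because it records only configuration, and the expensive pass over a potentially very large data-set happens once, when an execution needs the training set, instead of storing precomputed subsets.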
