[
https://issues.apache.org/jira/browse/BIGTOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251026#comment-14251026
]
jay vyas edited comment on BIGTOP-1271 at 12/18/14 2:19 AM:
------------------------------------------------------------
Data set generation is now handled by the external datagenerator library
(https://bintray.com/rnowling/bigpetstore/bigpetstore-data-generator/view), so
this issue is obsolete.
was (Author: jayunit100):
data set generation is no handled by the datagenerator external library...
https://bintray.com/rnowling/bigpetstore/bigpetstore-data-generator/view this
is obsoleted.
> BigPetStore: Embed user "types" into the generated data.
> --------------------------------------------------------
>
> Key: BIGTOP-1271
> URL: https://issues.apache.org/jira/browse/BIGTOP-1271
> Project: Bigtop
> Issue Type: New Feature
> Components: blueprints
> Affects Versions: backlog
> Reporter: jay vyas
>
> The data set generation in BigPetStore produces data with temporal and
> geographic patterns; however, there are no "personal" biases in the data.
> We need to add personal biases so that the Mahout recommender can tease out
> statistically significant product clusters for users.
> A simple implementation:
> {noformat}
> given 2 "types" of customers (i.e. dog people, cat people):
> t = hash(customer_name) mod 2    (non-negative modulus; hash codes can be negative)
> if (t == 0)
>     customer buys only dog products
> if (t == 1)
>     customer buys only cat products
> {noformat}
> This approach scales easily, since each customer's type depends only on their
> name, and it consistently embeds a profile into each person's purchases. With
> some object-oriented design we could also create customers who buy both cat
> and dog products, but the basic approach remains the same
> (hash code -> customer type -> product biases).
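A minimal Java sketch of the pipeline described in the issue (hash code -> customer type -> product biases), under stated assumptions. The class CustomerTypeSketch, the CustomerType enum, the methods typeOf and pickCategory, and the MIXED type with a 50/50 split are all hypothetical illustrations; they are not part of BigPetStore or the bigpetstore-data-generator library. Each type carries the probability that a given purchase is a dog product, which is one simple way to express the "OO" extension to mixed customers.

```java
import java.util.Random;

public class CustomerTypeSketch {

    /** Hypothetical customer types; each holds the probability of buying a dog product. */
    enum CustomerType {
        DOG_PERSON(1.0),   // buys only dog products
        CAT_PERSON(0.0),   // buys only cat products
        MIXED(0.5);        // hypothetical: buys both, chosen 50/50 per purchase

        final double dogProbability;

        CustomerType(double dogProbability) {
            this.dogProbability = dogProbability;
        }
    }

    /**
     * hash code -> customer type. Math.floorMod keeps the index non-negative,
     * because String.hashCode() can return a negative value.
     */
    static CustomerType typeOf(String customerName, CustomerType[] types) {
        return types[Math.floorMod(customerName.hashCode(), types.length)];
    }

    /** customer type -> product bias: choose the category of one purchase. */
    static String pickCategory(CustomerType type, Random rng) {
        // nextDouble() is in [0, 1), so 1.0 always yields "dog" and 0.0 always yields "cat".
        return rng.nextDouble() < type.dogProbability ? "dog" : "cat";
    }

    public static void main(String[] args) {
        // The issue's two-type scheme: only DOG_PERSON and CAT_PERSON.
        CustomerType[] twoTypes = {CustomerType.DOG_PERSON, CustomerType.CAT_PERSON};
        Random rng = new Random(42);
        for (String name : new String[] {"alice", "bob", "carol"}) {
            CustomerType t = typeOf(name, twoTypes);
            System.out.println(name + " -> " + t + " -> " + pickCategory(t, rng));
        }
    }
}
```

Passing only DOG_PERSON and CAT_PERSON reproduces the two-type scheme from the issue; passing CustomerType.values() also brings in the hypothetical MIXED type. Math.floorMod is used instead of % because a negative remainder would be an invalid array index.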