Hi,

That depends on a lot of things, but as a starting point: are you planning to store your data in JSON format?

Regards,
Gourav Sengupta

On Sun, Mar 6, 2016 at 5:17 PM, Laumegui Deaulobi <guillaume.bilod...@gmail.com> wrote:

> Our problem space is survey analytics. Each survey comprises a set of
> questions, with each question having a set of possible answers. Survey
> fill-out tasks are sent to users, who have until a certain date to complete
> it. Based on these survey fill-outs, reports need to be generated. Each
> report deals with a subset of the survey fill-outs, and comprises a set of
> data points (average rating for question 1, min/max for question 2, etc.)
>
> We are dealing with rather large data sets - although reading the internet
> we get the impression that everyone is analyzing petabytes of data...
>
> Users: up to 100,000
> Surveys: up to 100,000
> Questions per survey: up to 100
> Possible answers per question: up to 10
> Survey fill-outs / user: up to 10
> Reports: up to 100,000
> Data points per report: up to 100
>
> Data is currently stored in a relational database but a migration to a
> different kind of store is possible.
>
> The naive algorithm for report generation can be summed up as this:
>
> for each report to be generated {
>   for each report data point to be calculated {
>     calculate data point
>     add data point to report
>   }
>   publish report
> }
>
> In order to deal with the upper limits of these values, we will need to
> distribute this algorithm to a compute / data cluster as much as possible.
>
> I've read about frameworks such as Apache Spark but also Hadoop, GridGain,
> HazelCast and several others, and am still confused as to how each of these
> can help us and how they fit together.
>
> Is Spark the right framework for us?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-right-for-us-tp26412.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
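
For reference, the naive loop quoted above could be sketched in plain Python roughly as follows. This is only an illustration under assumptions: the record layout (`FILL_OUTS`, `REPORTS`), the function names, and the three aggregates are hypothetical and not taken from the original schema, and no cluster framework is involved.

```python
from statistics import mean

# Hypothetical in-memory data: one dict per survey fill-out,
# mapping question id -> numeric answer.
FILL_OUTS = [
    {"survey": "s1", "user": "u1", "answers": {"q1": 4, "q2": 2}},
    {"survey": "s1", "user": "u2", "answers": {"q1": 5, "q2": 7}},
    {"survey": "s1", "user": "u3", "answers": {"q1": 3, "q2": 5}},
]

# Hypothetical report definitions: each data point is (question, aggregate).
REPORTS = [
    {
        "id": "r1",
        "survey": "s1",
        "data_points": {
            "q1_avg": ("q1", "avg"),
            "q2_min": ("q2", "min"),
            "q2_max": ("q2", "max"),
        },
    },
]

# Assumed set of aggregates, based on the examples in the question.
AGGREGATES = {"avg": mean, "min": min, "max": max}


def calculate_data_point(fill_outs, question, aggregate):
    """Apply one aggregate to all answers given to one question."""
    values = [f["answers"][question] for f in fill_outs if question in f["answers"]]
    return AGGREGATES[aggregate](values)


def generate_report(report, all_fill_outs):
    """Inner loop: calculate every data point of a single report."""
    # Each report only looks at a subset of the fill-outs.
    subset = [f for f in all_fill_outs if f["survey"] == report["survey"]]
    result = {}
    for name, (question, aggregate) in report["data_points"].items():
        result[name] = calculate_data_point(subset, question, aggregate)
    return result


def generate_all(reports, all_fill_outs):
    """Outer loop: one independent computation per report."""
    return {r["id"]: generate_report(r, all_fill_outs) for r in reports}


if __name__ == "__main__":
    print(generate_all(REPORTS, FILL_OUTS))
```

In this sketch each call to `generate_report` is independent of the others, so the outer loop is the natural unit to distribute. At the stated upper limits that is at most 100,000 reports x 100 data points = 10 million data points.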