My gut instinct is no: Spark is overkill for your needs. You should be able
to accomplish all of that with a relational database or a column-oriented
database (depending on the types of queries you run most frequently and
your performance requirements).
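
For a rough sense of scale, taking the upper limits you list: 100,000 users
x 10 fill-outs each is 1,000,000 fill-outs, and at 100 questions per survey
that is at most about 100,000,000 answer rows (assuming one row per answered
question). That is well within reach of a single database server.

Here is a minimal sketch of what I mean, using Python's built-in sqlite3
module only so that it runs anywhere. The table name, column names, and
sample ratings are made up for illustration and are not taken from your
schema. The point is that the inner loop of the naive algorithm can
collapse into one GROUP BY query per report:

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a tiny, made-up set of answers."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per answered question in a fill-out.
    conn.execute(
        "CREATE TABLE answer ("
        " fill_out_id INTEGER, survey_id INTEGER,"
        " question_no INTEGER, rating INTEGER)"
    )
    rows = [
        (1, 1, 1, 4), (1, 1, 2, 2),
        (2, 1, 1, 5), (2, 1, 2, 3),
        (3, 1, 1, 3), (3, 1, 2, 5),
    ]
    conn.executemany("INSERT INTO answer VALUES (?, ?, ?, ?)", rows)
    # An index on the filter/group columns keeps report queries cheap.
    conn.execute(
        "CREATE INDEX idx_answer_survey ON answer (survey_id, question_no)"
    )
    return conn


def report_data_points(conn, survey_id, fill_out_ids):
    """Return (question_no, avg, min, max) for the chosen fill-outs."""
    placeholders = ",".join("?" for _ in fill_out_ids)
    sql = (
        "SELECT question_no, AVG(rating), MIN(rating), MAX(rating) "
        "FROM answer "
        "WHERE survey_id = ? AND fill_out_id IN (" + placeholders + ") "
        "GROUP BY question_no ORDER BY question_no"
    )
    return conn.execute(sql, [survey_id, *fill_out_ids]).fetchall()


conn = build_demo_db()
# A report covering all three fill-outs of survey 1.
points = report_data_points(conn, 1, [1, 2, 3])
# A report covering only a subset of the fill-outs.
subset_points = report_data_points(conn, 1, [1, 2])
```

Each report then becomes a single indexed aggregate query rather than a
loop over data points, and independent reports can be run in parallel
against the same database if one query at a time turns out to be too slow.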

--
Chris Miller

On Mon, Mar 7, 2016 at 1:17 AM, Laumegui Deaulobi <
guillaume.bilod...@gmail.com> wrote:

> Our problem space is survey analytics.  Each survey comprises a set of
> questions, with each question having a set of possible answers.  Survey
> fill-out tasks are sent to users, who have until a certain date to complete
> them.  Based on these survey fill-outs, reports need to be generated.  Each
> report deals with a subset of the survey fill-outs, and comprises a set of
> data points (average rating for question 1, min/max for question 2, etc.)
>
> We are dealing with rather large data sets - although reading the internet
> we get the impression that everyone is analyzing petabytes of data...
>
> Users: up to 100,000
> Surveys: up to 100,000
> Questions per survey: up to 100
> Possible answers per question: up to 10
> Survey fill-outs / user: up to 10
> Reports: up to 100,000
> Data points per report: up to 100
>
> Data is currently stored in a relational database but a migration to a
> different kind of store is possible.
>
> The naive algorithm for report generation can be summed up as this:
>
> for each report to be generated {
>   for each report data point to be calculated {
>     calculate data point
>     add data point to report
>   }
>   publish report
> }
>
> In order to deal with the upper limits of these values, we will need to
> distribute this algorithm to a compute / data cluster as much as possible.
>
> I've read about frameworks such as Apache Spark but also Hadoop, GridGain,
> HazelCast and several others, and am still confused as to how each of these
> can help us and how they fit together.
>
> Is Spark the right framework for us?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-right-for-us-tp26412.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
