I would start by using DataFrames and the Row
<http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Row>
API, because you can fetch fields by index. Presumably, you'll parse the
incoming data and determine what fields have what types, etc. Or will
someone specify the schema dynamically somehow?

Either way, once you know the types and indices of the fields you need for
a given query, you can fetch them using the Row methods.
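
For example, here's a rough sketch (assuming Spark 2.x and a hypothetical
JSON source "series2.json" with a "value" column; both are just
placeholders for whatever your application discovers at runtime):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.DoubleType

    val spark = SparkSession.builder.appName("dynamic-fields").getOrCreate()

    // Let Spark infer the schema from the incoming data at runtime.
    val df = spark.read.json("series2.json")

    // Discover the index and type of the field needed for this query.
    val valueIndex = df.schema.fieldIndex("value")
    val valueType  = df.schema("value").dataType

    // Fetch the field from each Row by index, using the discovered type.
    val values = df.rdd.map { row =>
      valueType match {
        case DoubleType => row.getDouble(valueIndex)
        case _          => row.getAs[Any](valueIndex).toString.toDouble
      }
    }

None of the field names or types need to be known at compile time; the same
pattern works for whatever schema you discover when the data arrives.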

HTH,

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Lightbend <http://lightbend.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Thu, Apr 28, 2016 at 11:34 AM, _na <nikhila.alb...@seeq.com> wrote:

> We are looking to incorporate Spark into a timeseries data investigation
> application, but we are having a hard time transforming our workflow into
> the required transformations-on-data model. The crux of the problem is that
> we don’t know a priori which data will be required for our transformations.
>
> For example, a common request might be `average($series2.within($ranges))`,
> where in order to fetch the right sections of data from $series2, $ranges
> will need to be computed first and then used to define data boundaries.
>
> Is there a way to get around the need to define data first in Spark?
>
