Hi Spark community,

This is a bit of a high-level question, as frankly I'm not well versed in
Spark or related tech.

We have a system in place that reads columnar data in through CSV and
represents it in relational tables as it operates; it's essentially
schema-based ETL. This constrains our input: we either have to restrict
what the data looks like coming in, or we have to transform and map it
to some relational representation before we work on it.
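
In Spark terms (which I'm still learning), I believe our current model
corresponds to something like a fixed-schema read, where anything that
doesn't fit the declared columns gets rejected or nulled out. A rough
sketch, with made-up column names and paths:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("schema-etl").getOrCreate()

    // Made-up schema: every input file has to fit these columns/types.
    val schema = StructType(Seq(
      StructField("id", LongType, nullable = false),
      StructField("name", StringType),
      StructField("amount", DoubleType)
    ))

    // Rows that don't conform are dropped or nulled depending on the mode.
    val df = spark.read
      .option("header", "true")
      .schema(schema)
      .csv("/path/to/input/*.csv")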

One of our goals with the Spark application we're building is to make our
input and operations more generic, so we can accept data in, say, JSON
format, operate on it without a schema, and output it the same way.
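
Ideally, something like the following is what we have in mind. The field
name and paths are hypothetical, and I'm not sure whether Spark's
inferred schema counts as "schema-less" for our purposes:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("schemaless-json").getOrCreate()
    import spark.implicits._

    // Read JSON without declaring a schema up front (Spark infers one),
    // transform it, and write JSON back out.
    val events = spark.read.json("/path/to/input/*.json")
    val active = events.filter($"status" === "active")  // "status" is made up
    active.write.json("/path/to/output/")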

My question is whether Spark supports this view and what facilities it
provides. Unless I've been interpreting things incorrectly, the various
data formats that Spark operates on still assume specified fields. I don't
know what a schema-less approach would look like in terms of data types,
operations, etc.
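
For example, if I understand the docs correctly, even an inferred read
ends up pinned to concrete fields and types (reusing the session from
the sketch above):

    val df = spark.read.json("/path/to/events.json")
    df.printSchema()  // Spark has already committed to specific fields/types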

I realize this is lacking in detail, but I imagine this may be more of
a conversation than just an answer to a question.

Efe
