William Benton created SPARK-4190:
-------------------------------------
Summary: Allow users to provide transformation rules at JSON ingest
Key: SPARK-4190
URL: https://issues.apache.org/jira/browse/SPARK-4190
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0, 1.2.0
Reporter: William Benton
It would be great if it were possible to provide transformation rules (to be
executed within jsonRDD or jsonFile) so that users could
(1) deal with JSON files that confound schema inference or are otherwise
insufficiently disciplined, or
(2) simply perform arbitrary object transformations at ingest before a
schema is inferred.
json4s, which Spark already uses, has nice interfaces for specifying
transformations as partial functions on objects and accessing nested structures
via path expressions. (We might want to introduce an abstraction atop json4s
for a public API, but the json4s API seems like a good first step.) There are
some examples of these transformations at https://github.com/json4s/json4s and
at http://chapeau.freevariable.com/2014/10/fedmsg-and-spark.html
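
As a rough illustration of the kind of rule meant here, the sketch below uses
json4s's transformField with a partial function over fields. The object name
IngestRules, the rule itself, and the field names ("count", hyphenated keys)
are made up for the example; none of this is an existing or proposed Spark API.
It assumes json4s with the Jackson backend is on the classpath.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

// Hypothetical helper, for illustration only (not part of Spark).
object IngestRules {
  // A transformation rule as a partial function on JSON fields.
  // Fields that match neither case are left untouched by transformField.
  val rule: PartialFunction[JField, JField] = {
    // Replace hyphens in field names with underscores.
    case JField(name, value) if name.contains("-") =>
      JField(name.replace('-', '_'), value)
    // Coerce an all-digit string in a field named "count" to an integer,
    // so the field has one consistent type across records.
    case JField("count", JString(s)) if s.nonEmpty && s.forall(_.isDigit) =>
      JField("count", JInt(BigInt(s)))
  }

  // Parse one JSON record, apply the rule to every field (including nested
  // ones), and serialize the result back to a compact JSON string.
  def applyRule(line: String): String =
    compact(render(parse(line).transformField(rule)))
}
```

Until something like this is supported inside jsonRDD or jsonFile, a similar
effect can be approximated by rewriting each record before ingest, for example
sqlContext.jsonRDD(sc.textFile(path).map(IngestRules.applyRule)), where path is
a placeholder. That costs an extra parse and render per record, which is part
of the motivation for applying the rules at ingest instead.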