[ https://issues.apache.org/jira/browse/SPARK-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-5443:
-------------------------------
Assignee: Nathan Howell
> jsonRDD with schema should ignore sub-objects that are omitted in schema
> ------------------------------------------------------------------------
>
> Key: SPARK-5443
> URL: https://issues.apache.org/jira/browse/SPARK-5443
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 1.2.0
> Reporter: Derrick Burns
> Assignee: Nathan Howell
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Reading the code for jsonRDD, it appears that every field of a JSON object is
> read into a Row regardless of the provided schema. I would expect it to be
> more efficient to store in the Row only those fields that are explicitly
> included in the schema.
> For example, assume that I only wish to extract the "id" field of a tweet.
> If I provided a schema that had just one field, named "id", within a map,
> then the Row object would store only that field within a map.
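
To make the requested behavior concrete, here is a minimal sketch in plain
Python. It is not Spark code and not the jsonRDD implementation. The
project() helper and the nested-dict schema format are hypothetical and
invented for illustration only: a key maps to None for a leaf field, or to
another dict for a sub-object. The sketch parses the whole record and then
keeps only what the schema names, so it shows what would be stored, not how
a parser could skip the unwanted input.

```python
import json


def project(value, schema):
    """Keep only the parts of `value` that `schema` names.

    `schema` is a hypothetical nested dict, not a Spark StructType:
    None marks a leaf field, a dict marks a sub-object.
    """
    if schema is None:
        # Leaf field: keep the value as parsed.
        return value
    if not isinstance(value, dict):
        # The schema expects an object but the data holds something else.
        return None
    # Fields absent from the record become None; fields absent from the
    # schema (including whole sub-objects) are dropped.
    return {
        name: project(value[name], sub) if name in value else None
        for name, sub in schema.items()
    }


# A made-up tweet-like record, used only as sample input.
tweet = json.loads(
    '{"id": 42, "text": "hello",'
    ' "user": {"id": 7, "name": "a", "profile": {"bio": "x"}}}'
)

only_id = project(tweet, {"id": None})
only_user_id = project(tweet, {"user": {"id": None}})
```

With the first schema only the top-level "id" is kept and the whole "user"
sub-object is dropped. With the second schema only "user.id" is kept and
"user.profile" is dropped.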