[ 
https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304577#comment-14304577
 ] 

Corey J. Nolet commented on SPARK-5260:
---------------------------------------

I think all the schema-specific functions should be pulled out into an 
object called JsonSchemaFunctions. The allKeysWithValueTypes() and 
createSchema() functions should be exposed via the public API and documented 
well according to how they are meant to be used. 

In the project where I use these functions, I call allKeysWithValueTypes() 
over my entire RDD as it is being saved to a sequence file. An 
Accumulator[Set[(String, DataType)]] aggregates all the schema elements for 
the RDD into a final Set. I can then store the schema and later call 
createSchema() to get the final StructType that can be used with the SQL 
table. I also had to write an isConflicted(Set[(String, DataType)]) function 
to determine whether a JSON object or JSON array was also encountered as a 
primitive type in one of the records in the RDD, or vice versa.
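
To make the conflict check concrete, here is a rough sketch of what such a 
function could look like. It does not use Spark at all: the Primitive, 
JsonObject and JsonArray types are hypothetical stand-ins for Spark SQL's 
DataType classes, SchemaSketch and merge() are names invented for this 
example, and the rule "a key is conflicted if it was seen both as an 
object/array and as a primitive" is my assumption about the intended 
behaviour, not code taken from JsonRDD.

```scala
// Hypothetical stand-ins for Spark SQL's DataType hierarchy, so that the
// sketch runs without Spark on the classpath.
sealed trait FieldType
final case class Primitive(name: String) extends FieldType
case object JsonObject extends FieldType
case object JsonArray extends FieldType

object SchemaSketch {
  // Same shape as the Set[(String, DataType)] described above.
  type SchemaElems = Set[(String, FieldType)]

  // Combining two partial results is a plain set union; an accumulator over
  // a Set would do the equivalent each time a record's elements are added.
  def merge(a: SchemaElems, b: SchemaElems): SchemaElems = a ++ b

  private def isComplex(t: FieldType): Boolean =
    t == JsonObject || t == JsonArray

  // Assumed rule: the set is conflicted if any key appears with both a
  // complex type (object or array) and a primitive type.
  def isConflicted(elems: SchemaElems): Boolean =
    elems.groupBy(_._1).values.exists { entries =>
      val types = entries.map(_._2)
      types.exists(isComplex) && types.exists(t => !isComplex(t))
    }
}
```

With this sketch, a set where "user" is recorded only as JsonObject is not 
conflicted, while merging in a record where "user" is a Primitive("string") 
makes isConflicted return true.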

> Expose JsonRDD.allKeysWithValueTypes() in a utility class 
> ----------------------------------------------------------
>
>                 Key: SPARK-5260
>                 URL: https://issues.apache.org/jira/browse/SPARK-5260
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Corey J. Nolet
>            Assignee: Corey J. Nolet
>
> I have found this method extremely useful when implementing my own strategy 
> for inferring a schema from parsed JSON. For now, I have copied the 
> method right out of the JsonRDD class into my own project, but I think it 
> would be immensely useful to keep the code in Spark and expose it publicly 
> somewhere else, such as in an object called JsonSchema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
