[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957586#comment-15957586
 ] 

Eugene Kirpichov commented on BEAM-1581:
----------------------------------------

I guess I'm suggesting that, like you said, perhaps we shouldn't try to cover 
the whole design space and call it "JsonIO", but instead conveniently solve a 
really small but really common subset of what people actually do, and provide 
enough building blocks that people can do the rest themselves without too much 
trouble.

E.g. we could:
- do what Spark does, and provide an IO for reading and writing files in the 
format "1 JSON object per line", exposing the object, say, via a Jackson mapper 
(and document the fact that we're using Jackson, and that if you don't like it 
you should develop your own transform)
- separately, add IOs for reading and writing 1-string-per-file, so users who 
are dealing with 1 JSON object per file can use that and combine it with their 
favorite way of JSON parsing/mapping
and that's it.
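
To make the two building blocks concrete, here is a rough sketch in plain Java 
(standard library only). It is not a Beam transform and not an existing Beam 
API: the class name {{JsonLinesSketch}} and its method names are hypothetical, 
and the JSON mapping is passed in as a function so that a Jackson mapper, or 
any other parser, can be plugged in by the caller.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of Beam. */
final class JsonLinesSketch {
  private JsonLinesSketch() {}

  /**
   * Reads a file in the "1 JSON object per line" format.
   * Blank lines are skipped; every other line is handed to the parser.
   * With Jackson on the classpath, the parser could wrap a call such as
   * mapper.readValue(line, MyType.class) (an assumption, not shown here).
   */
  static <T> List<T> readObjectPerLine(Path file, Function<String, T> parser)
      throws IOException {
    try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
      return lines
          .filter(line -> !line.trim().isEmpty())
          .map(parser)
          .collect(Collectors.toList());
    }
  }

  /**
   * Writes one serialized object per line.
   * The serializer must not emit raw newlines (i.e. no pretty-printing).
   */
  static <T> void writeObjectPerLine(
      Path file, List<T> values, Function<T, String> serializer) throws IOException {
    List<String> lines = values.stream().map(serializer).collect(Collectors.toList());
    Files.write(file, lines, StandardCharsets.UTF_8);
  }

  /** Reads a whole file as a single string (the "1 string per file" case). */
  static String readWholeFile(Path file) throws IOException {
    return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
  }
}
```

A user with one JSON object per file would call {{readWholeFile}} and feed the 
result to whatever parser they prefer, which is the point of keeping the 
parsing step out of the IO itself.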

Would there be a downside to doing this?

> JSON source and sink
> --------------------
>
>                 Key: BEAM-1581
>                 URL: https://issues.apache.org/jira/browse/BEAM-1581
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Aviem Zur
>            Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these would be a {{JsonSource}}/{{JsonSink}} 
> which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using (or refactoring) the methods/code found in {{AsJsons}} and 
> {{ParseJsons}}.
> The {{PCollection}} of objects the user passes to the transform should be 
> embedded in a valid JSON file.
> The most common pattern for this is a large object with an array member which 
> holds all the data objects and other members for metadata.
> Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
> Another pattern used is a file which is simply a JSON array of objects.



