[ https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957586#comment-15957586 ]
Eugene Kirpichov commented on BEAM-1581:
----------------------------------------

I guess I'm suggesting that, like you said, perhaps we shouldn't try to cover the whole design space and call it "JsonIO". Instead we could conveniently solve a small but very common subset of what people actually do, and provide enough building blocks that people can do the rest themselves without too much trouble.

E.g. we could:
- do what Spark does, and provide an IO for reading and writing files in the "1 JSON object per line" format, exposing each object via, say, a Jackson mapper (and document the fact that we're using Jackson, so that anyone who doesn't like it knows to develop their own transform);
- separately, add IOs for reading and writing 1 string per file, so that users dealing with 1 JSON object per file can use them and combine them with their favorite way of JSON parsing/mapping.

And that's it. Would there be a downside to doing this?

> JSON source and sink
> --------------------
>
>                 Key: BEAM-1581
>                 URL: https://issues.apache.org/jira/browse/BEAM-1581
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Aviem Zur
>            Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, there should be a {{JsonSource}}/{{JsonSink}},
> implemented as a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using (or refactoring) the methods/code found in {{AsJsons}} and
> {{ParseJsons}}.
> The {{PCollection}} of objects the user passes to the transform should be
> embedded in a valid JSON file.
> The most common pattern for this is a large object with an array member which
> holds all the data objects, plus other members for metadata.
> Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
> Another pattern is a file which is simply a JSON array of objects.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
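The "1 JSON object per line" layout proposed in the comment and the two layouts from the issue description (an envelope object with an array member plus metadata, or a file that is just a JSON array) differ mainly in how the records are framed. The stdlib-only sketch below illustrates that difference. It assumes each element has already been serialized to a JSON string (for example by something like {{AsJsons}}). The class name JsonFileLayouts and its methods are hypothetical and are not Beam or Jackson API. The sketch does not parse, validate, or escape JSON.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper illustrating the file layouts discussed in BEAM-1581.
 * Every record is assumed to be an already-serialized, complete JSON value;
 * nothing here parses or validates JSON.
 */
public class JsonFileLayouts {

  /** "1 JSON object per line": each record on its own line, newline-terminated. */
  static String toJsonLines(List<String> records) {
    StringBuilder sb = new StringBuilder();
    for (String record : records) {
      sb.append(record).append('\n');
    }
    return sb.toString();
  }

  /** Reads the per-line layout back: one record per non-blank line (handles \r\n too). */
  static List<String> fromJsonLines(String fileBody) {
    List<String> records = new ArrayList<>();
    for (String line : fileBody.split("\n")) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) {
        records.add(trimmed);
      }
    }
    return records;
  }

  /** A file that is simply a top-level JSON array: commas between records, none after the last. */
  static String toJsonArray(List<String> records) {
    return "[" + String.join(",", records) + "]";
  }

  /**
   * A large object with an array member holding the data and other members for metadata.
   * Assumptions: metadataMembers is a comma-separated list of JSON members without the
   * surrounding braces (e.g. "\"count\":2", or "" for none), and arrayField needs no escaping.
   */
  static String toEnvelope(String arrayField, String metadataMembers, List<String> records) {
    String prefix = metadataMembers.isEmpty() ? "" : metadataMembers + ",";
    return "{" + prefix + "\"" + arrayField + "\":" + toJsonArray(records) + "}";
  }

  public static void main(String[] args) {
    List<String> records = List.of("{\"id\":1}", "{\"id\":2}");
    System.out.print(toJsonLines(records));
    System.out.println(toJsonArray(records));
    System.out.println(toEnvelope("items", "\"count\":2", records));
  }
}
```

In the per-line layout every record stands alone, so a file can be split at any newline and written one element at a time. The array and envelope layouts need an opening and closing bracket plus a separator between elements but not after the last one. That is why they call for a dedicated sink (or for reading a whole file as one string and parsing it) rather than plain line-oriented reading and writing.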