[ https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926687#comment-15926687 ]
Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:28 PM:
----------------------------------------------------------
I think we should avoid exposing a contract to the user which promises to write
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur, and these errors
might be detected very late in the process, possibly only upon an attempt to
consume the data in another process (which may belong to a different user, as
JSON is often used for integration).
We definitely need concrete {{JsonSink extends FileBasedSink<String>}} and
{{JsonSource extends FileBasedSource<String>}} classes, but these should not
be used directly by the user. All common JSON file logic regarding how the file
should be constructed (as [~jkff] mentioned, this should be better defined) will
live in this sink and source, including all file writing/reading code
(inherited from {{FileBasedSink}} and {{FileBasedSource}}).
To avoid exposing classes which deal with Strings to the user, we need
concrete {{PTransform}} classes which deal with objects.
The problem is that these probably can't live in a {{JsonIO}} class, since it
cannot contain the transformations from objects to JSON Strings (there are
several ways to implement these).
Should these transforms be in a separate class such as {{JacksonIO}}?
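To illustrate the split described above, here is a minimal sketch using only
the Java standard library. None of these names are Beam classes:
{{StringLineSink}} is a hypothetical stand-in for the internal String-based
sink, {{TypedJsonWrite}} is a hypothetical stand-in for the user-facing
object-based transform, and the hand-written {{pointToJson}} stands in for
whichever serializer (for example Jackson) a concrete class would plug in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class TypedJsonWriteSketch {

    /** Hypothetical stand-in for the internal String-based sink (not Beam's FileBasedSink). */
    static final class StringLineSink {
        private final List<String> lines = new ArrayList<>();

        void write(String json) {
            lines.add(json);
        }

        List<String> lines() {
            return lines;
        }
    }

    /**
     * Hypothetical user-facing writer: callers hand over objects, never raw Strings.
     * The object-to-JSON step is pluggable, which is why it may not fit in a generic JsonIO.
     */
    static final class TypedJsonWrite<T> {
        private final Function<T, String> toJson;
        private final StringLineSink sink;

        TypedJsonWrite(Function<T, String> toJson, StringLineSink sink) {
            this.toJson = toJson;
            this.sink = sink;
        }

        void write(T element) {
            // Serialization happens here, so users cannot pass an invalid JSON String.
            sink.write(toJson.apply(element));
        }
    }

    /** Example element type, assumed for illustration only. */
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Hand-written serializer standing in for a library such as Jackson. */
    static String pointToJson(Point p) {
        return "{\"x\":" + p.x + ",\"y\":" + p.y + "}";
    }

    public static void main(String[] args) {
        StringLineSink sink = new StringLineSink();
        TypedJsonWrite<Point> write =
            new TypedJsonWrite<>(TypedJsonWriteSketch::pointToJson, sink);
        write.write(new Point(1, 2));
        System.out.println(sink.lines()); // prints [{"x":1,"y":2}]
    }
}
```

With this shape the String-accepting class stays an implementation detail, and
each serialization library can supply its own typed wrapper.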
was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises to write
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur, and these errors
might be detected very late in the process, possibly only upon an attempt to
consume the data in another process (which may belong to a different user, as
JSON is often used for integration).
We definitely need concrete {{JsonSink}} and {{JsonSource}} classes which
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}, but
these should not be used directly by the user. All common JSON file logic
regarding how the file should be constructed (as [~jkff] mentioned, this should
be better defined) will live in this sink and source, including all file
writing/reading code (inherited from {{FileBasedSink}} and
{{FileBasedSource}}).
To avoid exposing classes which deal with Strings to the user, we need
concrete {{PTransform}} classes which deal with objects.
The problem is that these probably can't live in a {{JsonIO}} class, since it
cannot contain the transformations from objects to JSON Strings (there are
several ways to implement these).
Should these transforms be in a separate class such as {{JacksonIO}}?
> JSON sources and sinks
> ----------------------
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-extensions
> Reporter: Aviem Zur
> Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, there should be a {{JsonSource}}/{{JsonSink}}
> which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and
> {{ParseJsons}}