[
https://issues.apache.org/jira/browse/FLINK-16627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060921#comment-17060921
]
Jark Wu commented on FLINK-16627:
---------------------------------
I think "format.remove-null = true" and 'format.filter-null-values' are
not good enough, because (1) it is not clear where the property takes
effect, on the reader side or the writer side, and (2) it may be misunderstood
as filtering out records that contain null values.
I would propose a property {{'format.json-include'}}. The name json-include
comes from Jackson's {{JsonInclude}} annotation, which is used to "define which
properties of Java Beans are to be included in serialization" (even though it
is still a little implicit that it applies to serialization). Jackson is the
underlying JSON framework used by flink-json.
The available property values could be:
- "always": default, values are to be always included
- "non-null": only non-null values are to be included.
- and maybe "non-empty" in the future.
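
To make the intended semantics concrete, here is a minimal sketch of how
"always" and "non-null" would differ on the writer side. It is only an
illustration, not how flink-json or Jackson implement it: the class
{{JsonIncludeSketch}}, the enum {{Include}} and the method {{toJson}} are
hypothetical names, and the enum values merely mirror Jackson's
{{JsonInclude.Include.ALWAYS}} and {{JsonInclude.Include.NON_NULL}}.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class JsonIncludeSketch {

    /** Hypothetical values for the proposed 'format.json-include' option. */
    enum Include { ALWAYS, NON_NULL }

    /**
     * Serializes a flat row of string fields to JSON, honoring the include mode.
     * Illustration only: string values are not escaped.
     */
    static String toJson(Map<String, String> row, Include include) {
        StringJoiner fields = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> field : row.entrySet()) {
            String value = field.getValue();
            if (value == null && include == Include.NON_NULL) {
                continue; // "non-null": drop the key, keep the record
            }
            String rendered = (value == null) ? "null" : "\"" + value + "\"";
            fields.add("\"" + field.getKey() + "\":" + rendered);
        }
        return fields.toString();
    }

    public static void main(String[] args) {
        // The row from the issue description: svt is null.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("subtype", "qin");
        row.put("svt", null);

        System.out.println(toJson(row, Include.ALWAYS));   // {"subtype":"qin","svt":null}
        System.out.println(toJson(row, Include.NON_NULL)); // {"subtype":"qin"}
    }
}
```

Note that in both modes the record itself is always written; only the null
key is dropped, which is the distinction that a name like "filter-null-values"
would blur.
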
What do you think?
> Remove keys with null value in json
> -----------------------------------
>
> Key: FLINK-16627
> URL: https://issues.apache.org/jira/browse/FLINK-16627
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table
> SQL / Planner
> Affects Versions: 1.10.0
> Reporter: jackray wang
> Assignee: jackray wang
> Priority: Major
>
> {code:java}
> //sql
> CREATE TABLE sink_kafka ( subtype STRING , svt STRING ) WITH (……)
> {code}
>
> {code:java}
> //sql
> CREATE TABLE source_kafka ( subtype STRING , svt STRING ) WITH (……)
> {code}
>
> {code:java}
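> //scala udf
> import org.apache.flink.table.functions.ScalarFunction
>
> // Despite its name, this UDF does not upper-case anything:
> // it replaces a null input with an empty string.
> class ScalaUpper extends ScalarFunction {
>   def eval(str: String): String = {
>     if (str == null) "" else str
>   }
> }
>
> btenv.registerFunction("scala_upper", new ScalaUpper())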
> {code}
>
> {code:java}
> //sql
> insert into sink_kafka select subtype, scala_upper(svt) from source_kafka
> {code}
>
>
> ----
> Sometimes the value of svt is null, and the JSON inserted into Kafka looks like
> {"subtype":"qin","svt":null}
> If the amount of data is small, this is acceptable, but we process 10 TB of
> data every day, and there may be many nulls in the JSON, which hurts
> efficiency. If a parameter could be added to remove keys with null values when
> defining a sink table, performance would be greatly improved.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)