[https://issues.apache.org/jira/browse/FLINK-18202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235531#comment-17235531]
Suhan Mao edited comment on FLINK-18202 at 11/19/20, 3:12 PM:
--------------------------------------------------------------
[~twalthr] [~libenchao]
I have finished the first version of flink-pb. The code is located at
[https://github.com/maosuhan/flink-pb] for temporary use. You can go through
the code with the help of the README. Please review the design and the code; I
will be very happy to hear your advice.
Features:
# Code generation in deserialization. The performance is close to that of the native protobuf Java API.
# Fast serialization. The performance is 3 times faster than the proto builder API (setter -> build -> toByteArray()); see the first sketch after this list.
# Both proto2 and proto3 are supported. The syntax is recognized from the proto file read by the table source/sink function.
# All protobuf data types are supported, including simple types, message types, map types, and array types.
# Implements the DynamicTableFactory and RowData interfaces.
# Supports the pb.message-class-name, pb.ignore-parse-errors and pb.ignore-default-values connector params; see the second sketch after this list and the GitHub README for details.
# Supports flexible configuration of java_multiple_files and java_outer_classname in the proto file. Not handling these options could cause code generation errors.
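For context on item 2, here is a minimal sketch of the "proto builder API" path it benchmarks against (setter -> build -> toByteArray()). MyMessage and its fields are hypothetical stand-ins for a protoc-generated class, not code from the flink-pb repository:
{code:java}
// Baseline serialization path via the generated protobuf builder API.
// "MyMessage" is a placeholder for any protoc-generated message class;
// the fields are made up for illustration.
MyMessage.Builder builder = MyMessage.newBuilder(); // setter phase
builder.setId(42L);
builder.setName("flink");
MyMessage message = builder.build();                // build phase
byte[] payload = message.toByteArray();             // serialize to bytes
{code}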
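And here is a hedged sketch of how the connector params from item 6 might be used from the Table API. Only the three pb.* option names come from the list above; the format identifier "pb", the Kafka connector, the schema, and the message class name are assumptions for illustration:
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ProtobufFormatExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Hypothetical DDL: only the pb.* option names are taken from the feature list above.
        tEnv.executeSql(
                "CREATE TABLE pb_source (\n"
                        + "  id BIGINT,\n"
                        + "  name STRING\n"
                        + ") WITH (\n"
                        + "  'connector' = 'kafka',\n"
                        + "  'topic' = 'my-topic',\n"
                        + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                        + "  'format' = 'pb',\n"
                        + "  'pb.message-class-name' = 'com.example.MyMessage',\n"
                        + "  'pb.ignore-parse-errors' = 'true',\n"
                        + "  'pb.ignore-default-values' = 'false'\n"
                        + ")");
    }
}
{code}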
Here are some of my questions, as I'm new to contributing Flink code:
# Where should the code be placed in the Flink project? In flink-formats, with the module named "flink-protobuf"?
# Should I provide a more detailed design doc?
# Is the way I use code generation in the deserialization part the right approach for the Flink project?
# Are the test cases sufficient? Any advice is welcome.
# Are there any further actions we can take to move forward?
> Introduce Protobuf format
> -------------------------
>
> Key: FLINK-18202
> URL: https://issues.apache.org/jira/browse/FLINK-18202
> Project: Flink
> Issue Type: New Feature
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / API
> Reporter: Benchao Li
> Priority: Major
> Attachments: image-2020-06-15-17-18-03-182.png
>
>
> PB[1] is a very famous and widely used (de)serialization framework. The ML[2]
> also has some discussions about this. It's a useful feature.
> This issue may need some design work, or a FLIP.
> [1] [https://developers.google.com/protocol-buffers]
> [2] [http://apache-flink.147419.n8.nabble.com/Flink-SQL-UDF-td3725.html]