[ https://issues.apache.org/jira/browse/FLINK-26301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497477#comment-17497477 ]
Jing Ge commented on FLINK-26301:
---------------------------------
For the use case, please refer to FLINK-21406.
This solution leverages the official Parquet library. The user provides both the
Parquet file and the schema, which means the user needs to make sure the Parquet
file works with the schema and must take care of any compatibility issues.
It implements StreamFormat, but the @PublicEvolving annotation is missing. Created
FLINK-26357 to track that.
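
As a rough sketch of how this can be wired up for such a test (based on the DataStream/Parquet docs referenced below; the schema, file path, and job name are placeholders):

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetReaders;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AvroParquetReadJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The user supplies the Avro schema; it must be compatible with the Parquet file
        // (placeholder schema here).
        Schema schema = new Schema.Parser().parse(
                "{\"type\": \"record\", \"name\": \"User\", \"fields\": ["
                        + "{\"name\": \"name\", \"type\": \"string\"},"
                        + "{\"name\": \"age\", \"type\": \"int\"}]}");

        // AvroParquetReaders returns a StreamFormat, which is plugged into FileSource.
        // The source is bounded by default; calling monitorContinuously(...) on the builder
        // would switch it to continuous (unbounded) reading.
        FileSource<GenericRecord> source = FileSource
                .forRecordStreamFormat(AvroParquetReaders.forGenericRecord(schema),
                        new Path("/tmp/test-data.parquet"))   // placeholder path
                .build();

        DataStream<GenericRecord> stream = env.fromSource(
                source, WatermarkStrategy.noWatermarks(), "avro-parquet-source");

        stream.print(); // arbitrary sink for the test
        env.execute("AvroParquet format test");
    }
}
{code}

AvroParquetReaders also offers forSpecificRecord and forReflectRecord variants for the other record types mentioned in the test scenarios.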
> Test AvroParquet format
> -----------------------
>
> Key: FLINK-26301
> URL: https://issues.apache.org/jira/browse/FLINK-26301
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Reporter: Jing Ge
> Assignee: Dawid Wysakowicz
> Priority: Blocker
> Labels: release-testing
> Fix For: 1.15.0
>
>
> The following scenarios are worthwhile to test
> * Start a simple job with none/at-least-once/exactly-once delivery guarantee
> that reads Avro Generic/Specific/Reflect records and writes them to an arbitrary
> sink.
> * Start the above job with bounded/unbounded data.
> * Start the above job with streaming/batch execution mode.
>
> This format works with FileSource [2] and can only be used with the DataStream
> API. Normal Parquet files can be used as test files. The schema introduced in [1]
> can be used.
>
> References:
> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/parquet/
> [2] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/
>