This is an automated email from the ASF dual-hosted git repository.

arvid pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 0decd8be9a5b8dd4788c58d75dbc02d2b7e4aeaa
Author: Etienne Chauchot <[email protected]>
AuthorDate: Wed Nov 10 11:56:29 2021 +0100

    [FLINK-21407][doc][formats] Drop old formats
---
 .../docs/connectors/datastream/formats/parquet.md | 67 ----------------------
 1 file changed, 67 deletions(-)

diff --git a/docs/content/docs/connectors/datastream/formats/parquet.md b/docs/content/docs/connectors/datastream/formats/parquet.md
deleted file mode 100644
index fcbe797..0000000
--- a/docs/content/docs/connectors/datastream/formats/parquet.md
+++ /dev/null
@@ -1,67 +0,0 @@
----
-title: "Parquet"
-weight: 4
-type: docs
-aliases:
-- /dev/connectors/formats/parquet.html
-- /apis/streaming/connectors/formats/parquet.html
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-
-# Parquet formats
-
-Flink has built-in support for [Apache Parquet](http://parquet.apache.org/). This allows to read and write Parquet data with Flink.
-In order to use the Parquet format the following dependencies are required for projects using a build automation tool (such as Maven or SBT).
-
-```xml
-<dependency>
-  <groupId>org.apache.flink</groupId>
-  <artifactId>flink-parquet_{{< scala_version >}}</artifactId>
-  <version>{{< version >}}</version>
-</dependency>
-```
-
-In order to read data from a Parquet file, you have to specify one of the implementation of `ParquetInputFormat`. There are several depending on your needs:
-- `ParquetPojoInputFormat<E>` to read POJOs from parquet files
-- `ParquetRowInputFormat` to read Flink `Rows` (column oriented records) from parquet files
-- `ParquetMapInputFormat` to read Map records (Map of nested Flink type objects) from parquet files
-- `ParquetAvroInputFormat` to read Avro Generic Records from parquet files
-
-
-**Example for ParquetRowInputFormat**:
-
-```java
-MessageType parquetSchema = // use parquet libs to provide the parquet schema file and parse it or extract it from the parquet files
-ParquetRowInputFormat parquetInputFormat = new ParquetRowInputFormat(new Path(filePath), parquetSchema);
-// project only needed fields if suited to reduce the amount of data. Use: parquetSchema#selectFields(projectedFieldNames);
-DataStream<Row> input = env.createInput(parquetInputFormat);
-```
-
-**Example for ParquetAvroInputFormat**:
-
-```java
-MessageType parquetSchema = // use parquet libs to provide the parquet schema file and parse it or extract it from the parquet files
-ParquetAvroInputFormat parquetInputFormat = new ParquetAvroInputFormat(new Path(filePath), parquetSchema);
-// project only needed fields if suited to reduce the amount of data. Use: parquetSchema#selectFields(projectedFieldNames);
-DataStream<GenericRecord> input = env.createInput(parquetInputFormat);
-```
-
-
