[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

GitBox Wed, 20 Oct 2021 01:58:47 -0700


JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732506246




##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-formats</artifactId>
                <version>1.15-SNAPSHOT</version>
-               <relativePath>..</relativePath>
+               <relativePath>../pom.xml</relativePath>

Review comment:
       > Thanks for providing this pull request but I have a few preliminary 
questions about the design.
   > 
   > Every time I read something about parquet formats I always think the 
format should be based on the `BulkFormat` interface. Why did you base your 
implementation on the StreamFormat?
   > 
   > As a second point, I'd like to see an IT case using the new format with 
the `FileSource`. Did you already test this?
   
   
   Thanks for asking. Using StreamFormat will enable streaming process for 
parquet file source. Further more, the same implementation can be used in batch 
processing via adapter too, please refer to e.g. StreamFormatAdapter. Afaik, 
this is one of the good design coming with the new FileSource.
   
   Logic has been tested in the UT. For the second question about the IT case, 
it is a good question to discuss here, I am open for the decision. Question 1: 
Format works more like a factory, do we really need IT for a factory? Question 
2: since BulkFormat is the only format used to create FileSource internally, we 
could consider building the IT for the BulkFormat with the FileSource instead 
of the StreamFormat.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Reply via email to