JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732506246
##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
<groupId>org.apache.flink</groupId>
<artifactId>flink-formats</artifactId>
<version>1.15-SNAPSHOT</version>
- <relativePath>..</relativePath>
+ <relativePath>../pom.xml</relativePath>
Review comment:
> Thanks for providing this pull request but I have a few preliminary
questions about the design.
>
> Every time I read something about parquet formats I always think the
format should be based on the `BulkFormat` interface. Why did you base your
implementation on the StreamFormat?
>
> As a second point, I'd like to see an IT case using the new format with
the `FileSource`. Did you already test this?
Thanks for asking. Using StreamFormat will enable streaming process for
parquet file source. Further more, the same implementation can be used in batch
processing via adapter too, please refer to e.g. StreamFormatAdapter. Afaik,
this is one of the good design coming with the new FileSource.
Logic has been tested in the UT. For the second question about the IT case,
it is a good question to discuss here, I am open for the decision. Question 1:
Format works more like a factory, do we really need IT for a factory? Question
2: since BulkFormat is the only format used to create FileSource internally, we
could consider building the IT for the BulkFormat with the FileSource instead
of the StreamFormat.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]