[
https://issues.apache.org/jira/browse/FLINK-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237883#comment-17237883
]
Yun Gao commented on FLINK-20295:
---------------------------------
Hi [~lzljs3620320], many thanks for reporting this issue! However, this seems to
be a separate issue with the collect action, as reported in
https://issues.apache.org/jira/browse/FLINK-19204, and it should not affect the
results acquired.
> File Source lost data when reading from directories created by
> FileSystemTableSink with JSON format
> ---------------------------------------------------------------------------------------------------
>
> Key: FLINK-20295
> URL: https://issues.apache.org/jira/browse/FLINK-20295
> Project: Flink
> Issue Type: Bug
> Components: Connectors / FileSystem, Table SQL / Ecosystem
> Reporter: Yun Gao
> Priority: Critical
> Fix For: 1.12.0
>
> Attachments: compaction.tgz
>
>
> When testing the compaction functionality of the FileSystemTableSink, I found
> that when using the JSON format, the produced directories could not be read
> correctly by the file source: only part of the records are read.
> Checking the produced directories shows that the number of records in them
> is as expected, so the issue seems to be on the source side.
>
> The issue only exists for JSON format.
> The data is produced by
> [FileCompactionTest|https://github.com/gaoyunhaii/flink1.12test/blob/main/src/main/java/FileCompactionTest.java]
> and read by
> [FileCompactionCheckTest|https://github.com/gaoyunhaii/flink1.12test/blob/main/src/main/java/FileCompactionCheckTest.java].
> An example tar file of the produced directories, containing 8000 records, is
> also attached.
>