[
https://issues.apache.org/jira/browse/FLINK-21195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356291#comment-17356291
]
sujun commented on FLINK-21195:
-------------------------------
Hi [~jark] [~lzljs3620320],
I found that the master branch still has this problem. In my tests, for some
SQL queries containing a LIMIT clause, reading ORC files is 3 times slower
than reading Parquet files. Does the community have any plan to fix this
problem? If so, I can contribute my code.
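
To make the effect concrete, here is a minimal, self-contained sketch of the
behavior described in the issue. It is only a simplified model under stated
assumptions, not Flink's or ORC's actual code: all names (LimitSketch,
FakeFile, EagerReader, LazyReader, rowsLoaded) are hypothetical, each file is
assumed to hold exactly one stripe, and the limit is assumed to be checked
only before readBatch(), after the reader has already been created.

```java
// Simplified model; every name here is hypothetical and not part of Flink or ORC.
final class LimitSketch {

    /** Simulated file with one stripe; counts how many rows were pulled from "disk". */
    static final class FakeFile {
        final int rowsInStripe;
        int rowsLoaded = 0;

        FakeFile(int rowsInStripe) {
            this.rowsInStripe = rowsInStripe;
        }

        int[] loadStripe() {
            rowsLoaded += rowsInStripe;
            return new int[rowsInStripe];
        }
    }

    interface BatchReader {
        /** Returns the next batch, or null when the file is exhausted. */
        int[] readBatch();
    }

    /** ORC-like behavior as described in the issue: the stripe is loaded in the constructor. */
    static final class EagerReader implements BatchReader {
        private int[] stripe;

        EagerReader(FakeFile file) {
            this.stripe = file.loadStripe();
        }

        @Override
        public int[] readBatch() {
            int[] batch = stripe;
            stripe = null; // the single stripe is handed out once
            return batch;
        }
    }

    /** Parquet-like behavior as described in the issue: nothing is loaded until readBatch(). */
    static final class LazyReader implements BatchReader {
        private final FakeFile file;
        private boolean done = false;

        LazyReader(FakeFile file) {
            this.file = file;
        }

        @Override
        public int[] readBatch() {
            if (done) {
                return null;
            }
            done = true;
            return file.loadStripe();
        }
    }

    /** Total rows loaded from all files when the limit is only checked before readBatch(). */
    static int rowsLoaded(boolean eager, int fileCount, int rowsPerStripe, int limit) {
        int emitted = 0;
        int loaded = 0;
        for (int i = 0; i < fileCount; i++) {
            FakeFile file = new FakeFile(rowsPerStripe);
            // Assumption: the reader is always created before the limit is consulted.
            BatchReader reader = eager ? new EagerReader(file) : new LazyReader(file);
            while (emitted < limit) {
                int[] batch = reader.readBatch();
                if (batch == null) {
                    break;
                }
                emitted += batch.length;
            }
            loaded += file.rowsLoaded;
        }
        return loaded;
    }

    public static void main(String[] args) {
        System.out.println("eager: " + rowsLoaded(true, 3, 1000, 10));
        System.out.println("lazy:  " + rowsLoaded(false, 3, 1000, 10));
    }
}
```

With 3 single-stripe files of 1000 rows each and LIMIT 10, the eager reader in
this model loads all 3000 rows, while the lazy reader loads only the first
1000. The limit is reached after the first file in both cases, but creating
an eager reader for the remaining files still reads their stripes.
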
> LimitableBulkFormat is invalid when format is orc
> -------------------------------------------------
>
> Key: FLINK-21195
> URL: https://issues.apache.org/jira/browse/FLINK-21195
> Project: Flink
> Issue Type: Bug
> Components: Connectors / FileSystem
> Affects Versions: 1.12.1
> Reporter: sujun
> Priority: Minor
> Labels: auto-deprioritized-major
> Attachments: limit_code.jpg, orc_reader_debug.jpg
>
>
> The ORC reader reads a stripe of data in advance in the createReader() method
> (see the constructor of RecordReaderImpl for details), whereas the Parquet
> reader only starts reading block data when readBatch() is called. So if every
> ORC file has only one stripe, LimitableBulkFormat has no effect, because the
> whole file has already been read when the reader is created.
>
> !orc_reader_debug.jpg!
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)