[ 
https://issues.apache.org/jira/browse/FLINK-21195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356291#comment-17356291
 ] 

sujun commented on FLINK-21195:
-------------------------------

Hi [~jark] [~lzljs3620320],

I found that the master branch still has this problem. In my tests, for some 
SQL queries containing a LIMIT clause, reading ORC files is 3 times slower than 
reading Parquet files. Does the community have any plan to fix this problem? 
If so, I can contribute my code.

> LimitableBulkFormat is invalid when format is orc
> -------------------------------------------------
>
>                 Key: FLINK-21195
>                 URL: https://issues.apache.org/jira/browse/FLINK-21195
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem
>    Affects Versions: 1.12.1
>            Reporter: sujun
>            Priority: Minor
>              Labels: auto-deprioritized-major
>         Attachments: limit_code.jpg, orc_reader_debug.jpg
>
>
> The ORC reader reads one stripe of data in advance in the createReader() 
> method (see the constructor of RecordReaderImpl for details), whereas the 
> Parquet reader only starts reading block data when readBatch() is called. 
> So if every ORC file has only one stripe, LimitableBulkFormat has no effect.
>  
> !orc_reader_debug.jpg!
>  
>  
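
To illustrate the difference described above, here is a minimal toy model in plain Java. It does not use Flink or ORC classes; the names LimitSketch, BatchReader, EagerReader and LazyReader are hypothetical and exist only for this sketch. It assumes a limit wrapper that checks the row count between batches, and it counts how many rows each kind of reader has pulled from storage by the time the limit is reached.

```java
/** Toy model only: none of these types are Flink or ORC APIs. */
class LimitSketch {

    /** Minimal reader abstraction used by this sketch. */
    interface BatchReader {
        /** Returns the number of rows in the next batch, or 0 when exhausted. */
        int readBatch();

        /** Returns how many rows have been loaded from storage so far. */
        long rowsLoaded();
    }

    /** Loads the whole stripe when the reader is created (eager behaviour). */
    static final class EagerReader implements BatchReader {
        private final int batchSize;
        private final long loaded;
        private int remaining;

        EagerReader(int stripeRows, int batchSize) {
            this.loaded = stripeRows;   // the full stripe is read up front
            this.remaining = stripeRows;
            this.batchSize = batchSize;
        }

        @Override
        public int readBatch() {
            int n = Math.min(batchSize, remaining);
            remaining -= n;
            return n;
        }

        @Override
        public long rowsLoaded() {
            return loaded;
        }
    }

    /** Loads rows only when readBatch() is called (lazy behaviour). */
    static final class LazyReader implements BatchReader {
        private final int batchSize;
        private long loaded;
        private int remaining;

        LazyReader(int stripeRows, int batchSize) {
            this.remaining = stripeRows;
            this.batchSize = batchSize;
        }

        @Override
        public int readBatch() {
            int n = Math.min(batchSize, remaining);
            remaining -= n;
            loaded += n;                // storage is touched one batch at a time
            return n;
        }

        @Override
        public long rowsLoaded() {
            return loaded;
        }
    }

    /** Reads batches until at least {@code limit} rows were emitted; returns rows loaded. */
    static long rowsLoaded(boolean eager, int stripeRows, int batchSize, int limit) {
        BatchReader reader = eager
                ? new EagerReader(stripeRows, batchSize)
                : new LazyReader(stripeRows, batchSize);
        long emitted = 0;
        while (emitted < limit) {       // the limit is only checked between batches
            int n = reader.readBatch();
            if (n == 0) {
                break;                  // reader exhausted before reaching the limit
            }
            emitted += n;
        }
        return reader.rowsLoaded();
    }

    public static void main(String[] args) {
        System.out.println("eager: " + rowsLoaded(true, 100_000, 1_000, 10));
        System.out.println("lazy:  " + rowsLoaded(false, 100_000, 1_000, 10));
    }
}
```

In this model the eager reader loads all 100,000 rows of its single stripe for a limit of 10, while the lazy reader loads only one batch of 1,000 rows. The limit check cannot save work that was already done when the reader was created.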



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
