[GitHub] [flink] swuferhong commented on pull request #22805: [FLINK-32365][orc]get orc table statistics in parallel

via GitHub Wed, 21 Jun 2023 02:20:50 -0700


swuferhong commented on PR #22805:
URL: https://github.com/apache/flink/pull/22805#issuecomment-1600493415


   > @luoyuxia I find the code is called in multiple places. If we make it 
configurable, we need to change more modules and add more parameters. If we set 
the parameter in the Hadoop config, both ORC and Parquet can use it. Could 
you give me some ideas?
   
   Hi, did you run into slow reporting of ORC statistics while 
using the Hive connector? If so, I think you can add this parameter to 
`HiveOptions` as a Flink conf, and then set this Flink conf into the job 
conf in the method `HiveSourceBuilder.setFlinkConfigurationToJobConf()` (the jobConf 
will be added into hadoopConf in the Hive source). By doing this, you can get the 
parameter from `hadoopConf`; if the parameter is not in `hadoopConf`, you can 
default it to `Runtime.getRuntime().availableProcessors()`. WDYT, @luoyuxia?
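
A minimal sketch of the fallback logic described above, not the actual patch. To stay self-contained it uses a plain `Map<String, String>` as a stand-in for the Hadoop `Configuration`, and the key name `example.read-statistics.thread-num`, the class `StatisticsThreadNum`, and the method `resolveThreadNum` are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class StatisticsThreadNum {

    // Hypothetical key name; the real option would be defined in HiveOptions.
    static final String THREAD_NUM_KEY = "example.read-statistics.thread-num";

    /**
     * Returns the configured thread number, or the number of available
     * processors when the key is absent from the (stand-in) hadoopConf.
     */
    static int resolveThreadNum(Map<String, String> hadoopConf) {
        String value = hadoopConf.get(THREAD_NUM_KEY);
        if (value == null) {
            // Parameter not set: fall back to the default suggested above.
            return Runtime.getRuntime().availableProcessors();
        }
        int threadNum = Integer.parseInt(value.trim());
        if (threadNum <= 0) {
            throw new IllegalArgumentException(
                    THREAD_NUM_KEY + " must be positive, but was " + threadNum);
        }
        return threadNum;
    }

    public static void main(String[] args) {
        Map<String, String> hadoopConf = new HashMap<>();
        // Key absent: prints the machine-dependent default.
        System.out.println(resolveThreadNum(hadoopConf));

        // Key present: the configured value wins.
        hadoopConf.put(THREAD_NUM_KEY, "4");
        System.out.println(resolveThreadNum(hadoopConf));
    }
}
```

With a real Hadoop `Configuration`, the same lookup and default could presumably be written in one call using `Configuration.getInt(name, defaultValue)`.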
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
