[ 
https://issues.apache.org/jira/browse/FLINK-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014009#comment-17014009
 ] 

Jingsong Lee commented on FLINK-15527:
--------------------------------------

Hi [~xintongsong]

You are right about using yarn queue with limited resource quota. This is an 
inconvenient solution.

At present, our default behavior is batch rather than pipeline, which has many 
advantages and does not affect performance. Generating more parallelisms means 
that the granularity of fault tolerance can be smaller. The typical mode of 
batch job is limited resources and large parallelisms. Batch by batch 
parallelisms runs slowly. So generally speaking, the amount of parallelisms has 
little to do with the current resources.

> can not control the number of container on yarn single job module
> -----------------------------------------------------------------
>
>                 Key: FLINK-15527
>                 URL: https://issues.apache.org/jira/browse/FLINK-15527
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>    Affects Versions: 1.10.0
>            Reporter: chenchencc
>            Priority: Major
>             Fix For: 1.10.0
>
>         Attachments: flink-conf.yaml, image-2020-01-09-14-30-46-973.png, 
> yarn_application.png
>
>
> when run yarn single job run many container but paralism set 4
> *scripts:*
> ./bin/flink run -m yarn-cluster -ys 3 -p 4 -yjm 1024m -ytm 4096m -yqu bi -c 
> com.cc.test.HiveTest2 ./cc_jars/hive-1.0-SNAPSHOT.jar 11.txt test61 6
> _notes_: in  1.9.1 has cli paramter -yn to control the number of containers 
> and in 1.10 remove it
> *result:*
> the number of containers is 500+
>  
> *code use:*
> query the table and save it to the hdfs text
>  
> the storge of table is 200g+
>  
>  
>  
>  
> *code:*
> com.cc.test.HiveTest2
> public static void main(String[] args) throws Exception
> { EnvironmentSettings settings = 
> EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
>  StreamExecutionEnvironment settings2 = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> settings2.setParallelism(Integer.valueOf(args[2]));
> StreamTableEnvironment tableEnv = StreamTableEnvironment.create(settings2, 
> settings); String name = "myhive"; String defaultDatabase = "test"; String 
> hiveConfDir = "/etc/hive/conf"; String version = "1.2.1"; // or 1.2.1 2.3.4 
> HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir, 
> version); tableEnv.registerCatalog("myhive", hive); 
> tableEnv.useCatalog("myhive"); tableEnv.listTables(); Table table = 
> tableEnv.sqlQuery("select id from orderparent_test2 where id = 
> 'A000021204170176'"); tableEnv.toAppendStream(table, Row.class).print(); 
> tableEnv.toAppendStream(table, Row.class) 
> .writeAsText("hdfs:///user/chenchao1/"+ args[0], 
> FileSystem.WriteMode.OVERWRITE); tableEnv.execute(args[1]); }
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to