shenh062326 commented on pull request #1868: URL: https://github.com/apache/hudi/pull/1868#issuecomment-663917822
I added a performance test that inserts 100000 records into 1000 file groups, so each file group's weight is 0.001.

```
public void partitionWeightPerformance() throws Exception {
  final String testPartitionPath = "2016/09/26";
  int totalInsertNum = 100000;

  // Small-file handling and split auto-tuning are turned off, so a fixed split size
  // of 100 records yields 100000 / 100 = 1000 insert buckets.
  HoodieWriteConfig config = makeHoodieClientConfigBuilder()
      .withCompactionConfig(HoodieCompactionConfig.newBuilder().compactionSmallFileSize(0)
          .insertSplitSize(100).autoTuneInsertSplits(false).build()).build();

  HoodieClientTestUtils.fakeCommit(basePath, "001");
  metaClient = HoodieTableMetaClient.reload(metaClient);
  HoodieCopyOnWriteTable table =
      (HoodieCopyOnWriteTable) HoodieTable.create(metaClient, config, hadoopConf);

  HoodieTestDataGenerator dataGenerator =
      new HoodieTestDataGenerator(new String[]{testPartitionPath});
  List<HoodieRecord> insertRecords = dataGenerator.generateInserts("001", totalInsertNum);

  WorkloadProfile profile = new WorkloadProfile(jsc.parallelize(insertRecords));
  UpsertPartitioner partitioner = new UpsertPartitioner(profile, jsc, table, config);

  // Time ten passes over all records; each pass calls getPartition once per record.
  for (int i = 0; i < 10; i++) {
    long start = System.currentTimeMillis();
    Map<Integer, Integer> partition2numRecords = new HashMap<>();
    for (HoodieRecord hoodieRecord : insertRecords) {
      int partition = partitioner.getPartition(new Tuple2<>(
          hoodieRecord.getKey(), Option.ofNullable(hoodieRecord.getCurrentLocation())));
      // Count the records assigned to each partition.
      partition2numRecords.merge(partition, 1, Integer::sum);
    }
    System.out.println("cost: " + (System.currentTimeMillis() - start));
  }
}
```

I ran it ten times. Results before the optimization (cost in milliseconds):

```
cost: 190
cost: 122
cost: 150
cost: 100
cost: 104
cost: 114
cost: 104
cost: 110
cost: 104
cost: 117
```

Results after the optimization:

```
cost: 154
cost: 83
cost: 77
cost: 84
cost: 85
cost: 84
cost: 87
cost: 99
cost: 102
cost: 85
```
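To compare the two runs at a glance, the reported timings can be averaged. The following is only a minimal sketch: the class name `CostSummary` and the `skip` parameter are hypothetical, and treating the first iteration as JVM warm-up is an assumption, not something stated in the comment.

```java
import java.util.Arrays;

public class CostSummary {
    // Timings in milliseconds, copied from the "cost:" lines reported in the comment.
    static final long[] BEFORE = {190, 122, 150, 100, 104, 114, 104, 110, 104, 117};
    static final long[] AFTER = {154, 83, 77, 84, 85, 84, 87, 99, 102, 85};

    // Mean of the samples after dropping the first `skip` entries.
    // skip = 1 discards the first iteration (assumed here to be warm-up).
    static double mean(long[] samples, int skip) {
        return Arrays.stream(samples).skip(skip).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        for (int skip = 0; skip <= 1; skip++) {
            double before = mean(BEFORE, skip);
            double after = mean(AFTER, skip);
            System.out.printf("skip=%d before=%.1f ms after=%.1f ms reduction=%.1f%%%n",
                skip, before, after, 100.0 * (before - after) / before);
        }
    }
}
```

Over all ten runs the mean drops from 121.5 ms to 94.0 ms, roughly 23% lower. Ten runs of a single-JVM loop is a small sample, so this is an indication rather than a rigorous benchmark.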