Hi all, I run PfpGrowth in a big hadoop cluster , but the ParallelFPgrowth MR job got stuck when some ParallelFPGrowthReducer tasks run into fpGrowth.generateTopKFrequentPatterns().
I feed 6,468,284 record for PfpGrowth, and 110825 features greater than minSupport(5), numGroups(20000), treeCacheSize(10) and with other paramters as default. I also set the hadoop mapred.child.java.opts option to 3g. The cluster started 20 ParallelFPGrowthReducer tasks, but about 10 tasks was stuck at the same reason. See the reducer task log(Log 1): 2011-09-20 21:11:49,860 INFO org.apache.mahout.fpm.pfpgrowth.ParallelCountingReducer: localFList size 871 for key 8120 2011-09-20 21:11:49,860 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique items 871 2011-09-20 21:11:49,860 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48722=>3 2011-09-20 21:11:49,860 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48723=>4 2011-09-20 21:11:49,860 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48720=>5 2011-09-20 21:11:49,860 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48721=>0 2011-09-20 21:11:49,860 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48724=>2 2011-09-20 21:11:49,860 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48725=>1 2011-09-20 21:11:49,860 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned items 27 2011-09-20 21:11:49,861 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the FP Tree: 93 2011-09-20 21:11:49,861 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 5 2011-09-20 21:11:49,861 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with Least Support 10 2011-09-20 21:11:49,861 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 4 2011-09-20 21:11:49,861 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with Least Support 13 2011-09-20 21:11:49,861 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 3 2011-09-20 21:11:49,862 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with Least Support 13 2011-09-20 21:11:49,862 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 2 2011-09-20 21:11:49,862 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with Least Support 14 2011-09-20 21:11:49,862 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 1 2011-09-20 21:11:49,863 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with Least Support 15 2011-09-20 21:11:49,863 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 0 2011-09-20 21:11:49,863 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with Least Support 17 2011-09-20 21:11:49,863 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Tree Cache: First Level: Cache hits=20 Cache Misses=127 2011-09-20 21:11:49,864 INFO org.apache.mahout.fpm.pfpgrowth.ParallelCountingReducer: localFList size 1410 for key 8140 2011-09-20 21:11:49,865 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique items 1410 2011-09-20 21:11:49,865 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48842=>22 2011-09-20 21:11:49,865 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48843=>23 2011-09-20 21:11:49,865 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48840=>21 2011-09-20 21:11:49,865 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48841=>27 2011-09-20 21:11:49,865 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48844=>266 2011-09-20 21:11:49,865 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48845=>28 2011-09-20 21:11:49,865 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned items 353 2011-09-20 21:11:49,866 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the FP Tree: 2512 2011-09-20 21:11:49,866 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 266 >From last line of Log 1, we can see the reducer task which get stuck is mining FTree Tree for all patterns with the attribute 266. However I think the task is still running because some reducer tasks will keep moving after long time mining(see bellow log, Log 2) 2011-09-20 21:24:08,577 INFO org.apache.mahout.fpm.pfpgrowth.ParallelCountingReducer: localFList size 1183 for key 13596 2011-09-20 21:24:08,578 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique items 1183 2011-09-20 21:24:08,578 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81581=>250 2011-09-20 21:24:08,578 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81580=>132 2011-09-20 21:24:08,578 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81577=>129 2011-09-20 21:24:08,578 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81576=>128 2011-09-20 21:24:08,578 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81579=>131 2011-09-20 21:24:08,578 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81578=>130 2011-09-20 21:24:08,578 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned items 251 2011-09-20 21:24:08,579 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the FP Tree: 1376 2011-09-20 21:24:08,579 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 250 2011-09-20 21:24:08,579 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with Least Support 5 2011-09-20 21:24:08,579 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 132 (take 2.5 hours ) 2011-09-21 00:08:56,616 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 40 Patterns with Least Support 7 2011-09-21 00:08:56,665 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 131 2011-09-21 00:08:56,703 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 3 Patterns with Least Support 7 2011-09-21 00:08:56,703 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 130 2011-09-21 00:08:56,736 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with Least Support 7 2011-09-21 00:08:56,736 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 129 2011-09-21 00:08:56,769 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with Least Support 7 2011-09-21 00:08:56,769 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 128 2011-09-21 00:08:56,809 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with Least Support 7 2011-09-21 00:08:56,809 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Tree Cache: First Level: Cache hits=13 Cache Misses=475 And, the task status showed *"Processing FPTree: Bottom Up FP Growth > reduce"*, which indicates that the job run into FpGrowth.growthBottomUp() and be stuck within FpGrowth.growthBottomUp(). Some reduers will hold at this status about 25 hours till I kill the whole PfpGrowth job manually.(see job history as follows) *JobName: * PFP Parallel FPGrowth running over input (ignore)/sortedoutput *JobConf: * hdfs://hdpnn:9000/home/hadoop/cluster-data/mapred/system/job_201108301259_1089136/job.xml<http://10.249.54.103/jobconf_history.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&jobLogDir=file:/home/hadoop/cluster-data/logs/history&jobUniqueString=hdpjt_1314680340456_job_201108301259_1089136> *Submitted At: * 20-Sep-2011 20:31:57 *Launched At: * 20-Sep-2011 20:32:04 (6sec) *Finished At: * 21-Sep-2011 22:31:51 (25hrs, 59mins, 47sec) *Status: * KILLED *Analyse This Job<http://10.249.54.103/analysejobhistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp> * ------------------------------ Counter Map Reduce Total KindTotal Tasks(successful+failed+killed)Successful tasksFailed tasksKilled tasksStart TimeFinish Time Setup 1<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=all> 1<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=SUCCESS> 0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=FAILED> 0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=KILLED> 20-Sep-2011 20:32:06 20-Sep-2011 20:32:07 (0sec) Map 28<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=all> 22<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=SUCCESS> 0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=FAILED> 6<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=KILLED> 20-Sep-2011 20:32:13 20-Sep-2011 20:56:01 (23mins, 48sec) Reduce 24<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=all> 18<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=SUCCESS> 0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=FAILED> 6<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=KILLED> 20-Sep-2011 20:32:33 21-Sep-2011 22:31:51 (25hrs, 59mins, 18sec) Cleanup 1<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=all> 1<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=SUCCESS> 0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=FAILED> 0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=KILLED> 21-Sep-2011 22:31:50 21-Sep-2011 22:31:51 (0sec) In Log 2, we see FP Tree for key 13596 only includes 1376 nodes, however it take 2.5 hours to ming patterns for 132. So why it was so slowly when mining frequent patterns for some attributes? What's the reason behinds that or how can I resolve? Thanks in advance! -- Gmail/talk: [email protected]
