[ https://issues.apache.org/jira/browse/HIVEMALL-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Makoto Yui reassigned HIVEMALL-219:
-----------------------------------

    Assignee: Makoto Yui

> NullPointerException in pLSA model
> ----------------------------------
>
>                 Key: HIVEMALL-219
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-219
>             Project: Hivemall
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: jsocsmao
>            Assignee: Makoto Yui
>            Priority: Major
>
> I was not able to get the pLSA model working on a relatively small production workload (5,000 docs) with the number of reducers > 1. It worked with 1,000 documents in a single reducer.
>
> I tried both the Tez and MapReduce engines:
>
> hive> create table docres as
>     > with word_counts as (
>     >   select docid, feature(word, count(word)) as f
>     >   from docs t1 lateral view explode(tokenize(doc, true)) t2 as word
>     >   where not is_stopword(word)
>     >   group by docid, word
>     > ),
>     > input as (
>     >   select docid, collect_list(f) as features
>     >   from word_counts
>     >   group by docid)
>     > select train_plsa(features, '-topics 20 -iter 10 -s 500 -delta 1 -alpha 500 -eps 1') as (label, word, prob)
>     > from input;
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified.
> Defaulting to jobconf value of: 4
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Kill Command = /usr/hdp/2.6.1.10-4/hadoop/bin/hadoop job -kill job_1536748924580_2024
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
> 2018-09-17 19:05:38,300 Stage-1 map = 0%, reduce = 0%
> 2018-09-17 19:05:52,225 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 10.56 sec
> 2018-09-17 19:05:55,428 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.69 sec
> 2018-09-17 19:06:01,903 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 14.69 sec
> 2018-09-17 19:06:02,963 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.69 sec
> 2018-09-17 19:06:09,433 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 25.22 sec
> 2018-09-17 19:06:12,803 Stage-1 map = 100%, reduce = 18%, Cumulative CPU 28.45 sec
> 2018-09-17 19:06:24,672 Stage-1 map = 100%, reduce = 19%, Cumulative CPU 41.58 sec
> 2018-09-17 19:06:36,482 Stage-1 map = 100%, reduce = 20%, Cumulative CPU 53.84 sec
> 2018-09-17 19:07:16,082 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.69 sec
> MapReduce Total cumulative CPU time: 14 seconds 690 msec
> Ended Job = job_1536748924580_2024 with errors
> Error during job, obtaining debugging information...
> Examining task ID: task_1536748924580_2024_m_000000 (and more) from job job_1536748924580_2024
> Examining task ID: task_1536748924580_2024_r_000003 (and more) from job job_1536748924580_2024
> Examining task ID: task_1536748924580_2024_r_000001 (and more) from job job_1536748924580_2024
> Task with the most failures(10):
> -----
> -----
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: null
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:286)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> Caused by: java.lang.NullPointerException
>     at hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF.finalizeTraining(ProbabilisticTopicModelBaseUDTF.java:277)
>     at hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF.close(ProbabilisticTopicModelBaseUDTF.java:270)
>     at org.apache.hadoop.hive.ql.exec.UDTFOperator.closeOp(UDTFOperator.java:145)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:620)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:278)
>     ... 7 more
>
> JAR: hivemall-all-0.5.0-incubating.jar

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
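Since the reporter notes the query succeeded when a single reducer was used, a possible workaround until the bug is fixed (an assumption drawn from that observation, not a confirmed fix) is to pin the job to one reducer before running train_plsa, using the very setting Hive's own log output mentions:

```sql
-- Workaround sketch (unverified): force a single reducer so the whole
-- corpus is fed to one train_plsa instance, matching the reporter's
-- successful single-reducer run. Note this also serializes the reduce
-- phase, so it trades parallelism for stability.
set mapreduce.job.reduces=1;
```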
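The NPE surfacing in finalizeTraining() during close(), combined with the fact that only multi-reducer runs fail, suggests some piece of per-reducer state is null when the operator tree shuts down. One common way such a close-time NPE arises (a hypothetical illustration only; the class and method names below are made up and are not Hivemall's actual code) is a field that is initialized lazily in process() and then dereferenced unconditionally in close():

```java
// Hypothetical sketch of the suspected failure mode, NOT Hivemall code:
// a UDTF-like class whose model is created lazily on the first input row.
// If close() runs on a path where the model was never set, finalizing
// the training dereferences null, producing a stack trace shaped like
// the one in this report. The guard in close() shows one possible fix.
public class TopicModelSketch {

    private StringBuilder model;  // stands in for the pLSA model; lazily initialized

    // Called once per input row; creates the model on first use.
    public void process(String doc) {
        if (model == null) {
            model = new StringBuilder();
        }
        model.append(doc).append('\n');  // accumulate the document
    }

    // Guarded close: returns true if training was finalized, false if
    // there was nothing to finalize. Without the null check, an empty
    // run would throw NullPointerException here.
    public boolean close() {
        if (model == null) {
            return false;  // no rows seen; skip finalization instead of NPE
        }
        model.trimToSize();  // stands in for finalizeTraining()
        return true;
    }
}
```

This is only one plausible shape for the bug; confirming the real cause requires reading ProbabilisticTopicModelBaseUDTF.java around line 277 in the 0.5.0 source.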