[
https://issues.apache.org/jira/browse/HIVEMALL-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618917#comment-16618917
]
Makoto Yui commented on HIVEMALL-219:
-------------------------------------
[~jsocsmao] Please use the latest master.
# JDK 7 is required for packaging
export JAVA_HOME=`/usr/libexec/java_home -v 1.7`
# Java 8 is required for building the Spark 2.2 module
export JAVA8_HOME=`/usr/libexec/java_home -v 1.8`
# Try to create artifacts
export MAVEN_OPTS=-XX:MaxPermSize=256m
# build packages on target/
bin/build.sh
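
Once built, the rebuilt jar under target/ can replace the 0.5.0 jar in the Hive session. A minimal sketch (the jar path and version suffix below are assumptions and depend on the branch you build):

```sql
-- Register the freshly built jar in the Hive session
-- (path/version are illustrative, not the actual artifact name)
ADD JAR target/hivemall-all-0.6.0-incubating-SNAPSHOT.jar;

-- Re-create the UDTF so the session picks up the fixed class
CREATE TEMPORARY FUNCTION train_plsa AS 'hivemall.topicmodel.PLSAUDTF';
```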
> NullPointerException in pLSA model
> ----------------------------------
>
> Key: HIVEMALL-219
> URL: https://issues.apache.org/jira/browse/HIVEMALL-219
> Project: Hivemall
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: jsocsmao
> Assignee: Makoto Yui
> Priority: Major
> Fix For: 0.5.2
>
>
>
> I was not able to get the pLSA model working on a relatively small production
> workload (5000 docs) when the number of reducers was > 1.
> It worked with 1000 documents in a single reducer.
>
> I tried both the Tez and map-reduce execution engines:
> hive> create table docres as
> > with word_counts as (
> > select docid, feature(word, count(word)) as f
> > from docs t1 lateral view explode(tokenize(doc, true)) t2 as word
> > where not is_stopword(word)
> > group by docid, word
> > ),
> > input as (
> > select docid, collect_list(f) as features
> > from word_counts
> > group by docid)
> > select train_plsa(features, '-topics 20 -iter 10 -s 500 -delta 1 -alpha 500 -eps 1') as (label, word, prob)
> > from input;
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Defaulting to jobconf value of: 4
> In order to change the average load for a reducer (in bytes):
> set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
> set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
> set mapreduce.job.reduces=<number>
> Kill Command = /usr/hdp/2.6.1.10-4/hadoop/bin/hadoop job -kill job_1536748924580_2024
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
> 2018-09-17 19:05:38,300 Stage-1 map = 0%, reduce = 0%
> 2018-09-17 19:05:52,225 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 10.56 sec
> 2018-09-17 19:05:55,428 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.69 sec
> 2018-09-17 19:06:01,903 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 14.69 sec
> 2018-09-17 19:06:02,963 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.69 sec
> 2018-09-17 19:06:09,433 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 25.22 sec
> 2018-09-17 19:06:12,803 Stage-1 map = 100%, reduce = 18%, Cumulative CPU 28.45 sec
> 2018-09-17 19:06:24,672 Stage-1 map = 100%, reduce = 19%, Cumulative CPU 41.58 sec
> 2018-09-17 19:06:36,482 Stage-1 map = 100%, reduce = 20%, Cumulative CPU 53.84 sec
> 2018-09-17 19:07:16,082 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.69 sec
> MapReduce Total cumulative CPU time: 14 seconds 690 msec
> Ended Job = job_1536748924580_2024 with errors
> Error during job, obtaining debugging information...
> Examining task ID: task_1536748924580_2024_m_000000 (and more) from job job_1536748924580_2024
> Examining task ID: task_1536748924580_2024_r_000003 (and more) from job job_1536748924580_2024
> Examining task ID: task_1536748924580_2024_r_000001 (and more) from job job_1536748924580_2024
> Task with the most failures(10):
> -----
> -----
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: null
> at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:286)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> Caused by: java.lang.NullPointerException
> at hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF.finalizeTraining(ProbabilisticTopicModelBaseUDTF.java:277)
> at hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF.close(ProbabilisticTopicModelBaseUDTF.java:270)
> at org.apache.hadoop.hive.ql.exec.UDTFOperator.closeOp(UDTFOperator.java:145)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:620)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:634)
> at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:278)
> ... 7 more
>
>
>
> JAR: hivemall-all-0.5.0-incubating.jar
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)