[jira] [Commented] (NUTCH-2315) UpdateDb jobs fails everytime (Nutch 2.3.1)

Shubham Gupta (JIRA) Sun, 25 Sep 2016 21:16:33 -0700

    [ https://issues.apache.org/jira/browse/NUTCH-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15521993#comment-15521993 ]


Shubham Gupta commented on NUTCH-2315:
--------------------------------------

The MalformedURLException no longer occurs.
However, the update job now fails with the following exception:

Error: java.lang.RuntimeException: com.mongodb.WriteConcernException: { "serverUsed" : "host:37006" , "ok" : 1 , "n" : 0 , "updatedExisting" : false , "err" : "insertDocument :: caused by :: 17280 Btree::insert: key too large to index, failing .$_id_ 2772 { : \" “but when someone is on the s register, not the 15,000 [suspicious] people, the first few hundred on list – should we wait for them to act or sho...\" }" , "code" : 17280}
      at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:76)
      at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
      at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
      at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
      at org.apache.nutch.crawl.DbUpdateReducer.reduce(DbUpdateReducer.java:236)
      at org.apache.nutch.crawl.DbUpdateReducer.reduce(DbUpdateReducer.java:42)
      at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: com.mongodb.WriteConcernException: { "serverUsed" : "host:37006" , "ok" : 1 , "n" : 0 , "updatedExisting" : false , "err" : "insertDocument :: caused by :: 17280 Btree::insert: key too large to index, _id_ 2772 { : \" “but when someone is on the s register, not the 15,000 [suspicious] people, the first few hundred on list – should we wait for them to act or sho...\" }" , "code" : 17280}
      at com.mongodb.CommandResult.getWriteException(CommandResult.java:90)
      at com.mongodb.CommandResult.getException(CommandResult.java:79)
      at com.mongodb.DBCollectionImpl.translateBulkWriteException(DBCollectionImpl.java:316)
      at com.mongodb.DBCollectionImpl.update(DBCollectionImpl.java:274)
      at com.mongodb.DBCollection.update(DBCollection.java:214)
      at com.mongodb.DBCollection.update(DBCollection.java:247)
      at org.apache.gora.mongodb.store.MongoStore.performPut(MongoStore.java:361)
      at org.apache.gora.mongodb.store.MongoStore.put(MongoStore.java:326)
      at org.apache.gora.mongodb.store.MongoStore.put(MongoStore.java:70)
      at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:67)
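The error above says "key too large to index" on the `_id_` index. MongoDB versions before 4.2 require each index entry to be smaller than 1024 bytes. The `2772` in the message appears to be the size of the rejected key. The `_id` value shown is a fragment of article text rather than a normal reversed URL. Below is a minimal sketch of a pre-write size check on a row key, assuming that 1024-byte limit applies to this deployment. `MongoKeyGuard` and `exceedsIndexKeyLimit` are hypothetical names, not part of Nutch, Gora or the MongoDB driver. The check ignores BSON overhead, so it is only a lower bound.

```java
import java.nio.charset.StandardCharsets;

final class MongoKeyGuard {
    // Assumption: MongoDB < 4.2 rejects index entries of 1024 bytes or more.
    // Real index entries also carry some BSON overhead, so a key slightly
    // below this value can still be rejected; treat this as a lower bound.
    static final int INDEX_KEY_LIMIT_BYTES = 1024;

    private MongoKeyGuard() {}

    /** Returns true if the UTF-8 encoding of the row key alone reaches the limit. */
    static boolean exceedsIndexKeyLimit(String rowKey) {
        return rowKey.getBytes(StandardCharsets.UTF_8).length >= INDEX_KEY_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        String shortKey = "au.com.smh.www:http/act-news/";
        StringBuilder longKey = new StringBuilder();
        for (int i = 0; i < 1500; i++) {
            longKey.append('x');
        }
        System.out.println(exceedsIndexKeyLimit(shortKey));            // false
        System.out.println(exceedsIndexKeyLimit(longKey.toString()));  // true
    }
}
```

Bytes are counted rather than characters, because non-ASCII text such as the curly quote in the rejected key takes several bytes in UTF-8. A record flagged this way could be logged and skipped before it reaches the store, so that one oversized key does not fail the whole reduce task. Whether skipping is acceptable depends on the crawl.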

> UpdateDb jobs fails everytime (Nutch 2.3.1)
> -------------------------------------------
>
>                 Key: NUTCH-2315
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2315
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 2.3.1
>         Environment: I am using it with Hadoop 2.7.1 + MongoDB + YARN + Gora 0.61
>            Reporter: Shubham Gupta
>              Labels: newbie
>             Fix For: 2.4
>
>         Attachments: NUTCH-2315-2.3.1-1.patch
>
>
> Hey,
> Whenever I run the update job, the following error occurs:
> INFO mapreduce.Job: Task Id : attempt_1473832356852_0107_m_000000_2, Status : FAILED
> Error: java.net.MalformedURLException: no protocol: http%3A%2F%2Fwww.smh.com.au%2Fact-news%2Fcanberra-weather-warm-april-expected-after-record-breaking-march-temperatures-20160401-gnw2pg.html&title=Canberra+weather%3A+warm+April+expected+after+record+breaking+March+temperatures&source=The+Sydney+Morning+Herald&summary=Canberra+can+expect+warmer+than+average+temperatures+to+continue+for+April+after+enjoying+its+equal+second+warmest+March+on+record
>       at java.net.URL.<init>(URL.java:586)
>       at java.net.URL.<init>(URL.java:483)
>       at java.net.URL.<init>(URL.java:432)
>       at org.apache.nutch.util.TableUtil.reverseUrl(TableUtil.java:43)
>       at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:96)
>       at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:38)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 16/09/15 12:44:35 INFO mapreduce.Job:  map 100% reduce 100%
> 16/09/15 12:44:36 INFO mapreduce.Job: Job job_1473832356852_0107 failed with state FAILED due to: Task failed task_1473832356852_0107_m_000000
> Job failed as tasks failed. failedMaps:1 failedReduces:0
> 16/09/15 12:44:36 INFO mapreduce.Job: Counters: 8
>       Job Counters 
>               Failed map tasks=4
>               Launched map tasks=4
>               Other local map tasks=4
>               Total time spent by all maps in occupied slots (ms)=388304
>               Total time spent by all reduces in occupied slots (ms)=0
>               Total time spent by all map tasks (ms)=55472
>               Total vcore-seconds taken by all map tasks=55472
>               Total megabyte-seconds taken by all map tasks=198145984
> Exception in thread "main" java.lang.RuntimeException: job failed: name=[rss]update-table, jobid=job_1473832356852_0107
>       at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
>       at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:111)
>       at org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:140)
>       at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:174)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>       at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:178)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
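The original failure comes from `TableUtil.reverseUrl` passing a percent-encoded outlink (`http%3A%2F%2F...`) to `java.net.URL`. Because the string contains no literal `:` before the path, `java.net.URL` finds no protocol and throws `MalformedURLException`. Below is a minimal sketch, assuming it is acceptable to drop such outlinks, of filtering them out before reversal so that one bad link does not fail the whole map task. `OutlinkFilter`, `isParsable` and `keepParsable` are hypothetical names. This is not necessarily what the attached NUTCH-2315-2.3.1-1.patch does.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OutlinkFilter {
    private OutlinkFilter() {}

    /** Returns true if java.net.URL accepts the string, i.e. it starts with a known protocol. */
    static boolean isParsable(String candidate) {
        try {
            new URL(candidate); // same constructor family that fails in the trace above
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    /** Keeps parsable outlinks and reports the others on stderr instead of throwing. */
    static List<String> keepParsable(List<String> outlinks) {
        List<String> kept = new ArrayList<>();
        for (String link : outlinks) {
            if (isParsable(link)) {
                kept.add(link);
            } else {
                System.err.println("Skipping malformed outlink: " + link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> outlinks = Arrays.asList(
                "http://www.smh.com.au/act-news/",
                "http%3A%2F%2Fwww.smh.com.au%2Fact-news%2F"); // percent-encoded, so "no protocol"
        System.out.println(keepParsable(outlinks)); // [http://www.smh.com.au/act-news/]
    }
}
```

Dropping the link is the simplest choice. An alternative would be to URL-decode such strings and retry, but that risks turning query-string fragments (like the `&title=...` part above) into bogus URLs.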



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
