[
https://issues.apache.org/jira/browse/NUTCH-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581557#comment-13581557
]
Lewis John McGibbney commented on NUTCH-1534:
---------------------------------------------
This is caused by the ProtocolStatusCodes.BLOCKED case in
FetcherReducer.run(), which passes a null value for Content:
{code}
556 case ProtocolStatusCodes.BLOCKED:
557 output(fit, null, status, CrawlStatus.STATUS_RETRY);
558 break;
{code}
Are you able to debug any of this, Roland?
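A minimal sketch for narrowing this down, assuming the row can be inspected as a plain map of field names to values just before it is written. It lists null values (such as the null Content from the BLOCKED branch) and null or empty column or sub-column names; the exception quoted below shows Cassandra rejecting an empty column name. The class `ColumnCheck`, the method `findInvalidColumns`, and the map-based row model are hypothetical: this is plain Java and does not use the Nutch, Gora, or Hector APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical debugging helper: not part of Nutch, Gora, or Hector.
// A "row" is modelled as a flat map of field name -> value; map-valued
// fields (like headers or markers) stand in for fields stored as sub-columns.
public class ColumnCheck {

    // Lists every field that looks suspicious for this error: a null or empty
    // column name, a null value, or a null or empty sub-column (map key) name.
    public static List<String> findInvalidColumns(Map<String, Object> row) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            String name = e.getKey();
            Object value = e.getValue();
            if (name == null || name.isEmpty()) {
                problems.add("empty column name");
            } else if (value == null) {
                problems.add(name + ": null value");
            } else if (value instanceof Map<?, ?>) {
                for (Object subKey : ((Map<?, ?>) value).keySet()) {
                    if (subKey == null || subKey.toString().isEmpty()) {
                        problems.add(name + ": empty sub-column name");
                    }
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("status", 34);       // STATUS_RETRY, as in the dump below
        row.put("content", null);    // what the BLOCKED branch passes for Content
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Server", "Apache");
        row.put("headers", headers);
        // Log the findings before the row is handed to the store.
        System.out.println(findInvalidColumns(row));
    }
}
```

Running `main` prints `[content: null value]`. Where to hook such a check into the fetcher's write path is left open here; calling it just before output() is only an assumption.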
> cassandra/hector exception: InvalidRequestException(why:column name must not be empty)
> --------------------------------------------------------------------------------------
>
> Key: NUTCH-1534
> URL: https://issues.apache.org/jira/browse/NUTCH-1534
> Project: Nutch
> Issue Type: Bug
> Components: fetcher, parser
> Affects Versions: 2.1
> Environment: nutch 2.1 / cassandra 1.2.1
> running fetch with parse=true
> Reporter: Roland
> Fix For: 2.2
>
>
> During larger fetches (100k+ URLs), these errors sometimes occur:
> {code}
> 2013-02-19 09:32:09,639 WARN fetcher.FetcherJob - Attempting to finish item from unknown queue: FetchItem [queueID=http://www.wer-kennt-wen.de, url=http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09, u=http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09, page=org.apache.nutch.storage.WebPage@7b1ab444 {
> "baseUrl":"null"
> "status":"34"
> "fetchTime":"1361262537305"
> "prevFetchTime":"1361257503835"
> "fetchInterval":"0"
> "retriesSinceFetch":"0"
> "modifiedTime":"0"
> "protocolStatus":"org.apache.nutch.storage.ProtocolStatus@40b98 {
> "code":"16"
> "args":"[Http code=403, url=http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09]"
> "lastModified":"0"
> }"
> "content":"null"
> "contentType":"null"
> "prevSignature":"null"
> "signature":"null"
> "title":"null"
> "text":"null"
> "parseStatus":"null"
> "score":"0.0"
> "reprUrl":"null"
> "headers":"{Set-Cookie=WKWSESSID=9d968aeef3a709bc4bba9bb955b93e1e; path=/; domain=.wer-kennt-wen.de, Connection=close, Content-Type=text/html, Cache-Control=no-store, no-cache, must-revalidate, post-check=0, pre-check=0, Date=Tue, 19 Feb 2013 08:28:57 GMT, P3P=CP="CAO OUR", Expires=Thu, 19 Nov 1981 08:52:00 GMT, Server=Apache, Pragma=no-cache}"
> "outlinks":"{}"
> "inlinks":"{}"
> "markers":"{dist=0, _injmrk_=y, _ftcmrk_=1361257998-2045033576, _gnmrk_=1361257998-2045033576}"
> "metadata":"{}"
> }]
> 2013-02-19 09:32:09,640 ERROR fetcher.FetcherJob - Unexpected error for http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09
> me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:column name must not be empty)
>     at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
>     at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:233)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
>     at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:248)
>     at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:245)
>     at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
>     at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
>     at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:245)
>     at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:79)
>     at org.apache.gora.cassandra.store.CassandraClient.addSubColumn(CassandraClient.java:172)
>     at org.apache.gora.cassandra.store.CassandraStore.addOrUpdateField(CassandraStore.java:360)
>     at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:212)
>     at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.output(FetcherReducer.java:663)
>     at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:557)
> Caused by: InvalidRequestException(why:column name must not be empty)
>     at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19479)
>     at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
>     at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
> ... 20 more
> {code}
--
This message is automatically generated by JIRA.