[
https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445841#comment-13445841
]
Matt MacDonald commented on NUTCH-1445:
---------------------------------------
Hi,
I'm attempting to use the ElasticSearch indexer support and am running into an
issue that I hope you can help with. Since this feature is so new to Nutch,
there is little documentation on how to use it, so I'm hoping it's OK to post
the error I'm hitting here. If I should open a new JIRA ticket rather than
commenting on this one, please let me know. Any ideas about how to invoke
and/or configure my Nutch 2.x and ElasticSearch 0.19.4 setup so that I can use
ElasticSearch as the search index?
I'm running the elasticindex command with the following:
{noformat}bin/nutch elasticindex "Doppleganger" -reindex{noformat}
*and seeing this as the output*
{noformat}
[ matt@Office-iMac ~/Projects/nutch-trunk/runtime/local (git::2.x) ] bin/nutch elasticindex "Doppleganger" -reindex
2012-08-31 06:44:09.238 java[53609:1903] Unable to load realm info from SCDynamicStore
Exception in thread "main" java.lang.RuntimeException: job failed: name=elastic-index [Doppleganger], jobid=job_local_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
at org.apache.nutch.indexer.elastic.ElasticIndexerJob.run(ElasticIndexerJob.java:52)
at org.apache.nutch.indexer.elastic.ElasticIndexerJob.indexElastic(ElasticIndexerJob.java:60)
at org.apache.nutch.indexer.elastic.ElasticIndexerJob.run(ElasticIndexerJob.java:73)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.elastic.ElasticIndexerJob.main(ElasticIndexerJob.java:78)
{noformat}
*Checking logs/hadoop.log shows*
{noformat}
2012-08-31 06:44:41,581 WARN elasticsearch.discovery - [Mother Night] waited for 30s and no initial state was set by the discovery
2012-08-31 06:44:41,581 INFO elasticsearch.discovery - [Mother Night] Doppleganger/2IUXHWKhQfGsBhmPiozyqg
2012-08-31 06:44:41,584 INFO elasticsearch.http - [Mother Night] bound_address {inet[/0.0.0.0:9202]}, publish_address {inet[/192.168.1.133:9202]}
2012-08-31 06:44:41,585 INFO elasticsearch.node - [Mother Night] {0.19.4}[53609]: started
2012-08-31 06:44:41,587 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
2012-08-31 06:44:41,587 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-08-31 06:44:41,587 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-08-31 06:44:41,587 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-08-31 06:44:42,174 INFO elastic.ElasticWriter - Processing bulk request [docs = 500, length = 732991, total docs = 500, last doc in bulk = 'us.ma.watertown.ci.www:http/Archive.aspx?ADID=357']
2012-08-31 06:44:42,492 INFO elastic.ElasticWriter - Processing bulk request [docs = 500, length = 943572, total docs = 1000, last doc in bulk = 'us.ma.watertown.ci.www:http/Directory.aspx?DID=92']
2012-08-31 06:44:42,493 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2012-08-31 06:44:42,494 WARN mapred.LocalJobRunner - job_local_0001
org.elasticsearch.action.ActionRequestValidationException: Validation Failed:
1: type is missing;2: type is missing;3: type is missing;4: type is missing;5:
type is missing;6: type is missing;7: type is missing;8: type is missing;9:
type is missing;10: type is missing;11: type is missing;12: type is missing;13:
type is missing;14: type is missing;15: type is missing;16: type is missing;17:
type is missing;18: type is missing;19: type is missing;20: type is missing;21:
type is missing;22: type is missing;23: type is missing;24: type is missing;25:
type is missing;26: type is missing;27: type is missing;28: type is missing;29:
type is missing;30: type is missing;31: type is missing;32: type is missing;33:
type is missing;34: type is missing;35: type is missing;36: type is missing;37:
type is missing;38: type is missing;39: type is missing;40: type is missing;41:
type is missing;42: type is missing;43: type is missing;44: type is missing;45:
type is missing;46: type is missing;47: type is missing;48: type is missing;49:
type is missing;50: type is missing;51: type is missing;52: type is missing;53:
type is missing;54: type is missing;55: type is missing;56: type is missing;57:
type is missing;58: type is missing;59: type is missing;60: type is missing;61:
type is missing;62: type is missing;63: type is missing;64: type is missing;65:
type is missing;66: type is missing;67: type is missing;68: type is missing;69:
type is missing;70: type is missing;71: type is missing;72: type is missing;73:
type is missing;74: type is missing;75: type is missing;76: type is missing;77:
type is missing;78: type is missing;79: type is missing;80: type is missing;81:
type is missing;82: type is missing;83: type is missing;84: type is missing;85:
type is missing;86: type is missing;87: type is missing;88: type is missing;89:
type is missing;90: type is missing;91: type is missing;92: type is missing;93:
type is missing;94: type is missing;95: type is missing;96: type is missing;97:
type is missing;98: type is missing;99: type is missing;100: type is
missing;101: type is missing;102: type is missing;103: type is missing;104:
type is missing;105: type is missing;106: type is missing;107: type is
missing;108: type is missing;109: type is missing;110: type is missing;111:
type is missing;112: type is missing;113: type is missing;114: type is
missing;115: type is missing;116: type is missing;117: type is missing;118:
type is missing;119: type is missing;120: type is missing;121: type is
missing;122: type is missing;123: type is missing;124: type is missing;125:
type is missing;126: type is missing;127: type is missing;128: type is
missing;129: type is missing;130: type is missing;131: type is missing;132:
type is missing;133: type is missing;134: type is missing;135: type is
missing;136: type is missing;137: type is missing;138: type is missing;139:
type is missing;140: type is missing;141: type is missing;142: type is
missing;143: type is missing;144: type is missing;145: type is missing;146:
type is missing;147: type is missing;148: type is missing;149: type is
missing;150: type is missing;151: type is missing;152: type is missing;153:
type is missing;154: type is missing;155: type is missing;156: type is
missing;157: type is missing;158: type is missing;159: type is missing;160:
type is missing;161: type is missing;162: type is missing;163: type is
missing;164: type is missing;165: type is missing;166: type is missing;167:
type is missing;168: type is missing;169: type is missing;170: type is
missing;171: type is missing;172: type is missing;173: type is missing;174:
type is missing;175: type is missing;176: type is missing;177: type is
missing;178: type is missing;179: type is missing;180: type is missing;181:
type is missing;182: type is missing;183: type is missing;184: type is
missing;185: type is missing;186: type is missing;187: type is missing;188:
type is missing;189: type is missing;190: type is missing;191: type is
missing;192: type is missing;193: type is missing;194: type is missing;195:
type is missing;196: type is missing;197: type is missing;198: type is
missing;199: type is missing;200: type is missing;201: type is missing;202:
type is missing;203: type is missing;204: type is missing;205: type is
missing;206: type is missing;207: type is missing;208: type is missing;209:
type is missing;210: type is missing;211: type is missing;212: type is
missing;213: type is missing;214: type is missing;215: type is missing;216:
type is missing;217: type is missing;218: type is missing;219: type is
missing;220: type is missing;221: type is missing;222: type is missing;223:
type is missing;224: type is missing;225: type is missing;226: type is
missing;227: type is missing;228: type is missing;229: type is missing;230:
type is missing;231: type is missing;232: type is missing;233: type is
missing;234: type is missing;235: type is missing;236: type is missing;237:
type is missing;238: type is missing;239: type is missing;240: type is
missing;241: type is missing;242: type is missing;243: type is missing;244:
type is missing;245: type is missing;246: type is missing;247: type is
missing;248: type is missing;249: type is missing;250: type is missing;251:
type is missing;252: type is missing;253: type is missing;254: type is
missing;255: type is missing;256: type is missing;257: type is missing;258:
type is missing;259: type is missing;260: type is missing;261: type is
missing;262: type is missing;263: type is missing;264: type is missing;265:
type is missing;266: type is missing;267: type is missing;268: type is
missing;269: type is missing;270: type is missing;271: type is missing;272:
type is missing;273: type is missing;274: type is missing;275: type is
missing;276: type is missing;277: type is missing;278: type is missing;279:
type is missing;280: type is missing;281: type is missing;282: type is
missing;283: type is missing;284: type is missing;285: type is missing;286:
type is missing;287: type is missing;288: type is missing;289: type is
missing;290: type is missing;291: type is missing;292: type is missing;293:
type is missing;294: type is missing;295: type is missing;296: type is
missing;297: type is missing;298: type is missing;299: type is missing;300:
type is missing;301: type is missing;302: type is missing;303: type is
missing;304: type is missing;305: type is missing;306: type is missing;307:
type is missing;308: type is missing;309: type is missing;310: type is
missing;311: type is missing;312: type is missing;313: type is missing;314:
type is missing;315: type is missing;316: type is missing;317: type is
missing;318: type is missing;319: type is missing;320: type is missing;321:
type is missing;322: type is missing;323: type is missing;324: type is
missing;325: type is missing;326: type is missing;327: type is missing;328:
type is missing;329: type is missing;330: type is missing;331: type is
missing;332: type is missing;333: type is missing;334: type is missing;335:
type is missing;336: type is missing;337: type is missing;338: type is
missing;339: type is missing;340: type is missing;341: type is missing;342:
type is missing;343: type is missing;344: type is missing;345: type is
missing;346: type is missing;347: type is missing;348: type is missing;349:
type is missing;350: type is missing;351: type is missing;352: type is
missing;353: type is missing;354: type is missing;355: type is missing;356:
type is missing;357: type is missing;358: type is missing;359: type is
missing;360: type is missing;361: type is missing;362: type is missing;363:
type is missing;364: type is missing;365: type is missing;366: type is
missing;367: type is missing;368: type is missing;369: type is missing;370:
type is missing;371: type is missing;372: type is missing;373: type is
missing;374: type is missing;375: type is missing;376: type is missing;377:
type is missing;378: type is missing;379: type is missing;380: type is
missing;381: type is missing;382: type is missing;383: type is missing;384:
type is missing;385: type is missing;386: type is missing;387: type is
missing;388: type is missing;389: type is missing;390: type is missing;391:
type is missing;392: type is missing;393: type is missing;394: type is
missing;395: type is missing;396: type is missing;397: type is missing;398:
type is missing;399: type is missing;400: type is missing;401: type is
missing;402: type is missing;403: type is missing;404: type is missing;405:
type is missing;406: type is missing;407: type is missing;408: type is
missing;409: type is missing;410: type is missing;411: type is missing;412:
type is missing;413: type is missing;414: type is missing;415: type is
missing;416: type is missing;417: type is missing;418: type is missing;419:
type is missing;420: type is missing;421: type is missing;422: type is
missing;423: type is missing;424: type is missing;425: type is missing;426:
type is missing;427: type is missing;428: type is missing;429: type is
missing;430: type is missing;431: type is missing;432: type is missing;433:
type is missing;434: type is missing;435: type is missing;436: type is
missing;437: type is missing;438: type is missing;439: type is missing;440:
type is missing;441: type is missing;442: type is missing;443: type is
missing;444: type is missing;445: type is missing;446: type is missing;447:
type is missing;448: type is missing;449: type is missing;450: type is
missing;451: type is missing;452: type is missing;453: type is missing;454:
type is missing;455: type is missing;456: type is missing;457: type is
missing;458: type is missing;459: type is missing;460: type is missing;461:
type is missing;462: type is missing;463: type is missing;464: type is
missing;465: type is missing;466: type is missing;467: type is missing;468:
type is missing;469: type is missing;470: type is missing;471: type is
missing;472: type is missing;473: type is missing;474: type is missing;475:
type is missing;476: type is missing;477: type is missing;478: type is
missing;479: type is missing;480: type is missing;481: type is missing;482:
type is missing;483: type is missing;484: type is missing;485: type is
missing;486: type is missing;487: type is missing;488: type is missing;489:
type is missing;490: type is missing;491: type is missing;492: type is
missing;493: type is missing;494: type is missing;495: type is missing;496:
type is missing;497: type is missing;498: type is missing;499: type is
missing;500: type is missing;
at org.elasticsearch.action.bulk.BulkRequest.validate(BulkRequest.java:265)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:55)
at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:83)
at org.elasticsearch.client.support.AbstractClient.bulk(AbstractClient.java:141)
at org.elasticsearch.action.bulk.BulkRequestBuilder.doExecute(BulkRequestBuilder.java:128)
at org.elasticsearch.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:53)
at org.elasticsearch.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:47)
at org.apache.nutch.indexer.elastic.ElasticWriter.processExecute(ElasticWriter.java:117)
at org.apache.nutch.indexer.elastic.ElasticWriter.write(ElasticWriter.java:91)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:45)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:40)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:111)
at org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:61)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
{noformat}
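For anyone reading the trace above: the message carries one numbered "type is missing" entry per document in the 500-document bulk, which suggests that every index action in the bulk request was built without a document type. The following is a minimal Java sketch of that behaviour under stated assumptions. It is a simplified model, not Elasticsearch's or Nutch's actual code; the class BulkValidationSketch, the nested IndexAction and both methods are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkValidationSketch {

    /** Hypothetical stand-in for one index action inside a bulk request. */
    public static final class IndexAction {
        final String index;
        final String type; // null models an action created without a type
        final String id;

        public IndexAction(String index, String type, String id) {
            this.index = index;
            this.type = type;
            this.id = id;
        }
    }

    /** Collects one error string for every action whose type is null. */
    public static List<String> validate(List<IndexAction> actions) {
        List<String> errors = new ArrayList<>();
        for (IndexAction action : actions) {
            if (action.type == null) {
                errors.add("type is missing");
            }
        }
        return errors;
    }

    /** Joins the errors in the numbered style seen in the log above. */
    public static String message(List<String> errors) {
        StringBuilder sb = new StringBuilder("Validation Failed: ");
        int n = 0;
        for (String error : errors) {
            sb.append(++n).append(": ").append(error).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<IndexAction> bulk = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            // "myindex" and the ids are placeholders; the type is left null
            bulk.add(new IndexAction("myindex", null, "doc" + i));
        }
        System.out.println(message(validate(bulk)));
    }
}
```

In this model, giving every action a non-null type yields an empty error list, so a single missing setting on the writer side would be enough to produce the whole 500-entry message.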
*This is what I see when I start ElasticSearch:*
{noformat}
elasticsearch -f
[2012-08-31 06:31:56,832][INFO ][node ] [Doorman] {0.19.4}[53351]: initializing ...
[2012-08-31 06:31:56,841][INFO ][plugins ] [Doorman] loaded [MockSolrPlugin], sites []
[2012-08-31 06:31:57,752][INFO ][node ] [Doorman] {0.19.4}[53351]: initialized
[2012-08-31 06:31:57,752][INFO ][node ] [Doorman] {0.19.4}[53351]: starting ...
[2012-08-31 06:31:57,812][INFO ][transport ] [Doorman] bound_address {inet[/0.0.0.0:9301]}, publish_address {inet[/192.168.1.133:9301]}
[2012-08-31 06:32:00,898][INFO ][cluster.service ] [Doorman] detected_master [Doppleganger][OF5TWSbpTl64qA0_VW-b_g][inet[/192.168.1.133:9300]], added {[Doppleganger][OF5TWSbpTl64qA0_VW-b_g][inet[/192.168.1.133:9300]],}, reason: zen-disco-receive(from master [[Doppleganger][OF5TWSbpTl64qA0_VW-b_g][inet[/192.168.1.133:9300]]])
[2012-08-31 06:32:00,911][INFO ][discovery ] [Doorman] elasticsearch_matt/YcpHmZWfSdCgvZbg7YfA3g
[2012-08-31 06:32:00,914][INFO ][http ] [Doorman] bound_address {inet[/0.0.0.0:9201]}, publish_address {inet[/192.168.1.133:9201]}
[2012-08-31 06:32:00,914][INFO ][node ] [Doorman] {0.19.4}[53351]: started
{noformat}
Thanks,
Matt
> Add ElasticIndexerJob that indexes to elasticsearch
> ---------------------------------------------------
>
> Key: NUTCH-1445
> URL: https://issues.apache.org/jira/browse/NUTCH-1445
> Project: Nutch
> Issue Type: New Feature
> Reporter: Ferdy Galema
> Fix For: 2.1
>
> Attachments: NUTCH-1445-addPropsToConfig.patch,
> NUTCH-1445-addToNutchScript.patch, NUTCH-1445.patch
>
>
> We have created a new indexer job, ElasticIndexerJob, that indexes to
> elasticsearch. It is originally based upon
> https://github.com/ctjmorgan/nutch-elasticsearch-indexer (Apache2 license),
> but we have modified it greatly to make it integrate as well as possible into
> Nutch. The greatest modification is that documents are asynchronously flushed
> in bulk to elasticsearch.
> Elasticsearch rocks. Both performance and ease of configuration are awesome.
> You simply deploy a server by unpacking the tar, configure the clustername,
> start the server and fire away indexing requests. Indices are automatically
> created. Fields are automapped. (Of course it is recommended to create your
> own optimized mapping, but that is beyond the scope of this issue.) Multiple
> servers connect without extra configuration, simply by using the same
> clustername (by means of multicast). There are tons of advanced options, such
> as sharding, replication, disk striping etc.
> To give an example of the performance: With 20+ nodes we are able to index
> over 1M docs (average sized webdocuments) per minute. The best part is that
> the added documents are almost instantly searchable, so there are none of the
> hidden commit costs that Solr has. This is with out-of-the-box configuration.
> (I will attach patch and commit for Nutch2. Feel free to adapt for trunk.)
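The description above says documents are flushed to elasticsearch asynchronously and in bulk. A minimal Java sketch of that general pattern follows. It is not the code from the attached patches: the class BulkBufferSketch, the maxBulkDocs threshold and the sink callback are all hypothetical, and waiting for the previous bulk before sending the next one is an assumption made here to keep the sketch simple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class BulkBufferSketch {
    private final int maxBulkDocs;              // flush threshold (hypothetical setting)
    private final Consumer<List<String>> sink;  // stands in for the call that sends one bulk
    private List<String> pending = new ArrayList<>();
    private CompletableFuture<Void> inFlight = CompletableFuture.completedFuture(null);

    public BulkBufferSketch(int maxBulkDocs, Consumer<List<String>> sink) {
        this.maxBulkDocs = maxBulkDocs;
        this.sink = sink;
    }

    /** Buffers one document and flushes once the threshold is reached. */
    public void write(String doc) {
        pending.add(doc);
        if (pending.size() >= maxBulkDocs) {
            flush();
        }
    }

    /** Hands the current batch to the sink on a background thread. */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        final List<String> batch = pending;
        pending = new ArrayList<>();
        inFlight.join(); // assumption: at most one bulk in flight at a time
        inFlight = CompletableFuture.runAsync(() -> sink.accept(batch));
    }

    /** Flushes any remaining documents and waits for the last bulk to finish. */
    public void close() {
        flush();
        inFlight.join();
    }

    public static void main(String[] args) {
        BulkBufferSketch buffer = new BulkBufferSketch(2,
                batch -> System.out.println("bulk of " + batch.size() + " docs"));
        for (int i = 0; i < 5; i++) {
            buffer.write("doc" + i);
        }
        buffer.close();
    }
}
```

The caller keeps buffering the next batch while the previous one is being sent, which is the point of flushing asynchronously; a real writer would also need to inspect the bulk response for per-document failures.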
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira