[ 
https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445841#comment-13445841
 ] 

Matt MacDonald commented on NUTCH-1445:
---------------------------------------

Hi,

I'm trying to use the new ElasticSearch indexer support and have run into an 
issue that I hope you can help with. Because this feature is so new to Nutch, 
little has been written about how to use it, so I hope it's OK to post the 
error here. If I should open a new JIRA ticket rather than commenting on this 
one, please let me know. Any ideas on how to invoke and/or configure my 
Nutch 2.x and ElasticSearch 0.19.4 setup so that I can use ElasticSearch as 
the search index?

I'm running the elasticindex command with the following:

{noformat}bin/nutch elasticindex "Doppleganger" -reindex{noformat}

*and seeing this as the output*
{noformat}
[ matt@Office-iMac ~/Projects/nutch-trunk/runtime/local (git::2.x) ] bin/nutch elasticindex "Doppleganger" -reindex
2012-08-31 06:44:09.238 java[53609:1903] Unable to load realm info from SCDynamicStore
Exception in thread "main" java.lang.RuntimeException: job failed: name=elastic-index [Doppleganger], jobid=job_local_0001
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
        at org.apache.nutch.indexer.elastic.ElasticIndexerJob.run(ElasticIndexerJob.java:52)
        at org.apache.nutch.indexer.elastic.ElasticIndexerJob.indexElastic(ElasticIndexerJob.java:60)
        at org.apache.nutch.indexer.elastic.ElasticIndexerJob.run(ElasticIndexerJob.java:73)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.elastic.ElasticIndexerJob.main(ElasticIndexerJob.java:78)
{noformat}

*Checking logs/hadoop.log shows*
{noformat}
2012-08-31 06:44:41,581 WARN  elasticsearch.discovery - [Mother Night] waited for 30s and no initial state was set by the discovery
2012-08-31 06:44:41,581 INFO  elasticsearch.discovery - [Mother Night] Doppleganger/2IUXHWKhQfGsBhmPiozyqg
2012-08-31 06:44:41,584 INFO  elasticsearch.http - [Mother Night] bound_address {inet[/0.0.0.0:9202]}, publish_address {inet[/192.168.1.133:9202]}
2012-08-31 06:44:41,585 INFO  elasticsearch.node - [Mother Night] {0.19.4}[53609]: started
2012-08-31 06:44:41,587 INFO  basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
2012-08-31 06:44:41,587 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-08-31 06:44:41,587 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-08-31 06:44:41,587 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-08-31 06:44:42,174 INFO  elastic.ElasticWriter - Processing bulk request [docs = 500, length = 732991, total docs = 500, last doc in bulk = 'us.ma.watertown.ci.www:http/Archive.aspx?ADID=357']
2012-08-31 06:44:42,492 INFO  elastic.ElasticWriter - Processing bulk request [docs = 500, length = 943572, total docs = 1000, last doc in bulk = 'us.ma.watertown.ci.www:http/Directory.aspx?DID=92']
2012-08-31 06:44:42,493 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2012-08-31 06:44:42,494 WARN  mapred.LocalJobRunner - job_local_0001
org.elasticsearch.action.ActionRequestValidationException: Validation Failed: 1: type is missing;2: type is missing;3: type is missing; [entries 4 through 499 are identical: "type is missing"] 500: type is missing;
        at org.elasticsearch.action.bulk.BulkRequest.validate(BulkRequest.java:265)
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:55)
        at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:83)
        at org.elasticsearch.client.support.AbstractClient.bulk(AbstractClient.java:141)
        at org.elasticsearch.action.bulk.BulkRequestBuilder.doExecute(BulkRequestBuilder.java:128)
        at org.elasticsearch.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:53)
        at org.elasticsearch.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:47)
        at org.apache.nutch.indexer.elastic.ElasticWriter.processExecute(ElasticWriter.java:117)
        at org.apache.nutch.indexer.elastic.ElasticWriter.write(ElasticWriter.java:91)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:45)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:40)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:111)
        at org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
{noformat}
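
Since the validation message above is long and repetitive, a small script can confirm whether every entry in the bulk request failed for the same reason. The following is only a sketch in Python: the function name `summarize_validation_errors` and the synthetic `sample` string are hypothetical, and it assumes the message keeps the `N: reason;` layout seen in the log.

```python
import re
from collections import Counter


def summarize_validation_errors(message: str) -> Counter:
    """Count the distinct reasons in an 'N: reason;' style validation message."""
    # Keep only the text after the 'Validation Failed:' prefix when it is present.
    _, _, body = message.partition("Validation Failed:")
    if not body:
        body = message
    # Mailed logs are hard-wrapped, so collapse every whitespace run to one space.
    body = " ".join(body.split())
    # Each entry looks like '<number>: <reason>;'.
    reasons = re.findall(r"\d+:\s*([^;]+);", body)
    return Counter(reason.strip() for reason in reasons)


# Synthetic stand-in for the logged message (500 entries, wrapped mid-entry).
sample = (
    "org.elasticsearch.action.ActionRequestValidationException: Validation Failed: "
    + "".join(f"{i}: type is\nmissing;" for i in range(1, 501))
)

summary = summarize_validation_errors(sample)
for reason, count in summary.items():
    print(f"{count} x {reason}")
```

If all 500 entries report the same reason, as they appear to here, the problem is probably in how the index requests are built rather than in individual documents.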

*This is what I see when I start ElasticSearch:*
{noformat}
elasticsearch -f
[2012-08-31 06:31:56,832][INFO ][node                     ] [Doorman] {0.19.4}[53351]: initializing ...
[2012-08-31 06:31:56,841][INFO ][plugins                  ] [Doorman] loaded [MockSolrPlugin], sites []
[2012-08-31 06:31:57,752][INFO ][node                     ] [Doorman] {0.19.4}[53351]: initialized
[2012-08-31 06:31:57,752][INFO ][node                     ] [Doorman] {0.19.4}[53351]: starting ...
[2012-08-31 06:31:57,812][INFO ][transport                ] [Doorman] bound_address {inet[/0.0.0.0:9301]}, publish_address {inet[/192.168.1.133:9301]}
[2012-08-31 06:32:00,898][INFO ][cluster.service          ] [Doorman] detected_master [Doppleganger][OF5TWSbpTl64qA0_VW-b_g][inet[/192.168.1.133:9300]], added {[Doppleganger][OF5TWSbpTl64qA0_VW-b_g][inet[/192.168.1.133:9300]],}, reason: zen-disco-receive(from master [[Doppleganger][OF5TWSbpTl64qA0_VW-b_g][inet[/192.168.1.133:9300]]])
[2012-08-31 06:32:00,911][INFO ][discovery                ] [Doorman] elasticsearch_matt/YcpHmZWfSdCgvZbg7YfA3g
[2012-08-31 06:32:00,914][INFO ][http                     ] [Doorman] bound_address {inet[/0.0.0.0:9201]}, publish_address {inet[/192.168.1.133:9201]}
[2012-08-31 06:32:00,914][INFO ][node                     ] [Doorman] {0.19.4}[53351]: started
{noformat}
{noformat}
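
One thing that may be worth checking is whether the two logs agree on the cluster name. The sketch below, in Python, assumes that a `discovery` INFO line ends with `<cluster name>/<node id>`; that reading of the log format is an assumption, and the helper `cluster_name` is hypothetical. Under that assumption the node started by Nutch reports `Doppleganger`, while the standalone server reports `elasticsearch_matt` and lists `Doppleganger` where a node name would appear in its `detected_master` line. A mismatch like that could explain the "waited for 30s and no initial state was set by the discovery" warning, but nothing in this thread confirms it.

```python
import re
from typing import Optional

# Assumption: discovery lines look like "... discovery ... [Node Name] cluster/nodeId".
DISCOVERY = re.compile(
    r"discovery\b.*?\[(?P<node>[^\]]+)\]\s+(?P<cluster>[^/\s]+)/(?P<node_id>\S+)"
)


def cluster_name(log_line: str) -> Optional[str]:
    """Return the cluster name from a discovery log line, or None if absent."""
    # Mailed logs are hard-wrapped, so collapse whitespace before matching.
    collapsed = " ".join(log_line.split())
    match = DISCOVERY.search(collapsed)
    return match.group("cluster") if match else None


# Lines copied from the two logs in this comment (wrapping preserved).
nutch_line = (
    "2012-08-31 06:44:41,581 INFO  elasticsearch.discovery - [Mother Night] \n"
    "Doppleganger/2IUXHWKhQfGsBhmPiozyqg"
)
server_line = (
    "[2012-08-31 06:32:00,911][INFO ][discovery                ] [Doorman] \n"
    "elasticsearch_matt/YcpHmZWfSdCgvZbg7YfA3g"
)

nutch_cluster = cluster_name(nutch_line)
server_cluster = cluster_name(server_line)

if nutch_cluster != server_cluster:
    print(f"cluster name mismatch: Nutch={nutch_cluster!r}, server={server_cluster!r}")
else:
    print(f"both sides use cluster {nutch_cluster!r}")
```

If the names really do differ, passing the server's cluster name as the first argument to `bin/nutch elasticindex` instead of `"Doppleganger"` might be worth a try.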

Thanks,
Matt
                
> Add ElasticIndexerJob that indexes to elasticsearch
> ---------------------------------------------------
>
>                 Key: NUTCH-1445
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1445
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Ferdy Galema
>             Fix For: 2.1
>
>         Attachments: NUTCH-1445-addPropsToConfig.patch, 
> NUTCH-1445-addToNutchScript.patch, NUTCH-1445.patch
>
>
> We have created a new indexer job, ElasticIndexerJob, that indexes to 
> elasticsearch. It is originally based upon 
> https://github.com/ctjmorgan/nutch-elasticsearch-indexer (Apache2 license), 
> but we have modified it greatly to make it integrate as well as possible into 
> Nutch. The greatest modification is that documents are asynchronously flushed 
> in bulk to elasticsearch.
> Elasticsearch rocks. Both performance and ease of configuration are awesome. 
> You simply deploy a server by unpacking the tar, configure the clustername, 
> start the server and fire away indexing requests. Indices are automatically 
> created. Fields are automapped. (Of course it is recommended to create your 
> own optimized mapping, but that is beyond the scope of this issue.) Multiple 
> servers connect without extra configuration, simply by using the same 
> clustername (by means of multicast). There are tons of advanced options, such 
> as sharding, replication, disk striping etc.
> To give an example of the performance: with 20+ nodes we are able to index 
> over 1M docs (average-sized web documents) per minute. The best part is that 
> the added documents are almost instantly searchable, so there are none of the 
> hidden commit costs that Solr has. This is with out-of-the-box configuration.
> (I will attach patch and commit for Nutch2. Feel free to adapt for trunk.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
