[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890539#comment-13890539
]
Anton commented on NUTCH-1478:
-------------------------------
Steps to reproduce:
1) Add fields for the metatags to schema.xml in both Solr and Nutch:
<field name="metatag.description" type="string" stored="true" indexed="true"/>
2) Restart Solr.
3) Configure nutch-default.xml as described in my comment above.
4) Set up urls/seed.txt in Nutch.
5) Run ant clean && ant runtime.
6) Run the crawl command.
I am using solr-4.6.0 and apache-nutch-2.2.1.
When I run a full crawl with this command:
/home/hadoop/webcrawer/apache-nutch-2.2.1/runtime/deploy/bin/crawl urls/seed.txt az http://localhost:8088/solr/ 1
the metadata is successfully parsed and stored in the database. The problem occurs in
SolrIndexerJob:
{code:java}
14/02/04 13:00:46 INFO solr.SolrIndexerJob: SolrIndexerJob: starting
14/02/04 13:00:46 INFO plugin.PluginRepository: Plugins: looking in:
/home/hadoop/data/hadoop-unjar8289682370547831088/classes/plugins
14/02/04 13:00:46 INFO plugin.PluginRepository: Plugin Auto-activation mode:
[true]
14/02/04 13:00:46 INFO plugin.PluginRepository: Registered Plugins:
14/02/04 13:00:46 INFO plugin.PluginRepository: the nutch core
extension points (nutch-extensionpoints)
14/02/04 13:00:46 INFO plugin.PluginRepository: Basic URL Normalizer
(urlnormalizer-basic)
14/02/04 13:00:46 INFO plugin.PluginRepository: Html Parse Plug-in
(parse-html)
14/02/04 13:00:46 INFO plugin.PluginRepository: Basic Indexing Filter
(index-basic)
14/02/04 13:00:46 INFO plugin.PluginRepository: HTTP Framework
(lib-http)
14/02/04 13:00:46 INFO plugin.PluginRepository: Pass-through URL
Normalizer (urlnormalizer-pass)
14/02/04 13:00:46 INFO plugin.PluginRepository: Regex URL Filter
(urlfilter-regex)
14/02/04 13:00:46 INFO plugin.PluginRepository: Http Protocol Plug-in
(protocol-http)
14/02/04 13:00:46 INFO plugin.PluginRepository: Regex URL Normalizer
(urlnormalizer-regex)
14/02/04 13:00:46 INFO plugin.PluginRepository: Tika Parser Plug-in
(parse-tika)
14/02/04 13:00:46 INFO plugin.PluginRepository: OPIC Scoring Plug-in
(scoring-opic)
14/02/04 13:00:46 INFO plugin.PluginRepository: CyberNeko HTML Parser
(lib-nekohtml)
14/02/04 13:00:46 INFO plugin.PluginRepository: Anchor Indexing Filter
(index-anchor)
14/02/04 13:00:46 INFO plugin.PluginRepository: Regex URL Filter
Framework (lib-regex-filter)
14/02/04 13:00:46 INFO plugin.PluginRepository: MetaTags
(parse-metatags)
14/02/04 13:00:46 INFO plugin.PluginRepository: Index Metadata
(index-metadata)
14/02/04 13:00:46 INFO plugin.PluginRepository: Registered Extension-Points:
14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch Protocol
(org.apache.nutch.protocol.Protocol)
14/02/04 13:00:46 INFO plugin.PluginRepository: Parse Filter
(org.apache.nutch.parse.ParseFilter)
14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch URL Filter
(org.apache.nutch.net.URLFilter)
14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch Content Parser
(org.apache.nutch.parse.Parser)
14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
14/02/04 13:00:46 INFO basic.BasicIndexingFilter: Maximum title length for
indexing set to: 100
14/02/04 13:00:46 INFO indexer.IndexingFilters: Adding
org.apache.nutch.indexer.basic.BasicIndexingFilter
14/02/04 13:00:46 INFO anchor.AnchorIndexingFilter: Anchor deduplication is: off
14/02/04 13:00:46 INFO indexer.IndexingFilters: Adding
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
14/02/04 13:00:46 INFO indexer.IndexingFilters: Adding
org.apache.nutch.indexer.metadata.MetadataIndexer
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:host.name=ascompany.info
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.7.0_45
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Oracle Corporation
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/lib/jvm/java-7-oracle/jre
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/home/hadoop/hadoop-1.2.1/libexec/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/hadoop/hadoop-1.2.1/libexec/..:/home/hadoop/hadoop-1.2.1/libexec/../hadoop-core-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/asm-3.2.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/aspectjrt-1.6.11.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/aspectjtools-1.6.11.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-cli-1.2.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-codec-1.4.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-configuration-1.6.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-daemon-1.0.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-digester-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-el-1.0.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-io-2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-lang-2.4.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-logging-api-1.0.4.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-math-2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-net-3.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/core-3.1.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hadoop-capacity-scheduler-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hadoop-fairscheduler-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hadoop-thriftfs-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hsqldb-1.8.0.10.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jasper-compiler-5.5.12.jar:/home/hadoop/hadoop-
1.2.1/libexec/../lib/jasper-runtime-5.5.12.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jdeb-0.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jersey-core-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jersey-json-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jersey-server-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jetty-6.1.26.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jsch-0.1.42.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/junit-4.5.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/log4j-1.2.15.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/oro-2.0.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/servlet-api-2.5-20081211.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/slf4j-api-1.4.3.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jsp-2.1/jsp-2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jsp-2.1/jsp-api-2.1.jar
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/home/hadoop/hadoop-1.2.1/libexec/../lib/native/Linux-amd64-64
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:os.version=3.2.0-4-amd64
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:user.home=/home/hadoop
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/home/hadoop
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Opening socket connection to
server localhost/127.0.0.1:2181
14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Socket connection established to
localhost/127.0.0.1:2181, initiating session
14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Session establishment complete on
server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d1, negotiated
timeout = 180000
14/02/04 13:00:46 INFO store.HBaseStore: Keyclass and nameclass match but
mismatching table names mappingfile schema is 'webpage' vs actual schema
'az_webpage' , assuming they are the same.
14/02/04 13:00:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Opening socket connection to
server localhost/127.0.0.1:2181
14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Socket connection established to
localhost/127.0.0.1:2181, initiating session
14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Session establishment complete on
server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d2, negotiated
timeout = 180000
14/02/04 13:00:46 INFO store.HBaseStore: Keyclass and nameclass match but
mismatching table names mappingfile schema is 'webpage' vs actual schema
'az_webpage' , assuming they are the same.
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Opening socket connection to
server localhost/127.0.0.1:2181
14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Socket connection established to
localhost/127.0.0.1:2181, initiating session
14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Session establishment complete on
server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d3, negotiated
timeout = 180000
14/02/04 13:00:47 INFO mapred.JobClient: Running job: job_local1932930342_0001
14/02/04 13:00:47 INFO mapred.LocalJobRunner: Waiting for map tasks
14/02/04 13:00:47 INFO mapred.LocalJobRunner: Starting task:
attempt_local1932930342_0001_m_000000_0
14/02/04 13:00:47 INFO util.ProcessTree: setsid exited with exit code 0
14/02/04 13:00:47 INFO mapred.Task: Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2ba04d20
14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to
server localhost/127.0.0.1:2181
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to
localhost/127.0.0.1:2181, initiating session
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on
server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d4, negotiated
timeout = 180000
14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but
mismatching table names mappingfile schema is 'webpage' vs actual schema
'az_webpage' , assuming they are the same.
14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to
server localhost/127.0.0.1:2181
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to
localhost/127.0.0.1:2181, initiating session
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on
server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d5, negotiated
timeout = 180000
14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but
mismatching table names mappingfile schema is 'webpage' vs actual schema
'az_webpage' , assuming they are the same.
14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to
server localhost/127.0.0.1:2181
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to
localhost/127.0.0.1:2181, initiating session
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on
server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d6, negotiated
timeout = 180000
14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but
mismatching table names mappingfile schema is 'webpage' vs actual schema
'az_webpage' , assuming they are the same.
14/02/04 13:00:47 INFO mapred.MapTask: Processing split:
org.apache.gora.mapreduce.GoraInputSplit@3a37f44d
14/02/04 13:00:47 INFO mapreduce.GoraRecordReader: gora.buffer.read.limit =
10000
14/02/04 13:00:47 INFO solr.SolrIndexerJob: Authenticating as: solr-user
14/02/04 13:00:47 INFO conf.Configuration: found resource solrindex-mapping.xml
at file:/home/hadoop/data/hadoop-unjar8289682370547831088/solrindex-mapping.xml
14/02/04 13:00:47 INFO solr.SolrMappingReader: source: content dest: content
14/02/04 13:00:47 INFO solr.SolrMappingReader: source: title dest: title
14/02/04 13:00:47 INFO solr.SolrMappingReader: source: host dest: host
14/02/04 13:00:47 INFO solr.SolrMappingReader: source: batchId dest: batchId
14/02/04 13:00:47 INFO solr.SolrMappingReader: source: boost dest: boost
14/02/04 13:00:47 INFO solr.SolrMappingReader: source: digest dest: digest
14/02/04 13:00:47 INFO solr.SolrMappingReader: source: tstamp dest: tstamp
14/02/04 13:00:47 INFO basic.BasicIndexingFilter: Maximum title length for
indexing set to: 100
14/02/04 13:00:47 INFO indexer.IndexingFilters: Adding
org.apache.nutch.indexer.basic.BasicIndexingFilter
14/02/04 13:00:47 INFO anchor.AnchorIndexingFilter: Anchor deduplication is: off
14/02/04 13:00:47 INFO indexer.IndexingFilters: Adding
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
14/02/04 13:00:47 INFO indexer.IndexingFilters: Adding
org.apache.nutch.indexer.metadata.MetadataIndexer
14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to
server localhost/127.0.0.1:2181
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to
localhost/127.0.0.1:2181, initiating session
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on
server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d7, negotiated
timeout = 180000
14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but
mismatching table names mappingfile schema is 'webpage' vs actual schema
'az_webpage' , assuming they are the same.
14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to
server localhost/127.0.0.1:2181
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to
localhost/127.0.0.1:2181, initiating session
14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on
server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d8, negotiated
timeout = 180000
14/02/04 13:00:47 INFO mapred.LocalJobRunner: Map task executor complete.
14/02/04 13:00:47 WARN mapred.FileOutputCommitter: Output path is null in
cleanup
14/02/04 13:00:47 WARN mapred.LocalJobRunner: job_local1932930342_0001
java.lang.Exception: java.lang.NullPointerException
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NullPointerException
at
org.apache.nutch.indexer.metadata.MetadataIndexer.filter(MetadataIndexer.java:95)
at
org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:107)
at org.apache.nutch.indexer.IndexUtil.index(IndexUtil.java:77)
at
org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:103)
at
org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:61)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/02/04 13:00:48 INFO mapred.JobClient: map 0% reduce 0%
14/02/04 13:00:48 INFO mapred.JobClient: Job complete: job_local1932930342_0001
14/02/04 13:00:48 INFO mapred.JobClient: Counters: 0
14/02/04 13:00:48 ERROR solr.SolrIndexerJob: SolrIndexerJob:
java.lang.RuntimeException: job failed: name=[az]solr-index,
jobid=job_local1932930342_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
at
org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:46)
at
org.apache.nutch.indexer.solr.SolrIndexerJob.indexSolr(SolrIndexerJob.java:54)
at
org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at
org.apache.nutch.indexer.solr.SolrIndexerJob.main(SolrIndexerJob.java:85)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
{code}
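The trace shows a NullPointerException at MetadataIndexer.filter (MetadataIndexer.java:95), but not which reference was null. One possible cause, which is only an assumption and not confirmed against the plugin source, is that a configured metadata key has no entry for a given page and the lookup result is used without a null check. The sketch below illustrates that failure mode and a guard against it. The class name MetadataLookupSketch, the method copyMetadata, and the Map<String, ByteBuffer> layout are hypothetical stand-ins, not the plugin's actual API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the MetadataIndexer source.
class MetadataLookupSketch {

    // Copies the configured metadata keys from a page's metadata map into
    // a simple field map, skipping keys the page does not carry.
    static Map<String, String> copyMetadata(
            Map<String, ByteBuffer> pageMetadata, List<String> configuredKeys) {
        Map<String, String> doc = new HashMap<String, String>();
        for (String key : configuredKeys) {
            ByteBuffer raw = pageMetadata.get(key);
            // Without this check, raw.duplicate() below would throw a
            // NullPointerException for any page that lacks the tag.
            if (raw == null) {
                continue;
            }
            // duplicate() leaves the stored buffer's position untouched.
            String value = StandardCharsets.UTF_8.decode(raw.duplicate()).toString();
            doc.put(key, value);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> pageMetadata = new HashMap<String, ByteBuffer>();
        pageMetadata.put("metatag.description",
                ByteBuffer.wrap("A test page".getBytes(StandardCharsets.UTF_8)));

        // "metatag.keywords" is configured but absent from this page.
        List<String> keys = Arrays.asList("metatag.description", "metatag.keywords");
        System.out.println(copyMetadata(pageMetadata, keys));
    }
}
```

If the real cause is of this kind, pages without the configured meta tag would trigger the failure while pages that carry it would index normally, which may be worth checking against the crawled seed URLs.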
> Parse-metatags and index-metadata plugin for Nutch 2.x series
> --------------------------------------------------------------
>
> Key: NUTCH-1478
> URL: https://issues.apache.org/jira/browse/NUTCH-1478
> Project: Nutch
> Issue Type: Improvement
> Components: parser
> Affects Versions: 2.1
> Reporter: kiran
> Fix For: 2.3
>
> Attachments: NUTCH-1478-parse-v2.patch, NUTCH-1478v3.patch,
> NUTCH-1478v4.patch, Nutch1478.patch, Nutch1478.zip,
> metadata_parseChecker_sites.png
>
>
> I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x series.
> This will take multiple values of the same tag and index them in Solr, as I patched
> before (https://issues.apache.org/jira/browse/NUTCH-1467).
> The usage is the same as described here
> (http://wiki.apache.org/nutch/IndexMetatags), but one change is that there is
> no need to put the 'metatag' keyword before metatag names. For example, my
> configuration looks like this
> (https://github.com/salvager/NutchDev/blob/master/runtime/local/conf/nutch-site.xml).
>
> This is only the first version and does not include the JUnit test. I will
> post an updated version soon.
> This will parse the tags and index them in Solr. Make sure the fields listed
> in 'index.parse.md' in nutch-site.xml are also created in schema.xml in Solr.
> Please let me know if you have any suggestions.
> This is supported by DLA (Digital Library and Archives) of Virginia Tech.