[ 
https://issues.apache.org/jira/browse/NUTCH-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920849#comment-16920849
 ] 

ASF GitHub Bot commented on NUTCH-2654:
---------------------------------------

jorgelbg commented on issue #468: NUTCH-2654: The obsolete configuration of 
index writers has been removed.
URL: https://github.com/apache/nutch/pull/468#issuecomment-527144600
 
 
   +1 for removing the outdated config from the `nutch-default.xml`.
   
   I'm not so sure about the removal of the `schema.xml` file. Looking at our 
supported indexers, most of them don't need a predefined schema (Kafka, 
RabbitMQ, elastic-rest, elastic, etc.). For Solr, a schema is normally required 
(although recent versions do have a [schemaless 
mode](https://lucene.apache.org/solr/guide/7_0/schemaless-mode.html#using-the-schemaless-example)).
 The issue boils down to how the user starts Solr.
   
   If a Solr instance was started without schemaless mode, indexing will 
fail, and there is no clear way of knowing which fields to add to the schema 
other than trial and error (or checking each enabled plugin).
   
   If the user starts Solr in schemaless mode, the schema file is not strictly 
needed.
   
   I do recognize that this file is used by a single indexer plugin, which is a 
conditional dependency (it matters only if Solr is used). We _could_ move the 
config file into the plugin directory (`src/plugin/indexer-solr`) and update the 
documentation/README, keeping the file available in the distribution (otherwise 
the user would have to download it). Putting it in this new location may make 
it clear that it is not used by Nutch itself but is provided for Solr users.
   
   The `schema.xml` file is not required but is provided as a convenience/guide 
for Solr users.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove obsolete index-writer configuration in conf/
> ---------------------------------------------------
>
>                 Key: NUTCH-2654
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2654
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Assignee: Roannel Fernández Hernández
>            Priority: Major
>             Fix For: 1.16
>
>
> The configuration folder conf/ still contains stuff obsolete after NUTCH-1480:
> - properties to configure indexer plugins in nutch-default.xml
> - solrindex-mapping.xml (looks obsolete)
> - (still read) elasticsearch.conf
> All obsolete files and properties should be removed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
