[ https://issues.apache.org/jira/browse/SOLR-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126235#comment-13126235 ]

Jan Høydahl commented on SOLR-2823:
-----------------------------------

Hey guys, you're jumping fast here :)

Erik, you must have peeked into my ideas book, because what you propose is exactly 
something I planned to introduce later, but using Groovy as the DSL :) - much 
like Gradle does. I think this could be achieved by making 
UpdateProcessorChains pluggable and definable in solrconfig. The 
DefaultUpdateProcessorChain could be the simple linear array[] of processors. 
The ScriptedUpdateProcessorChain would be the powerhouse, where you could define 
simple linear chains as well as complex logic. You can even do simple 
document manipulation inline without calling a processor, such as 
doc.deleteField("title")...
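To make the split concrete, here is a rough Java sketch. It assumes a hypothetical Processor/ProcessorChain pair (these are not Solr's real UpdateRequestProcessor classes, and a document is just a field map here). LinearChain plays the role of the proposed DefaultUpdateProcessorChain; the "scripted" chain is simply code, so it can branch and edit the document inline:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; NOT Solr's UpdateRequestProcessor API.
// A document is modelled as a mutable field map.
interface Processor {
    void process(Map<String, Object> doc);
}

// The pluggable abstraction: solrconfig would name the implementing class.
interface ProcessorChain {
    void run(Map<String, Object> doc);
}

// Plays the role of the proposed DefaultUpdateProcessorChain: a fixed linear list.
class LinearChain implements ProcessorChain {
    private final List<Processor> processors;

    LinearChain(List<Processor> processors) {
        this.processors = List.copyOf(processors);
    }

    @Override
    public void run(Map<String, Object> doc) {
        for (Processor p : processors) {
            p.process(doc);
        }
    }
}

class ChainDemo {
    // Plays the role of a ScriptedUpdateProcessorChain: the chain body is code,
    // so it can branch and edit the document without a dedicated processor.
    static ProcessorChain scripted(Processor copyField, ProcessorChain fallback) {
        return doc -> {
            Object employees = doc.get("employees");
            if (employees instanceof Integer && (Integer) employees > 10) {
                copyField.process(doc);
            } else {
                fallback.run(doc);
            }
            doc.remove("title"); // inline edit, like doc.deleteField("title")
        };
    }

    public static void main(String[] args) {
        Processor copyTitle = d -> d.put("title_en", d.get("title"));
        Processor tagSmall = d -> d.put("size", "small");
        ProcessorChain chain = scripted(copyTitle, new LinearChain(List.of(tagSmall)));

        Map<String, Object> big = new HashMap<>(Map.of("title", "Acme", "employees", 50));
        chain.run(big);
        System.out.println(big.get("title_en") + " " + big.containsKey("title")); // Acme false

        Map<String, Object> small = new HashMap<>(Map.of("title", "Tiny", "employees", 3));
        chain.run(small);
        System.out.println(small.get("size") + " " + small.get("title_en")); // small null
    }
}
{code}

Since a linear chain is just one implementation of the interface, solrconfig would only need to name the chain class; that is the whole pluggability idea.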

This approach also fulfils another wish of mine, namely being able to define 
chains outside of solrconfig.xml. Logically, configuring the schema and document 
processing is the job of a "content" guy, while configuring solrconfig.xml is 
done by the "hardware/operations" guys. Imagine a solr/conf/pipeline.groovy 
referenced from solrconfig.xml:

{code:xml}
<updateProcessorChain class="solr.ScriptedUpdateProcessorChainFactory" 
file="pipeline.groovy" />
{code}

pipeline.groovy:
{code}
chain simple {
  process(langid)
  process(copyfield)
  chain(logAndRun)
}

chain moreComplex {
  process(langid)
  if(doc.getFieldValue("employees") > 10)
    process(copyfield)
  else
    chain(myOtherProcesses)
  doc.deleteField("title")
  chain(logAndRun)
}

chain logAndRun {
  process(log)
  process(run)
}

processor langid {
  class = "solr.LanguageIdentifierUpdateProcessorFactory"
  config("langid.fl", "title,body")
  config("langid.langField", "language")
  config("map", true)
}

processor copyfield {
  script = "copyfield.groovy"
  config("from", "title")
  config("to", "title_en")
}
{code}
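A minimal Java model of what the DSL above might boil down to, purely as an illustration: Pipeline, Step, process(), call() and when() are hypothetical names, not an existing Solr API. Names are resolved when a step runs, so a chain such as simple may reference logAndRun before it is defined, just as in pipeline.groovy:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical Java model of the pipeline.groovy DSL; none of these names exist in Solr.
// A step runs against a document and may look up other steps by name in the pipeline.
interface Step {
    void apply(Pipeline pipeline, Map<String, Object> doc);
}

class Pipeline {
    private final Map<String, Consumer<Map<String, Object>>> processors = new HashMap<>();
    private final Map<String, List<Step>> chains = new HashMap<>();

    // Like "processor name { ... }": a configured processor registered under a name.
    void processor(String name, Consumer<Map<String, Object>> impl) {
        processors.put(name, impl);
    }

    // Like "chain name { ... }": an ordered list of steps.
    void chain(String name, Step... steps) {
        chains.put(name, List.of(steps));
    }

    // Like "process(name)". Lookup happens when the step runs, so forward
    // references to processors defined later are fine.
    static Step process(String name) {
        return (pipeline, doc) -> lookup(pipeline.processors, name, "processor").accept(doc);
    }

    // Like "chain(name)": run another named chain in place.
    static Step call(String name) {
        return (pipeline, doc) -> pipeline.run(name, doc);
    }

    // Like "if (test) a else b" inside a chain body.
    static Step when(Predicate<Map<String, Object>> test, Step then, Step otherwise) {
        return (pipeline, doc) -> (test.test(doc) ? then : otherwise).apply(pipeline, doc);
    }

    void run(String chainName, Map<String, Object> doc) {
        for (Step step : lookup(chains, chainName, "chain")) {
            step.apply(this, doc);
        }
    }

    private static <V> V lookup(Map<String, V> byName, String name, String kind) {
        V value = byName.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Unknown " + kind + ": " + name);
        }
        return value;
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        Pipeline p = new Pipeline();
        // "simple" references "logAndRun" before it is defined, as in pipeline.groovy.
        p.chain("simple", Pipeline.process("copyfield"), Pipeline.call("logAndRun"));
        p.chain("logAndRun", Pipeline.process("log"), Pipeline.process("run"));
        p.processor("copyfield", d -> d.put("title_en", d.get("title")));
        p.processor("log", d -> trace.add("log"));
        p.processor("run", d -> trace.add("run"));

        Map<String, Object> page = new HashMap<>(Map.of("title", "Hello"));
        p.run("simple", page);
        System.out.println(page.get("title_en") + " " + trace); // Hello [log, run]
    }
}
{code}

One design point this exposes: because chains can call chains, a real implementation would probably want to detect cycles (simple -> logAndRun -> simple) when the file is loaded, instead of overflowing the stack at runtime.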

I don't know what it takes to code such a thing, but if we had it, I'd never go 
back to defining pipelines in XML :)
                
> Re-use of UpdateProcessor configurations in multiple UpdateChains
> -----------------------------------------------------------------
>
>                 Key: SOLR-2823
>                 URL: https://issues.apache.org/jira/browse/SOLR-2823
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Jan Høydahl
>            Priority: Minor
>
> When dealing with multiple UpdateChains and Processors, you frequently need 
> to re-use configuration. Two chains may be equal except for one config 
> setting in one <processor>.
> I propose to allow named processor configs, which can be referenced by name 
> in the chains.
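A small sketch of the named-config idea in the description, assuming hypothetical ProcessorConfig/ConfigRegistry classes (not Solr code): each config is defined once by name, and a chain that needs one different setting takes an overriding copy instead of repeating the whole <processor> block:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "named processor configs referenced by name".
// ProcessorConfig and ConfigRegistry are illustrative, not Solr classes.
class ProcessorConfig {
    final String className;
    final Map<String, String> params;

    ProcessorConfig(String className, Map<String, String> params) {
        this.className = className;
        this.params = Map.copyOf(params); // immutable, so shared configs cannot drift
    }

    // Copy with one parameter changed; the named original is left untouched,
    // so other chains referencing it are unaffected.
    ProcessorConfig with(String key, String value) {
        Map<String, String> copy = new HashMap<>(params);
        copy.put(key, value);
        return new ProcessorConfig(className, copy);
    }
}

class ConfigRegistry {
    private final Map<String, ProcessorConfig> named = new HashMap<>();

    void define(String name, ProcessorConfig cfg) {
        named.put(name, cfg);
    }

    ProcessorConfig ref(String name) {
        ProcessorConfig cfg = named.get(name);
        if (cfg == null) {
            throw new IllegalArgumentException("No processor named " + name);
        }
        return cfg;
    }
}

class ReuseDemo {
    public static void main(String[] args) {
        ConfigRegistry reg = new ConfigRegistry();
        reg.define("langid", new ProcessorConfig(
                "solr.LanguageIdentifierUpdateProcessorFactory",
                Map.of("langid.fl", "title,body", "langid.langField", "language")));
        reg.define("log", new ProcessorConfig("solr.LogUpdateProcessorFactory", Map.of()));

        // Both chains reuse the named configs; chainB overrides a single setting.
        List<ProcessorConfig> chainA = List.of(reg.ref("langid"), reg.ref("log"));
        List<ProcessorConfig> chainB = List.of(reg.ref("langid").with("langid.fl", "body"), reg.ref("log"));

        System.out.println(chainA.get(0).params.get("langid.fl")); // title,body
        System.out.println(chainB.get(0).params.get("langid.fl")); // body
    }
}
{code}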

