[ 
https://issues.apache.org/jira/browse/SOLR-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883440#comment-15883440
 ] 

Torsten Bøgh Köster commented on SOLR-9887:
-------------------------------------------

As a co-author of the project in question, I'm happy to see that a discussion 
has started. We're currently implementing another Solr-based search that relies 
heavily on several very large synonym lists (e.g. for German stemming). 
The problem is that the only out-of-the-box way to use large synonym files with 
Solr is to package them as a JAR and put it on the classpath or in the external 
libs folder. 

As Jan said, ZooKeeper would be an ideal storage, but it is limited to 1 MB per 
znode by default and you do not want to mess around with that. I like 
Alexandre's idea that Solr should maintain resources in a push fashion and act 
as a pure data store. Is there a way for us to push large synonym files into 
the system collection (that would be my Option 3 ;-)?

In our current project, JDBC storage is not the preferred way of handling data, 
so we may extend solr-jdbc to support a NoSQL datastore, or even the system 
collection as mentioned above. The main implementation idea of the solr-jdbc 
project is to swap the ResourceLoader for a datastore-dependent one [1]. I'll 
check whether we can make this design more interchangeable, so that other data 
stores or the native system collection can be plugged in later.
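
To make the "interchangeable" idea concrete, here is a minimal sketch of a 
loader that delegates to a pluggable store and falls back to another loader. 
It is not the solr-jdbc code: SimpleResourceLoader is a simplified stand-in 
for Lucene's ResourceLoader, and ResourceStore, InMemoryResourceStore and 
StoreBackedResourceLoader are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for Lucene's ResourceLoader (resource opening only).
interface SimpleResourceLoader {
    InputStream openResource(String name);

    // Convenience helper: read a whole resource as UTF-8 text.
    default String readUtf8(String name) {
        try (InputStream in = openResource(name)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical storage abstraction. A JDBC table, a NoSQL store or a Solr
// collection could each provide an implementation.
interface ResourceStore {
    Optional<byte[]> fetch(String name);
}

// In-memory store, used here only to keep the example self-contained.
final class InMemoryResourceStore implements ResourceStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    void put(String name, String content) {
        data.put(name, content.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public Optional<byte[]> fetch(String name) {
        return Optional.ofNullable(data.get(name));
    }
}

// Serves a resource from the store if present, otherwise asks the fallback.
final class StoreBackedResourceLoader implements SimpleResourceLoader {
    private final ResourceStore store;
    private final SimpleResourceLoader fallback;

    StoreBackedResourceLoader(ResourceStore store, SimpleResourceLoader fallback) {
        this.store = store;
        this.fallback = fallback;
    }

    @Override
    public InputStream openResource(String name) {
        Optional<byte[]> bytes = store.fetch(name);
        if (bytes.isPresent()) {
            return new ByteArrayInputStream(bytes.get());
        }
        return fallback.openResource(name);
    }
}
```

With this shape, only the ResourceStore implementation would change when 
moving from one backend to another. The loader and the filter factories 
that use it would stay the same.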

Regarding updates, the solr-jdbc project pulls updated synonym definitions 
upon Searcher construction, so synonyms are not reloaded between Searchers. 
That would certainly be a nice-to-have feature, though.
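
As a rough illustration of reloading on each new Searcher, the sketch below 
keeps the current rules in an atomic reference and swaps them when a callback 
fires. ReloadableSynonyms and onNewSearcher are hypothetical names, and the 
Supplier stands in for whatever query fetches the rules. How the callback is 
wired to Solr's new-searcher event is left out.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Holds the current synonym rules and replaces them atomically on reload.
final class ReloadableSynonyms {
    // Hypothetical source of rules, e.g. a database query.
    private final Supplier<List<String>> source;
    private final AtomicReference<List<String>> current;

    ReloadableSynonyms(Supplier<List<String>> source) {
        this.source = source;
        // Take an immutable snapshot so later changes in the source are
        // not visible until the next reload.
        this.current = new AtomicReference<>(List.copyOf(source.get()));
    }

    // Intended to be called from a new-searcher callback.
    void onNewSearcher() {
        current.set(List.copyOf(source.get()));
    }

    // Rules as of the last reload.
    List<String> rules() {
        return current.get();
    }
}
```

Readers always see a complete snapshot: either the rules from before the 
reload or the ones from after it, never a partially updated list.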

[1] 
https://github.com/shopping24/solr-jdbc/blob/master/src/main/java/com/s24/search/solr/analysis/jdbc/JdbcResourceLoader.java

> Add KeepWordFilter, StemmerOverrideFilter, StopFilterFactory, SynonymFilter 
> that reads data from a JDBC source
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9887
>                 URL: https://issues.apache.org/jira/browse/SOLR-9887
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Tobias Kässmann
>            Priority: Minor
>
> We've created some new {{FilterFactories}} that read their stopwords or 
> synonyms from a database (via a JDBC source). That gives us easy 
> management of large lists and also makes it possible to maintain them in 
> other tools. JDBC data sources are retrieved via JNDI.
> For easy reloading of these lists we've added a {{SeacherAwareReloader}} 
> abstraction that reloads the lists on every new searcher event.
> If this is a feature that is interesting for Solr, we will create a pull 
> request. All the sources are currently available here: 
> https://github.com/shopping24/solr-jdbc


