Re: Solrcas questions

Jörn Kottmann Wed, 09 Feb 2011 07:13:27 -0800

On 2/9/11 3:47 PM, Tommaso Teofili wrote:

Regarding the asserts in the initialize() method, they can be safely removed, as 
they were put there mainly for debugging purposes. However, initialization of the 
Consumer would fail if such params are null or badly defined, as you can see 
inside the createServer(type, path) and 
FieldMappingReader.getConf(path) methods.


Let's open a JIRA issue for this one.

The cas element in the mapping file is optional, and I thought it would be 
useful to track which CAS delivered the information. In the sample file it gets 
mapped to an id field, but that doesn't mean it MUST be unique. In any case it 
is optional, and maybe the toString() method isn't the best way to store the CAS 
information, but I still think it makes sense not to lose that information.

I believe that in most cases it is really not unique. People can have a FS in the CAS which contains a unique id, and that can easily be mapped to an id field in Solr. The current implementation can do that already. I also believe the toString() value is not at all helpful for debugging anything. You might want to log debug information into the CAS. If you wish to keep that in Solr, it would be possible to simply map these FSes.

I agree with the need to switch to the CAS API.

Then let's open a JIRA issue for it.

I also agree on enhancing the exception handling for debugging errors. If a 
commit fails, I think that should be handled the same way as when an add() 
fails; otherwise a commit policy parameter would have to be created (i.e. a 
cache of previously added documents so they can be re-sent), but I think that's 
out of the scope of a basic Solrcas implementation and more related to how Solr 
handles commit errors.
I'd introduce the already discussed autocommit configuration parameter 
(boolean) to indicate whether Solrcas should also send a commit to the 
SolrServer. It may also make sense to add a third value for this param, called 
'destroy', that would trigger the commit only in the destroy() method, even 
though in that case any errors during the commit could not be recovered.
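To make the three proposed autocommit values concrete, here is a minimal sketch. Everything in it is hypothetical: AutoCommitMode, SolrTarget, RecordingTarget and SketchConsumer are illustrative names, not part of Solrcas or SolrJ, and SolrTarget stands in for the real SolrServer so the example runs without Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of an "autocommit" parameter with the values true, false and destroy. */
public class AutoCommitSketch {

    /** TRUE: commit after every add; FALSE: never commit; DESTROY: commit once in destroy(). */
    enum AutoCommitMode {
        TRUE, FALSE, DESTROY;

        static AutoCommitMode parse(String value) {
            return AutoCommitMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
    }

    /** Hypothetical stand-in for a Solr server connection. */
    interface SolrTarget {
        void add(String docId);
        void commit();
    }

    /** Records the calls it receives so the behaviour can be inspected without Solr. */
    static class RecordingTarget implements SolrTarget {
        final List<String> calls = new ArrayList<>();
        public void add(String docId) { calls.add("add:" + docId); }
        public void commit() { calls.add("commit"); }
    }

    /** Consumer skeleton: process() runs once per CAS, destroy() once at shutdown. */
    static class SketchConsumer {
        private final SolrTarget target;
        private final AutoCommitMode mode;

        SketchConsumer(SolrTarget target, AutoCommitMode mode) {
            this.target = target;
            this.mode = mode;
        }

        void process(String docId) {
            target.add(docId);
            if (mode == AutoCommitMode.TRUE) {
                target.commit(); // commit right after each add
            }
        }

        void destroy() {
            if (mode == AutoCommitMode.DESTROY) {
                // A failure here can no longer be recovered by the pipeline.
                target.commit();
            }
        }
    }

    public static void main(String[] args) {
        RecordingTarget target = new RecordingTarget();
        SketchConsumer consumer = new SketchConsumer(target, AutoCommitMode.parse("destroy"));
        consumer.process("doc1");
        consumer.process("doc2");
        consumer.destroy();
        System.out.println(target.calls); // [add:doc1, add:doc2, commit]
    }
}
```

With 'destroy', all documents are sent first and a single commit happens at shutdown, which is exactly why an error at that point cannot be retried from inside the pipeline.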

When there is no unique id, the document will be added to Solr again if the commit failed the first time. I'm not sure what the best way to handle these errors is. In some cases you might just want to ignore the error, in others you might want to retry. I also wonder whether autocommit is really the best option when a massive amount of documents is streamed to Solr from multiple UIMA pipelines. Do you have some experience here?
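The "cache of documents to re-send" option could look roughly like the sketch below. It is only an illustration under assumptions: ResendingCommitPolicy, SolrTarget and FlakyTarget are hypothetical names, and the in-memory target simply records every add, so it shows how re-sending can produce duplicates when there is no unique id to overwrite on.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical commit policy: re-send documents added since the last successful commit. */
public class ResendOnCommitFailureSketch {

    /** Hypothetical stand-in for a Solr connection whose commit can fail. */
    interface SolrTarget {
        void add(String doc);
        void commit() throws Exception;
    }

    static class ResendingCommitPolicy {
        private final SolrTarget target;
        private final int maxAttempts;
        private final List<String> pending = new ArrayList<>(); // sent but not yet committed

        ResendingCommitPolicy(SolrTarget target, int maxAttempts) {
            this.target = target;
            this.maxAttempts = maxAttempts;
        }

        void add(String doc) {
            target.add(doc);
            pending.add(doc);
        }

        /** Returns true if a commit succeeded within maxAttempts tries. */
        boolean commit() {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    target.commit();
                    pending.clear();
                    return true;
                } catch (Exception e) {
                    if (attempt < maxAttempts) {
                        // Re-send everything since the last good commit. Without a
                        // unique id, the index may end up with duplicates.
                        for (String doc : pending) {
                            target.add(doc);
                        }
                    }
                }
            }
            return false; // caller decides whether to ignore, log, or fail
        }

        int pendingCount() { return pending.size(); }
    }

    /** Test double: the first `failures` commits throw, every add is recorded. */
    static class FlakyTarget implements SolrTarget {
        final List<String> added = new ArrayList<>();
        private int failures;

        FlakyTarget(int failures) { this.failures = failures; }

        public void add(String doc) { added.add(doc); }

        public void commit() throws Exception {
            if (failures > 0) {
                failures--;
                throw new Exception("simulated commit failure");
            }
        }
    }

    public static void main(String[] args) {
        FlakyTarget target = new FlakyTarget(1);
        ResendingCommitPolicy policy = new ResendingCommitPolicy(target, 3);
        policy.add("doc1");
        boolean ok = policy.commit();
        System.out.println(ok + " " + target.added); // true [doc1, doc1]
    }
}
```

Returning false instead of throwing leaves the "ignore or retry" decision to the caller, which matches the point that different deployments may want different behaviour.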

Regarding the EmbeddedSolrServer, I agree that it's generally not a top option 
in production, but I am now working on a Solr project where network latency 
has a significant impact (Solr being the best solution anyway), and I'd gain a 
considerable advantage if I could query it that way, avoiding HTTP requests. 
However, since the main way to query Solr is via REST calls, I have no objections 
to removing it.

Sounds good, let's use it for testing only. We also need to enhance the test: we should add a document and then retrieve it to see that it is in Solr as expected.

Do you want to open the JIRA issues yourself?

Jörn
