(1) bq: I expect solr to index null date...
It's quite unlikely that'll happen. Bad data is bad data. Silently indexing a null date would create a horrible problem to track down: "I know I indexed a document dated yesterday but I can't find it". You'd have to dig through your logs and try to reconstruct the data flow, and with any kind of automation that would be very difficult operationally. "Fail early, fail often" is the motto.
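
One way to act on "fail early" is to check date strings on the client before calling add(Collection), so that a malformed value never reaches Solr. What follows is only a minimal sketch under stated assumptions: SolrDateCheck, isValidSolrDate and findBadIds are hypothetical names and not part of SolrJ, the code needs Java 8 or later for java.time, and the check only approximates Solr's date syntax (UTC timestamps ending in "Z"; no date math such as NOW/DAY).

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper; not part of SolrJ.
class SolrDateCheck {

    // True if the value looks like a UTC timestamp such as 2014-08-16T12:00:00Z.
    // Assumption: this approximates, but does not replicate, Solr's own parsing.
    static boolean isValidSolrDate(String value) {
        if (value == null || !value.endsWith("Z")) {
            return false;
        }
        try {
            Instant.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Returns the ids whose date value fails the check, in input order.
    static List<String> findBadIds(Map<String, String> dateById) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> entry : dateById.entrySet()) {
            if (!isValidSolrDate(entry.getValue())) {
                bad.add(entry.getKey());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<String, String> batch = new LinkedHashMap<>();
        batch.put("doc1", "2014-08-16T12:00:00Z");
        batch.put("doc2", "2014-08-16K12:00:00Z"); // "T" replaced by "K", as in the report
        System.out.println("Rejected before sending: " + findBadIds(batch));
    }
}
```

Documents flagged this way can be logged or repaired by the client, and only the rest handed to Solr, which keeps the decision about bad data in the application rather than in the index.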

(2) Arguably Solr should index everything it _can_ in a packet, but currently it stops at the first error and returns. This has been the behavior for quite a while: when indexing a batch of documents at once, Solr gives up at the first failure, as you describe, and doesn't try to index the remaining documents. There is a patch hanging around somewhere, never committed, that would at least return information about the problematic document.

(3) This is a puzzler. The only thing I can think of is that you haven't committed the document (hard commit; openSearcher can be true or false), then you're killing Solr abnormally, and upon restart it's replaying the transaction log. For that to be true, you have to be killing Solr and restarting it, _then_ seeing an attempt to re-index. If that's not the sequence, I'm clueless.

Best,
Erick

On Sun, Aug 17, 2014 at 12:22 AM, Denis Shishlyannikoc (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/SOLR-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Denis Shishlyannikoc updated SOLR-6385:
> ---------------------------------------
>
> Description:
> Hello.
> I try to work with solr lately and did not get much experience with it yet,
> so part of problems that I will describe here can be due to lack of knowledge.
> Excuse me for that.
>
> Problems that I saw:
>
> 1) I use solrj to index collection of SolrInputDocuments.
> To do it I call method add(Collection) of CloudSolrServer object.
> Just for fun I tried to index one of documents with not correct date:
> I took solr valid date value of one of these SolrInputDocuments and changed
> the "T" symbol in it to "K".
> (this date is defined in schema.xml as
> <field name="mydate" type="tdate" indexed="true" stored="true" multiValued="false" /> )
> Solr failed to index collection and returned SolrServerException.
>
> Also what happened above is that part of documents of this SolrInputDocuments
> collection got indexed correctly, problematic date document failed to be
> indexed together with several valid (from all points of view)
> SolrInputDocuments of this collection.
> Looks like solr went through documents in collection, indexing them one by
> one, threw exception on problematic date document and finally did not index
> all valid documents that were after problematic date document.
>
>
> 2) After failure, described in 1), solr kept problematic date document in
> some queue and tried to reindex this document again (attempt per some 3-5
> minutes, did not measure exact time of that), showing same (failed to parse
> date) exception in logs! After solr server restart issue is gone: no more
> tries to reindex problematic date document.
>
>
> Questions to be answered
>
> 1) What is the default behavior of solr on indexing problematic values fields?
> For example for date field: I expect solr to index null date (instead of not
> indexing of whole document) and then write some warning to logs and return
> some indication of problem on UpdateResponse.
> Maybe solr behavior on not valid field values should be configurable (defined
> in some xml element in schema).
>
> 2) While indexing collection of documents, should solr index all valid
> documents (and not return on first problem as it happens now) ?
> If I index collection of documents, I expect solr to index all valid (from
> all points of view) documents and return indexing status on UpdateResponse
> about all not indexed problematic documents.
>
> 3) Why solr tries to reindex problematic document? Looks like bug that can
> create useless load on server.
> If this behavior is planned by design, then how can I force solr to stop
> reindexing such problem documents (without restarting of solr server)?
> Where can I read about it?
>
> Thank you.
>
>
>> Strange behavior on indexing document with wrong date format
>> ------------------------------------------------------------
>>
>> Key: SOLR-6385
>> URL: https://issues.apache.org/jira/browse/SOLR-6385
>> Project: Solr
>> Issue Type: Bug
>> Components: clients - java
>> Affects Versions: 4.7.2
>> Environment: Solr server in Windows 7, solrj
>> Reporter: Denis Shishlyannikoc
>> Priority: Critical
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
