DIH with multi-threading throws exception
-----------------------------------------
Key: SOLR-3314
URL: https://issues.apache.org/jira/browse/SOLR-3314
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 3.6
Reporter: Bernd Fehling
Assignee: James Dyer
Fix For: 3.6
When loading with DIH in multi-threaded mode, an exception is sometimes thrown:
{code}
Apr 4, 2012 10:19:10 AM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
    at org.apache.solr.common.util.NamedList.getName(NamedList.java:131)
    at org.apache.solr.common.util.NamedList.toString(NamedList.java:258)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:188)
    at org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:78)
    at org.apache.solr.handler.dataimport.SolrWriter.close(SolrWriter.java:53)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:268)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
    at org.apache.solr.handler.dataimport.DataImporter$3.run(DataImporter.java:426)
Apr 4, 2012 10:19:10 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Apr 4, 2012 10:19:10 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
{code}
Analysis:
After loading, the LogUpdateProcessor writes a log entry containing the content
of "toLog" and the elapsed time.
{code}
log.info( "" + toLog + " 0 " + (elapsed) );
{code}
"toLog" is an org.apache.solr.common.util.NamedList, which is prepared for
printing by its methods "toString", "getName" and "getVal". A NamedList
consists of name/value pairs, where the name must always be a String.
As the exception shows, the name slot can somehow end up holding an
ArrayList.
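One way an ArrayList could land in a name slot can be sketched in isolation. The classes below are hypothetical stand-ins, not Solr code: FlatPairList only mimics the flat name/value layout and the cast in "getName", and addRaw exists purely to replay, by hand, an interleaving of two unsynchronized adds. Whether this is the interleaving that actually occurs in DIH is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for NamedList: names and values alternate in one flat list.
class FlatPairList {
    private final List<Object> nvPairs = new ArrayList<>();

    // Appends a single raw slot; used here only to replay an interleaved add.
    void addRaw(Object slot) {
        nvPairs.add(slot);
    }

    // Same index arithmetic and cast as the getName quoted in this report.
    String getName(int idx) {
        return (String) nvPairs.get(idx << 1);
    }
}

class FlatPairListDemo {
    // Replays two adds of ("add", list) whose name/value appends interleave.
    static FlatPairList interleaved() {
        List<String> adds = new ArrayList<>();
        adds.add("testdir2_testfile2_record2");
        FlatPairList list = new FlatPairList();
        list.addRaw("add"); // thread A appends its name
        list.addRaw("add"); // thread B appends its name
        list.addRaw(adds);  // thread A appends its value
        list.addRaw(adds);  // thread B appends its value
        return list;
    }

    // Returns true if reading the name of pair idx fails with ClassCastException.
    static boolean nameCastFails(FlatPairList list, int idx) {
        try {
            list.getName(idx);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        FlatPairList list = interleaved();
        System.out.println(nameCastFails(list, 0)); // false: slot 0 holds "add"
        System.out.println(nameCastFails(list, 1)); // true: slot 2 holds the ArrayList
    }
}
```

After the interleaving the flat list is name, name, value, value, so the second pair's name slot holds the value that belongs to the first pair.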
To trace this further, I modified the method "getName" of
org.apache.solr.common.util.NamedList as follows:
{code}
public String getName(int idx) {
  if (nvPairs.get(idx << 1).getClass().getName().equals("java.util.ArrayList")) {
    System.out.println( "<Object>>" + nvPairs.get(idx << 1).toString() + "<" );
  }
  return (String)nvPairs.get(idx << 1);
}
{code}
After several tries I could produce an exception, and the output was:
{code}
<Object>>[testdir2_testfile2_record2, testdir2_testfile2_record3,
testdir2_testfile2_record2, testdir2_testfile2_record1,
testdir2_testfile2_record3, testdir2_testfile2_record1,
testdir2_testfile2_record1, testdir2_testfile2_record2, ... (24 adds)]<
{code}
What we see here is:
- we have 2 files in 2 directories, each with 3 records, but it reports
"24 adds", while the index afterwards contains only the 6 records
(self-healing through unique IDs in the index)
- the record IDs appear multiple times in the ArrayList
So something is evidently not thread-safe. Possibly the
"LogUpdateProcessorFactory"?
I have no idea how to provide a unit test for this one, as it occurs only in DIH
multi-threaded mode and only sometimes.
Nevertheless it would be bad to have a rollback after loading several million
records :-(
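As a starting point for a test, here is a rough sketch of a stress harness, independent of Solr. SyncPairList and PairListStress are hypothetical names; the list only mimics the flat name/value layout. The harness releases several threads at once and then checks that every name slot still holds a String. With "synchronized" on add the check always passes. Removing it gives the harness a chance, but no guarantee, of catching a corrupted layout or an exception inside ArrayList, so a passing run would not prove thread safety. This is not meant as the actual fix for DIH.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical flat name/value list whose add is atomic with respect to other adds.
class SyncPairList {
    private final List<Object> nvPairs = new ArrayList<>();

    // Both appends happen under one lock, so no other add can slip in between.
    synchronized void add(String name, Object val) {
        nvPairs.add(name);
        nvPairs.add(val);
    }

    // True if every even slot holds a String, i.e. every name cast would succeed.
    synchronized boolean wellFormed() {
        if (nvPairs.size() % 2 != 0) return false;
        for (int i = 0; i < nvPairs.size(); i += 2) {
            if (!(nvPairs.get(i) instanceof String)) return false;
        }
        return true;
    }

    // Number of name/value pairs.
    synchronized int size() {
        return nvPairs.size() >> 1;
    }
}

class PairListStress {
    // Starts `threads` workers that each add `perThread` pairs, released together.
    static SyncPairList run(int threads, int perThread) {
        SyncPairList list = new SyncPairList();
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                try {
                    start.await(); // wait so that all workers begin at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < perThread; i++) {
                    list.add("add", new ArrayList<String>());
                }
            });
            workers.add(w);
            w.start();
        }
        start.countDown();
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return list;
    }

    public static void main(String[] args) {
        SyncPairList list = run(8, 10_000);
        System.out.println(list.wellFormed() + " " + list.size()); // true 80000
    }
}
```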