This patch was created against 1.3 (it may also apply on trunk).
--Noble

On Wed, Oct 1, 2008 at 9:56 AM, Noble Paul നോബിള് नोब्ळ् <[EMAIL PROTECTED]> wrote:
> I guess it is a threading problem. I can give you a patch. You can raise a bug.
> --Noble
>
> On Wed, Oct 1, 2008 at 2:11 AM, KyleMorrison <[EMAIL PROTECTED]> wrote:
>>
>> As a follow up: I continued tweaking the data-config.xml, and have been able
>> to make the commit fail with as few as 3 fields in the sdc.xml, with only
>> one multivalued field. Even more strange, some fields work and some do not.
>> For instance, in my dc.xml:
>>
>> <field column="Taxon"
>> xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Taxonomy/Lineage/Taxon"
>> />
>> .
>> .
>> .
>> <field column="GenPept"
>> xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/GenPept"
>> />
>>
>> and in the schema.xml:
>> <field name="GenPept" type="text" indexed="true" stored="false"
>> multiValued="true" />
>> .
>> .
>> .
>> <field name="Taxon" type="text" indexed="true" stored="false"
>> multiValued="true" />
>> but Taxon works and GenPept does not. What could possibly account for this
>> discrepancy? Again, the error logs from the server are exactly those seen in
>> the first post.
>>
>> What is going on?
>>
>>
>> KyleMorrison wrote:
>>>
>>> Yes, this is the most recent version of Solr, with stream="true" and stopwords,
>>> lowercase and removeDuplicate being applied to all multivalued fields.
>>> Would the filters possibly be causing this? I will not use them and see
>>> what happens.
>>>
>>> Kyle
>>>
>>>
>>> Shalin Shekhar Mangar wrote:
>>>>
>>>> Hmm, strange.
>>>>
>>>> This is Solr 1.3.0, right? Do you have any transformers applied to these
>>>> multi-valued fields? Do you have stream="true" in the entity?
>>>>
>>>> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]> wrote:
>>>>
>>>>>
>>>>> I apologize for spamming this mailing list with my problems, but I'm at
>>>>> my wits' end. I'll get right to the point.
>>>>>
>>>>> I have an XML file which is ~1GB which I wish to index. If that is
>>>>> successful, I will move to a larger file of closer to 20GB. However,
>>>>> when I run my data-config (let's call it dc.xml) over it, the import only
>>>>> manages to get about 27 rows, out of roughly 200K. The exact same
>>>>> data-config (dc.xml) works perfectly on smaller data files of the same type.
>>>>>
>>>>> This data-config is quite large, maybe 250 fields. When I run a smaller
>>>>> data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
>>>>> perfectly. The only conclusion I can draw from this is that the
>>>>> data-config method just doesn't scale well.
>>>>>
>>>>> When the dc.xml fails, the server logs spit out:
>>>>>
>>>>> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
>>>>> INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=95
>>>>> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>>>> INFO: Starting Full Import
>>>>> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
>>>>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>>>>> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>>>> SEVERE: Full Import failed
>>>>> java.util.ConcurrentModificationException
>>>>>     at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>>>>     at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>>>>     at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>>>>     at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>>>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>>>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>>>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>>>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>>>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>>>>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>>>> Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
>>>>> INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=77
>>>>> Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>>>> INFO: Starting Full Import
>>>>> Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
>>>>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>>>>> Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>>>> SEVERE: Full Import failed
>>>>> java.util.ConcurrentModificationException
>>>>>     at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>>>>     at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>>>>     at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>>>>     at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>>>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>>>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>>>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>>>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>>>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>>>>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>>>>
>>>>> This mass of exceptions DOES NOT occur when I perform the same full-import
>>>>> with sdc.xml. As far as I can tell, the only difference between the two
>>>>> files is the number of fields they contain.
>>>>>
>>>>> Any guidance or information would be greatly appreciated.
>>>>> Kyle
>>>>>
>>>>>
>>>>> PS The schema.xml in use specifies almost all fields as multivalued, and
>>>>> has a copyfield for almost every field. I can fix this if it is causing my
>>>>> problem, but I would prefer not to.
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/Indexing-Large-Files-with-Large-DataImport%3A-Problems-tp19746831p19746831.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Shalin Shekhar Mangar.
>>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Indexing-Large-Files-with-Large-DataImport%3A-Problems-tp19746831p19749991.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> --
> --Noble Paul
-- --Noble Paul
Index: contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/XPathRecordReader.java
===================================================================
--- contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/XPathRecordReader.java	(revision 696558)
+++ contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/XPathRecordReader.java	(working copy)
@@ -146,7 +146,7 @@
         }
         if (event == END_ELEMENT) {
           if (isRecord)
-            handler.handle(new HashMap<String, Object>(values), forEachPath);
+            handler.handle(getDeepCopy(values), forEachPath);
           if (recordStarted && !isRecord
                   && !childrenFound.containsAll(childNodes)) {
             for (Node n : childNodes) {
@@ -316,6 +316,18 @@
     }
   }
 
+  private Map<String, Object> getDeepCopy(Map<String, Object> values) {
+    Map<String, Object> result = new HashMap<String, Object>();
+    for (Map.Entry<String, Object> entry : values.entrySet()) {
+      if (entry.getValue() instanceof List) {
+        result.put(entry.getKey(),new ArrayList((List) entry.getValue()));
+      } else{
+        result.put(entry.getKey(),entry.getValue());
+      }
+    }
+    return result;
+  }
+
   static XMLInputFactory factory = XMLInputFactory.newInstance();
 
   public static interface Handler {
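
For anyone who wants to see why the shallow copy matters, here is a small standalone sketch. It is not Solr code: the class DeepCopyDemo and its methods are hypothetical names made up for illustration, and it assumes (as the patch suggests, though I have not confirmed it in the reader itself) that the reader keeps appending to the same List instances after handing a record off. A plain new HashMap(values) copies the map but shares those lists, so a consumer iterating over a multi-valued field can hit ConcurrentModificationException as soon as the list changes underneath it. The sketch reproduces that in a single thread so the outcome does not depend on timing.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeepCopyDemo {

  // Hypothetical stand-in for the patch's getDeepCopy: copies the map
  // and, for every List value, copies the list as well (one level deep).
  public static Map<String, Object> deepCopy(Map<String, Object> values) {
    Map<String, Object> result = new HashMap<String, Object>();
    for (Map.Entry<String, Object> entry : values.entrySet()) {
      Object value = entry.getValue();
      if (value instanceof List) {
        result.put(entry.getKey(), new ArrayList<Object>((List<?>) value));
      } else {
        result.put(entry.getKey(), value);
      }
    }
    return result;
  }

  // Iterates over the record's "GenPept" values while appending to the
  // reader's own list, which simulates the reader continuing to parse.
  // Returns true if the iteration hit a ConcurrentModificationException.
  public static boolean failsWhileReaderWrites(Map<String, Object> record,
                                               List<Object> readerList) {
    try {
      for (Object value : (List<?>) record.get("GenPept")) {
        readerList.add("next-" + value);
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // The reader's working state: one multi-valued field.
    List<Object> readerList = new ArrayList<Object>();
    readerList.add("a");
    readerList.add("b");
    Map<String, Object> values = new HashMap<String, Object>();
    values.put("GenPept", readerList);

    // Shallow copy: new map, but the very same List instance inside it.
    Map<String, Object> shallow = new HashMap<String, Object>(values);
    System.out.println("shallow copy fails: "
        + failsWhileReaderWrites(shallow, readerList));

    // Deep copy: the record owns its own list, so reader writes are invisible.
    Map<String, Object> deep = deepCopy(values);
    System.out.println("deep copy fails: "
        + failsWhileReaderWrites(deep, readerList));
  }
}
```

In the real import the writes would presumably come from another thread, which would make the failure intermittent and would explain why some fields appear to work and others do not. Note that the copy is only one level deep: lists nested inside lists would still be shared.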