This patch is created against 1.3 (it may apply on trunk also).
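
My guess at the root cause: while XPathRecordReader is still appending values to the
multi-valued field Lists for later records, the map holding those Lists gets handed
off and iterated, and that iterator then trips the ConcurrentModificationException
seen in the log. Below is a minimal, self-contained sketch of that failure mode and
of the copy-before-handoff idea the patch uses. It is only an illustration (the class
and variable names are made up), not the actual DIH code path, and the timing is
JVM-dependent.

import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class ComodificationSketch {
  public static void main(String[] args) throws Exception {
    // A plain ArrayList standing in for one multi-valued field; not thread-safe.
    final List<String> values = new ArrayList<String>();
    values.add("first");

    // Consumer thread iterates the live list, roughly what
    // DocBuilder.addFieldValue does with a record's field values.
    Thread consumer = new Thread(new Runnable() {
      public void run() {
        try {
          for (String v : values) {
            Thread.sleep(50); // widen the race window
          }
        } catch (ConcurrentModificationException e) {
          System.out.println("caught " + e); // the exception from the import log
        } catch (InterruptedException ignored) {
        }
      }
    });
    consumer.start();

    // Meanwhile the "reader" keeps adding values to the same list.
    for (int i = 0; i < 100; i++) {
      values.add("value" + i);
      Thread.sleep(10);
    }
    consumer.join();

    // The idea behind the fix: hand the consumer its own copy, so later
    // appends to the original list cannot invalidate the consumer's iterator.
    List<String> safeCopy = new ArrayList<String>(values);
    for (String v : safeCopy) {
      // safe to iterate; this list is no longer shared
    }
  }
}

The attached patch applies the same idea inside XPathRecordReader: each record's List
values are copied before the record map is handed to the handler.
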
--Noble

On Wed, Oct 1, 2008 at 9:56 AM, Noble Paul നോബിള്‍ नोब्ळ्
<[EMAIL PROTECTED]> wrote:
> I guess it is a threading problem. I can give you a patch; you can raise a bug.
> --Noble
>
> On Wed, Oct 1, 2008 at 2:11 AM, KyleMorrison <[EMAIL PROTECTED]> wrote:
>>
>> As a follow-up: I continued tweaking the data-config.xml and have been able
>> to make the commit fail with as few as 3 fields in the sdc.xml, with only
>> one multivalued field. Even more strangely, some fields work and some do not.
>> For instance, in my dc.xml:
>>
>> <field column="Taxon"
>> xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Taxonomy/Lineage/Taxon"
>> />
>> .
>> .
>> .
>> <field column="GenPept"
>> xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/GenPept"
>> />
>>
>> and in the schema.xml:
>> <field name="GenPept" type="text" indexed="true" stored="false"
>> multiValued="true" />
>> .
>> .
>> .
>> <field name="Taxon" type="text" indexed="true" stored="false"
>> multiValued="true" />
>> but Taxon works and GenPept does not. What could possibly account for this
>> discrepancy? Again, the error logs from the server are exactly those seen in
>> the first post.
>>
>> What is going on?
>>
>>
>> KyleMorrison wrote:
>>>
>>> Yes, this is the most recent version of Solr, with stream="true" set, and with
>>> stopwords, lowercase and removeDuplicate filters applied to all multivalued
>>> fields. Could the filters possibly be causing this? I will remove them and see
>>> what happens.
>>>
>>> Kyle
>>>
>>>
>>> Shalin Shekhar Mangar wrote:
>>>>
>>>> Hmm, strange.
>>>>
>>>> This is Solr 1.3.0, right? Do you have any transformers applied to these
>>>> multi-valued fields? Do you have stream="true" in the entity?
>>>>
>>>> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>>
>>>>> I apologize for spamming this mailing list with my problems, but I'm at
>>>>> my wits' end. I'll get right to the point.
>>>>>
>>>>> I have an XML file of roughly 1GB which I wish to index. If that is
>>>>> successful, I will move to a larger file of closer to 20GB. However, when I
>>>>> run my data-config (let's call it dc.xml) over it, the import only manages
>>>>> to get about 27 rows out of roughly 200K. The exact same data-config
>>>>> (dc.xml) works perfectly on smaller data files of the same type.
>>>>>
>>>>> This data-config is quite large, with maybe 250 fields. When I run a smaller
>>>>> data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
>>>>> perfectly. The only conclusion I can draw from this is that the data-config
>>>>> method just doesn't scale well.
>>>>>
>>>>> When the dc.xml fails, the server logs spit out:
>>>>>
>>>>> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
>>>>> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
>>>>> status=0
>>>>> QTime=95
>>>>> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter
>>>>> doFullImport
>>>>> INFO: Starting Full Import
>>>>> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2
>>>>> deleteAll
>>>>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>>>>> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter
>>>>> doFullImport
>>>>> SEVERE: Full Import failed
>>>>> java.util.ConcurrentModificationException
>>>>>        at
>>>>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>>>>        at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>>>>        at
>>>>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>>>> Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
>>>>> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
>>>>> status=0
>>>>> QTime=77
>>>>> Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter
>>>>> doFullImport
>>>>> INFO: Starting Full Import
>>>>> Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2
>>>>> deleteAll
>>>>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>>>>> Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter
>>>>> doFullImport
>>>>> SEVERE: Full Import failed
>>>>> java.util.ConcurrentModificationException
>>>>>        at
>>>>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>>>>        at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>>>>        at
>>>>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>>>>        at
>>>>>
>>>>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>>>>
>>>>> This mass of exceptions DOES NOT occur when I perform the same full-import
>>>>> with sdc.xml. As far as I can tell, the only difference between the two
>>>>> files is the number of fields they contain.
>>>>>
>>>>> Any guidance or information would be greatly appreciated.
>>>>> Kyle
>>>>>
>>>>>
>>>>> PS: The schema.xml in use specifies almost all fields as multiValued and
>>>>> has a copyField for almost every field. I can fix this if it is causing my
>>>>> problem, but I would prefer not to.
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Shalin Shekhar Mangar.
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
>
> --
> --Noble Paul
>



-- 
--Noble Paul
Index: contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/XPathRecordReader.java
===================================================================
--- contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/XPathRecordReader.java	(revision 696558)
+++ contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/XPathRecordReader.java	(working copy)
@@ -146,7 +146,7 @@
           }
           if (event == END_ELEMENT) {
             if (isRecord)
-              handler.handle(new HashMap<String, Object>(values), forEachPath);
+              handler.handle(getDeepCopy(values), forEachPath);
             if (recordStarted && !isRecord
                     && !childrenFound.containsAll(childNodes)) {
               for (Node n : childNodes) {
@@ -316,6 +316,18 @@
     }
   }
 
+  private Map<String, Object> getDeepCopy(Map<String, Object> values) {
+    Map<String, Object> result = new HashMap<String, Object>();
+    for (Map.Entry<String, Object> entry : values.entrySet()) {
+      if (entry.getValue() instanceof List) {
+        result.put(entry.getKey(), new ArrayList((List) entry.getValue()));
+      } else {
+        result.put(entry.getKey(), entry.getValue());
+      }
+    }
+    return result;
+  }
+
   static XMLInputFactory factory = XMLInputFactory.newInstance();
 
   public static interface Handler {

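A note on the design choice, for anyone hitting the same trace: the old code already
handed the handler a new HashMap, but that is only a shallow copy, so the List objects
stored as values were presumably still the very lists the reader goes on appending to.
getDeepCopy therefore also copies each List value into a fresh ArrayList, so the
consumer iterates its own lists and later appends on the reader side can no longer
invalidate its iterator.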