I saw that the index created by the 1st PDF indexing job worked perfectly well for a particular id. Later, when I ran the 2nd XML indexing job for the same id, I lost all the fields indexed by the 1st job and was left with only the fields created by this 2nd job.
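To illustrate, here is a minimal sketch of what the two jobs effectively do at the Solr level, assuming a local core with the extracting request handler enabled; the core URL, file names, and id value are placeholders, not our actual setup:

    import requests

    SOLR = "http://localhost:8983/solr"        # hypothetical core URL
    DOC_ID = "http://webtop.example/doc/1234"  # hypothetical Webtop URL used as the unique id

    # Job 1: index the PDF rendition under DOC_ID via /update/extract.
    with open("doc1234.pdf", "rb") as f:
        requests.post(f"{SOLR}/update/extract",
                      params={"literal.id": DOC_ID, "commit": "true"},
                      files={"file": ("doc1234.pdf", f, "application/pdf")})

    # Job 2: index the XML rendition under the SAME id. Solr replaces the
    # whole stored document, so every field written by job 1 is lost.
    with open("doc1234.xml", "rb") as f:
        requests.post(f"{SOLR}/update/extract",
                      params={"literal.id": DOC_ID, "commit": "true"},
                      files={"file": ("doc1234.xml", f, "text/xml")})

Querying the core for that id after the second POST returns only the fields extracted from the XML file, which matches what I observed with the two jobs.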
I thought that it would combine field values for a specified doc id. According to the Lucene developers, Lucene by design doesn't support this. Please see the following URL: https://issues.apache.org/jira/browse/LUCENE-3837

-Anupam

On Wed, Mar 28, 2012 at 6:15 PM, Karl Wright <daddy...@gmail.com> wrote:
> The Solr handler that you are using should not matter here.
>
> Can you look at the Simple History report, and do the following:
>
> - Look for a document that is being indexed in both PDF and XML.
> - Find the "ingestion" activity for that document for both PDF and XML.
> - Compare the IDs (which for the ingestion activity are the URLs of
>   the documents in Webtop).
>
> If the URLs are in fact different, then you should be able to make
> this work. You need to look at how you configured your Solr instance,
> and which fields you are specifying in your Solr output connection.
> You want those Webtop URLs to be indexed as the unique document
> identifier in Solr, not some other ID.
>
> Thanks,
> Karl
>
>
> On Wed, Mar 28, 2012 at 7:19 AM, Anupam Bhattacharya
> <anupam...@gmail.com> wrote:
> > Today I ran the 2 jobs one by one, but it seems that since we are using
> > the /update/extract request handler, the field values for a common id
> > get overridden by the latest job. I want to update certain fields in the
> > Lucene index for the doc rather than completely replace it with new
> > values and lose the other field entries.
> >
> > On Tue, Mar 27, 2012 at 11:13 PM, Karl Wright <daddy...@gmail.com> wrote:
> >> For Documentum, content length is in bytes, I believe. It does not
> >> set the length; it filters out all documents greater than the
> >> specified length. Leaving the field blank will perform no filtering.
> >>
> >> Document types in Documentum are specified by mime type, so you'd want
> >> to select all that apply. The actual one used will depend on how your
> >> particular instance of Documentum is configured, but if you pick them
> >> all you should have no problem.
> >>
> >> Karl
> >>
> >>
> >> On Tue, Mar 27, 2012 at 1:39 PM, Anupam Bhattacharya
> >> <anupam...@gmail.com> wrote:
> >>> Thanks!! It seems from your explanation that I can update the other
> >>> field values of the same document. I inquired about this because I have
> >>> two different documents with a parent-child relationship which need to
> >>> be indexed as one document in the Lucene index.
> >>>
> >>> As you must have understood by now, I am trying to do this for the
> >>> Documentum CMS. I have seen the configuration screen for setting the
> >>> content length and a second one for filtering by document type. So my
> >>> question is: what unit does the content length accept (bits, bytes, KB,
> >>> MB, etc.), and does this setting control the length of documents used
> >>> for full-text indexing?
> >>>
> >>> Additionally, to scan only one kind of document, e.g. PDF, what should
> >>> be added to filter those documents? Is it application/pdf or PDF?
> >>>
> >>> Regards
> >>> Anupam
> >>>
> >>>
> >>> On Tue, Mar 27, 2012 at 10:55 PM, Karl Wright <daddy...@gmail.com> wrote:
> >>>>
> >>>> The document key in Solr is the URL of the document, as constructed by
> >>>> the connector you are using. If you are using the same document to
> >>>> construct two different Solr documents, ManifoldCF by definition
> >>>> cannot be aware of this. But if these are different files from the
> >>>> point of view of ManifoldCF, they will have different URLs and be
> >>>> treated differently. The jobs can overlap in this case with no
> >>>> difficulty.
> >>>>
> >>>> Karl
> >>>>
> >>>> On Tue, Mar 27, 2012 at 1:08 PM, Anupam Bhattacharya
> >>>> <anupam...@gmail.com> wrote:
> >>>> > I want to configure two jobs to index into SOLR using ManifoldCF and
> >>>> > the /update/extract request handler:
> >>>> > the 1st to synchronize only the XML files and the 2nd to synchronize
> >>>> > the PDF files. Both of these documents share a unique id. Can I
> >>>> > combine the indexes for both in one SOLR schema without overriding
> >>>> > the details added by the previous job?
> >>>> >
> >>>> > Suppose
> >>>> > the XML doc indexes field0(id), field1, field2, field3
> >>>> > and the PDF doc indexes field0(id), field4, field5, field6.
> >>>> >
> >>>> > Output docindex ==> (xml+pdf doc), field0(id), field1, field2,
> >>>> > field3, field4, field5, field6
> >>>> >
> >>>> > Regards
> >>>> > Anupam

--
Thanks & Regards
Anupam Bhattacharya