Should I write a new Documentum connector for our specific use case going forward? I guess your book will be helpful for understanding the connector framework in ManifoldCF.

On Wed, Mar 28, 2012 at 7:02 PM, Karl Wright <daddy...@gmail.com> wrote:
> Right, LUCENE never did allow you to modify a document's indexes, only
> replace them.  What I'm trying to tell you is that there is no reason
> to have the same document ID for both documents.  ManifoldCF will
> support treating the XML document and PDF document as different
> documents in Solr.  But if you want them to in fact be the same
> document, just combined in some way, neither ManifoldCF nor Lucene
> will support that at this time.
>
> Karl
>
>
> On Wed, Mar 28, 2012 at 9:09 AM, Anupam Bhattacharya
> <anupam...@gmail.com> wrote:
> > I saw the index being created by the 1st PDF indexing job, which worked
> > perfectly well for a particular id. Later, when I ran the 2nd XML
> > indexing job for the same id, I lost all fields indexed by the 1st job
> > and was left with only the field indexes created by this 2nd job.
> >
> > I thought that it would combine field values for a specified doc id.
> >
> > According to the Lucene developers, Lucene doesn't support this by
> > design.
> >
> > Please see the following URL:
> > https://issues.apache.org/jira/browse/LUCENE-3837
> >
> > -Anupam
> >
> >
> > On Wed, Mar 28, 2012 at 6:15 PM, Karl Wright <daddy...@gmail.com> wrote:
> >>
> >> The Solr handler that you are using should not matter here.
> >>
> >> Can you look at the Simple History report, and do the following:
> >>
> >> - Look for a document that is being indexed in both PDF and XML.
> >> - Find the "ingestion" activity for that document for both PDF and XML
> >> - Compare the ID's (which for the ingestion activity are the URL's of
> >>   the documents in Webtop)
> >>
> >> If the URLs are in fact different, then you should be able to make
> >> this work.  You need to look at how you configured your Solr instance,
> >> and which fields you are specifying in your Solr output connection.
> >> You want those Webtop urls to be indexed as the unique document
> >> identifier in Solr, not some other ID.
> >>
> >> Thanks,
> >> Karl
> >>
> >>
> >> On Wed, Mar 28, 2012 at 7:19 AM, Anupam Bhattacharya
> >> <anupam...@gmail.com> wrote:
> >> > Today I ran the 2 jobs one after the other, but it seems that since
> >> > we are using the /update/extract request handler, the field values
> >> > for a common id get overridden by the latest job. I want to update
> >> > certain fields in the Lucene index for the doc rather than
> >> > completely replace it with new values and lose the other field
> >> > value entries.
> >> >
> >> > On Tue, Mar 27, 2012 at 11:13 PM, Karl Wright <daddy...@gmail.com>
> >> > wrote:
> >> >> For Documentum, content length is in bytes, I believe.  It does not
> >> >> set the length, it filters out all documents greater than the
> >> >> specified length.  Leaving the field blank will perform no filtering.
> >> >>
> >> >> Document types in Documentum are specified by mime type, so you'd
> >> >> want to select all that apply.  The actual one used will depend on
> >> >> how your particular instance of Documentum is configured, but if you
> >> >> pick them all you should have no problem.
> >> >>
> >> >> Karl
> >> >>
> >> >>
> >> >> On Tue, Mar 27, 2012 at 1:39 PM, Anupam Bhattacharya
> >> >> <anupam...@gmail.com> wrote:
> >> >>> Thanks!! It seems from your explanation that I can update the same
> >> >>> document's other field values. I asked about this because I have
> >> >>> two different documents with a parent-child relationship which need
> >> >>> to be indexed as one document in the Lucene index.
> >> >>>
> >> >>> As you must have understood by now, I am trying to do this for
> >> >>> Documentum CMS. I have seen the configuration screens for setting
> >> >>> the content length and for filtering by document type. So my
> >> >>> question is: in what unit does the content length accept values
> >> >>> (bits, bytes, KB, MB, etc.), and does this configuration set the
> >> >>> length for a document's full-text indexing?
> >> >>>
> >> >>> Additionally, to scan only one kind of document, e.g. PDF, what
> >> >>> should be added to filter those documents? Is it application/pdf
> >> >>> or PDF?
> >> >>>
> >> >>> Regards
> >> >>> Anupam
> >> >>>
> >> >>>
> >> >>> On Tue, Mar 27, 2012 at 10:55 PM, Karl Wright <daddy...@gmail.com>
> >> >>> wrote:
> >> >>>>
> >> >>>> The document key in Solr is the url of the document, as constructed
> >> >>>> by the connector you are using.  If you are using the same document
> >> >>>> to construct two different Solr documents, ManifoldCF by definition
> >> >>>> cannot be aware of this.  But if these are different files from the
> >> >>>> point of view of ManifoldCF they will have different URLs and be
> >> >>>> treated differently.  The jobs can overlap in this case with no
> >> >>>> difficulty.
> >> >>>>
> >> >>>> Karl
> >> >>>>
> >> >>>> On Tue, Mar 27, 2012 at 1:08 PM, Anupam Bhattacharya
> >> >>>> <anupam...@gmail.com> wrote:
> >> >>>> > I want to configure two jobs to index into Solr through
> >> >>>> > ManifoldCF using the /update/extract request handler:
> >> >>>> > the 1st to synchronize only XML files and the 2nd to synchronize
> >> >>>> > the PDF files.
> >> >>>> > If both of these documents share a unique id, can I combine the
> >> >>>> > indexes for both in 1 Solr schema without overriding the details
> >> >>>> > added by the previous job?
> >> >>>> >
> >> >>>> > Suppose:
> >> >>>> > xmldoc indexes field0(id), field1, field2, field3
> >> >>>> > & pdfdoc indexes field0(id), field4, field5, field6.
> >> >>>> >
> >> >>>> > Output docindex ==> (xml+pdf doc), field0(id), field1, field2,
> >> >>>> > field3, field4, field5, field6
> >> >>>> >
> >> >>>> > Regards
> >> >>>> > Anupam
> >> >
> >> > --
> >> > Thanks & Regards
> >> > Anupam Bhattacharya
> >
> > --
> > Thanks & Regards
> > Anupam Bhattacharya

--
Thanks & Regards
Anupam Bhattacharya
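
To make the behaviour discussed in this thread concrete, here is a minimal sketch in Python. It is only a toy model: a plain dict keyed by unique id stands in for the index, and index_document, merge_by_id and the ids "doc-1", "doc-1.xml" and "doc-1.pdf" are hypothetical names invented for illustration, not ManifoldCF, Solr or Lucene APIs. It shows (a) two jobs writing the same id, where the later write replaces the earlier one, (b) two distinct ids, where both documents survive, and (c) combining the field sets outside the index before a single write, which is one possible workaround and not something either tool does for you.

```python
# Toy model only: a dict keyed by unique id stands in for the index.
# None of these names are ManifoldCF, Solr, or Lucene APIs.


def index_document(index, doc):
    """Store doc under its id, replacing any document already stored there."""
    index[doc["id"]] = dict(doc)


def merge_by_id(first, second):
    """Combine two field sets that share an id into a single document."""
    if first["id"] != second["id"]:
        raise ValueError("documents do not share an id")
    merged = dict(first)
    merged.update(second)  # on a field-name clash, the second value wins
    return merged


xml_doc = {"id": "doc-1", "field1": "a", "field2": "b", "field3": "c"}
pdf_doc = {"id": "doc-1", "field4": "d", "field5": "e", "field6": "f"}

# (a) Same id from two jobs: the XML write replaces the PDF write entirely.
replaced = {}
index_document(replaced, pdf_doc)
index_document(replaced, xml_doc)
print(sorted(replaced["doc-1"]))

# (b) Distinct ids (hypothetical): both documents are kept separately.
separate = {}
index_document(separate, dict(xml_doc, id="doc-1.xml"))
index_document(separate, dict(pdf_doc, id="doc-1.pdf"))
print(sorted(separate))

# (c) Merge the fields first, then write once: all fields end up together.
combined = {}
index_document(combined, merge_by_id(xml_doc, pdf_doc))
print(sorted(combined["doc-1"]))
```

Case (c) assumes something upstream of the index has both field sets available at the same time, which is exactly what two independent jobs do not provide; that is why a custom connector comes up as a possible way forward.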