On Fri, Jun 12, 2009 at 3:16 PM, Reinier van den Born<[email protected]> wrote:
> Hi Ard,
>
> To make sure I understand this correctly. I should do:
>
> delete; wait; write.
>
> Currently the cycle time is 5 seconds, which would make it a very slow
> process.
> Alternatively I could delete all first and then write,

That is what I meant.

> but that would mean that all content would be gone for 5 seconds.

Yes, true.

> Makes it worthwhile to find a way to write only documents that have been
> modified.

As I said, the problem does not occur when you just change an existing
document. Why delete it and add it again? Just overwrite its contents: it is
faster and does not have the issue.

>
> So how does a tool like Dav2Disk handle this?

I don't know the tool. AFAIK, a tool like that is meant for initial
importing, not primarily for a repository that is in production.

> That is certainly not waiting 5 seconds for each file to write.
> Nor is it deleting everything first before it writes, or?
> Or does it suffer from the same problem and I just never noticed it?

As explained, I don't know the tool. But here is my suggestion:

You shouldn't delete and add documents that haven't been changed: it doesn't
make sense.

How to avoid it:

1) compute a simple md5 or some other hash of the document's text before
   putting it in the repository
2) store the md5 as a property
3) before deleting / adding a document, compute its md5 and check whether it
   exists in the repository (simple search)
4) modify changed documents instead of using a delete/add cycle

I am confident this solves your issue. You can test it first with the
5 sec delay, if you want, to be sure.

Regards Ard

>
> Reinier
>
> Ard Schrijvers wrote:
>>
>> Hello Reinier,
>>
>> On Fri, Jun 12, 2009 at 1:59 PM, Reinier van den
>> Born<[email protected]> wrote:
>>>
>>> Bart,
>>>
>>> The version of the repository is 1.2.15.1.
>>>
>>> Btw. I tried deleting before writing. It doesn't make a difference.
>>
>> This is a known issue, not easy to solve. You have two possible solutions:
>>
>> 1) instead of a deletion / add cycle you modify an existing document
>> 2) do the deletion of the old ones in a separate cycle, with at least
>> a delay of X seconds, where X is the value in your cron configuration
>> of the indexer.xml
>>
>> I hope this isn't too much of a problem for you. At least, you can
>> check whether my proposed solution works
>>
>> Regards Ard
>>
>>>
>>> Reinier
>>>
>>>
>>> Bart van der Schans wrote:
>>>>
>>>> Reinier,
>>>>
>>>> Which version of the repository are you using?
>>>>
>>>> Bart
>>>>
>>>> On Fri, Jun 12, 2009 at 1:14 PM, Reinier van den Born
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi Jasha,
>>>>>
>>>>> Rebuilding the index fixed the problem of results not showing up.
>>>>> Problem remains that if content is written twice it shows up twice.
>>>>>
>>>>> Maybe I should delete the existing document before I write it?
>>>>> (at the moment I simply overwrite...)
>>>>>
>>>>> Reinier
>>>>>
>>>>>
>>>>> Jasha Joachimsthal wrote:
>>>>>>
>>>>>> Hi Reinier,
>>>>>>
>>>>>> this looks like your Lucene index contains some errors if some results
>>>>>> appear twice and others don't appear at all. Try rebuilding the index.
>>>>>>
>>>>>> Jasha Joachimsthal
>>>>>>
>>>>>> [email protected] - [email protected]
>>>>>>
>>>>>> www.onehippo.com
>>>>>> Amsterdam - Hippo B.V. Oosteinde 11 1017 WT Amsterdam +31(0)20-5224466
>>>>>> San Francisco - Hippo USA Inc. 185 H Street, suite B, Petaluma CA
>>>>>> 94952 +1 (707) 7734646
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2009/6/11 Reinier van den Born <[email protected]>:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am trying to automatically update a collection of documents in a
>>>>>>> Hippo repository.
>>>>>>> Each document is kept in its own collection within a "main"
>>>>>>> collection:
>>>>>>> ../1/a.xml, ../2/b.xml, etc.
>>>>>>> Each update is independent of earlier ones: I don't need caching, no
>>>>>>> JMS, or what more.
>>>>>>>
>>>>>>> So I do a simple scan for old documents (fetchCollection), upload the
>>>>>>> new and delete the old.
>>>>>>> Very simple, so I was thinking I could use the Java Adapter
>>>>>>> directly...
>>>>>>>
>>>>>>> Which works except for getting the scan. Its function is similar to
>>>>>>> "ls .../*/*.xml".
>>>>>>> But my code+DASL gives me a weird response:
>>>>>>> - only documents show up that have recently been touched by the CMS
>>>>>>>   (clicked on, not necessarily opened)
>>>>>>> - the documents I write appear repeated in the list (=duplicates,
>>>>>>>   each write cycle one occurrence is added)
>>>>>>> - this duplication is reset when I change the DASL query (e.g. depth
>>>>>>>   to 1, returns no documents, and back to 2).
>>>>>>> - all documents are listed correctly by CMS and DAVexplorer, no
>>>>>>>   problemo.
>>>>>>>
>>>>>>> I use my own plain WebdavServiceImpl, which I assume does no caching.
>>>>>>> Also when I restart my app (tomcat) nothing changes, nor when I
>>>>>>> restart the repo.
>>>>>>>
>>>>>>> Anyway, any help is appreciated. See code below.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Reinier
>>>>>>>
>>>>>>>
>>>>>>> ------------------
>>>>>>> Here the code I use:
>>>>>>>
>>>>>>> .....
>>>>>>> public void hippoInit (Properties props) {
>>>>>>>     try {
>>>>>>>         WebdavConfig webdavConfig = new WebdavConfig(props);
>>>>>>>         webdavService = new WebdavServiceImpl(webdavConfig);
>>>>>>>         rootPath = webdavService.getBasePath();
>>>>>>>     }
>>>>>>>     catch (Exception e) {
>>>>>>>         error( "Error initializing Hippo repository connection: "+e.getMessage());
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> public HashMap hippoScanJobOpenings (String relPath) {
>>>>>>>     HashMap jobs = new HashMap();
>>>>>>>     jobs.put( "REPO.RELPATH", relPath );
>>>>>>>
>>>>>>>     String query = Interpolation.interpolate( jobsQuery, jobs );
>>>>>>>     try {
>>>>>>>         DocumentCollection coll = webdavService.fetchCollection( rootPath, query, false );
>>>>>>>         List docs = coll.getDocuments();
>>>>>>>
>>>>>>>         Iterator iter = docs.iterator();
>>>>>>>         while (iter.hasNext()) {
>>>>>>>             Document collDoc = (Document) iter.next();
>>>>>>>             String dirPath = ((DocumentPath) collDoc.getPath()).getRelativePath();
>>>>>>>             message( "Found job: "+dirPath );
>>>>>>>         }
>>>>>>>     }
>>>>>>>     catch (Exception e) {
>>>>>>>         error( "Error getting existing job openings: "+e.getMessage());
>>>>>>>     }
>>>>>>>     return jobs;
>>>>>>> }
>>>>>>>
>>>>>>> The DASL query used is:
>>>>>>>
>>>>>>> <d:searchrequest xmlns:d="DAV:"
>>>>>>>     xmlns:S="http://jakarta.apache.org/slide/"
>>>>>>>     xmlns:h="http://hippo.nl/cms/1.0">
>>>>>>>   <d:basicsearch>
>>>>>>>     <d:select>
>>>>>>>       <d:prop>
>>>>>>>         <h:caption/>
>>>>>>>         <d:displayname/>
>>>>>>>         <h:type/>
>>>>>>>         <d:modificationdate/>
>>>>>>>       </d:prop>
>>>>>>>     </d:select>
>>>>>>>     <d:from>
>>>>>>>       <d:scope>
>>>>>>>         <d:href>${REPO.RELPATH}</d:href>
>>>>>>>         <d:depth>2</d:depth>
>>>>>>>       </d:scope>
>>>>>>>     </d:from>
>>>>>>>     <d:where>
>>>>>>>       <d:eq>
>>>>>>>         <d:prop><h:type/></d:prop>
>>>>>>>         <d:literal>jobopening</d:literal>
>>>>>>>       </d:eq>
>>>>>>>     </d:where>
>>>>>>>     <d:orderby>
>>>>>>>       <d:order>
>>>>>>>         <d:prop><h:modificationDate/></d:prop>
>>>>>>>         <d:ascending/>
>>>>>>>       </d:order>
>>>>>>>     </d:orderby>
>>>>>>>   </d:basicsearch>
>>>>>>> </d:searchrequest>
>>>>>>>
>>>>>>> Notes:
>>>>>>>
>>>>>>> - props contains the settings to initialise the WebdavConfig object
>>>>>>>   as described in ...
>>>>>>> - relPath is the path from rootPath to the collection containing the
>>>>>>>   documents.
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Reinier van den Born
>>>>>>> HintTech B.V.
>>>>>>>
>>>>>>> T: +31(0)88 268 25 00
>>>>>>> F: +31(0)88 268 25 01
>>>>>>> M: +31(0)6 494 171 36
>>>>>>>
>>>>>>> Delftechpark 37i | 2628 XJ Delft | The Netherlands
>>>>>>> www.hinttech.com
>>>>>>>
>>>>>>> HintTech is a specialist in eBusiness Technology ( .Net, Java
>>>>>>> platform, Tridion ) and IT-Projects.
>>>>>>> Chamber of Commerce The Hague nr. 27242282 | Sales Tax nr.
>>>>>>> NL8062.16.396.B01
>>>>>>>
>>>>>>>
>>>>>>> ********************************************
>>>>>>> Hippocms-dev: Hippo CMS development public mailinglist
>>>>>>>
>>>>>>> Searchable archives can be found at:
>>>>>>> MarkMail: http://hippocms-dev.markmail.org
>>>>>>> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
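
The hash check suggested at the top of this thread (steps 1 to 4) could be
sketched roughly as below. This is only an illustration under assumptions:
ContentHashSync, Action and decide are made-up names, and the map of stored
hashes is a stand-in for however the md5 property would be read from the
repository (that lookup is not shown anywhere in the thread). Only the MD5
computation uses a real API, java.security.MessageDigest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of any Hippo API.
class ContentHashSync {

    /** What to do with one incoming document. */
    enum Action { CREATE, OVERWRITE, SKIP }

    /** Step 1: hex-encoded MD5 of the document text. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Steps 3 and 4: compare the hash of the new text with the stored one.
     * storedHash is null when the document is not in the repository yet.
     */
    static Action decide(String storedHash, String newText) {
        if (storedHash == null) {
            return Action.CREATE;
        }
        return storedHash.equals(md5Hex(newText)) ? Action.SKIP : Action.OVERWRITE;
    }

    public static void main(String[] args) {
        // Assumption: stand-in for "path -> md5 property" read from the repository (step 2).
        Map<String, String> storedHashes = new HashMap<>();
        storedHashes.put("1/a.xml", md5Hex("<job>unchanged</job>"));
        storedHashes.put("2/b.xml", md5Hex("<job>old text</job>"));

        System.out.println(decide(storedHashes.get("1/a.xml"), "<job>unchanged</job>")); // SKIP
        System.out.println(decide(storedHashes.get("2/b.xml"), "<job>new text</job>"));  // OVERWRITE
        System.out.println(decide(storedHashes.get("3/c.xml"), "<job>brand new</job>")); // CREATE
    }
}
```

Documents that come back as SKIP are left alone, so neither the delete/add
cycle nor the indexer delay comes into play for them. OVERWRITE means
modifying the existing document in place rather than deleting and re-adding
it.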
