On Fri, Jun 12, 2009 at 3:16 PM, Reinier van den
Born<[email protected]> wrote:
> Hi Ard,
>
> To make sure I understand this correctly, I should do:
>
>  delete; wait; write.
>
> Currently the cycle time is 5 seconds, which would make it a very slow
> process.
> Alternatively I could delete all first and then write,

That is what I meant.

> but that would mean that all content would be gone for 5 seconds.

Yes, true

> Makes it worthwhile to find a way to write only documents that have been
> modified.

As I said, the problem does not occur when you just change an existing
document: why delete it and add it again? Just overwrite its contents;
it is faster and does not have the issue.

>
> So how does a tool like Dav2Disk handle this?

I don't know the tool. AFAIK, a tool like that is meant for the initial
import, not primarily for a repository that is in production.

> That is certainly not waiting 5 seconds for each file to write.
> Nor is it deleting everything first before it writes, or?
> Or does it suffer from the same problem and I just never noticed it?

As explained, I don't know the tool. But here is my suggestion:

You shouldn't delete and add documents that haven't been changed: it
doesn't make sense.

How to avoid it:

1) compute a simple md5 or some other hash of the document's text before
putting it in the repository
2) store the md5 as a property
3) before deleting / adding a document, compute its md5 and check whether
a document with that md5 already exists in the repository (simple search)
4) modify changed documents instead of doing a delete/add cycle

I am confident this will solve your issue. If you want to be sure, you
can first test it with the 5-second delay.
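
A minimal sketch of steps 1 to 4 in Java, just to illustrate the idea. The
class ChangeDetector, the method decide and the storedHashes map are
hypothetical names, not part of the Hippo Java Adapter. The sketch assumes
you have already read the stored md5 property of each document (for example
with a DASL search) into a map keyed by document path; how the property is
written and read in the repository is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

public class ChangeDetector {

    /** What to do with one incoming document. */
    public enum Action { SKIP, OVERWRITE, CREATE }

    /** Step 1: hex-encoded MD5 of the document text. */
    public static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Steps 3 and 4: compare the new text against the hash stored for the
     * same path. storedHashes is an assumption: path -> md5 property value
     * as previously read from the repository.
     */
    public static Action decide(Map<String, String> storedHashes,
                                String path, String newText) {
        String stored = storedHashes.get(path);
        if (stored == null) {
            return Action.CREATE;      // not in the repository yet
        }
        if (stored.equals(md5Hex(newText))) {
            return Action.SKIP;        // unchanged: no delete, no write
        }
        return Action.OVERWRITE;       // changed: modify in place
    }
}
```

With this, only CREATE and OVERWRITE lead to a write, and OVERWRITE modifies
the existing document instead of going through a delete/add cycle. After each
write you would also store the new md5 as a property (step 2).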

Regards Ard

>
> Reinier
>
> Ard Schrijvers wrote:
>>
>> Hello Reinier,
>>
>> On Fri, Jun 12, 2009 at 1:59 PM, Reinier van den
>> Born<[email protected]> wrote:
>>>
>>> Bart,
>>>
>>> The version of the repository is 1.2.15.1.
>>>
>>> Btw. I tried deleting before writing. It doesn't make a difference.
>>
>> This is a known issue that is not easy to solve. You have two possible
>> solutions:
>>
>> 1) instead of a deletion / add cycle, modify the existing document
>> 2) do the deletion of the old ones in a separate cycle, with a delay of
>> at least X seconds, where X is the value in your cron configuration
>> of the indexer.xml
>>
>> I hope this isn't too much of a problem for you. At least you can
>> check whether my proposed solution works
>>
>> Regards Ard
>>
>>>
>>> Reinier
>>>
>>>
>>> Bart van der Schans wrote:
>>>>
>>>> Reinier,
>>>>
>>>> Which version of the repository are you using?
>>>>
>>>> Bart
>>>>
>>>> On Fri, Jun 12, 2009 at 1:14 PM, Reinier van den Born
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi Jasha,
>>>>>
>>>>> Rebuilding the index fixed the problem of results not showing up.
>>>>> Problem remains that if content is written twice it shows up twice.
>>>>>
>>>>> Maybe I should delete the existing document before I write it?
>>>>> (at the moment I simply overwrite...)
>>>>>
>>>>> Reinier
>>>>>
>>>>>
>>>>> Jasha Joachimsthal wrote:
>>>>>>
>>>>>> Hi Reinier,
>>>>>>
>>>>>> this looks like your Lucene index contains some errors if some results
>>>>>> appear twice and others don't appear at all. Try rebuilding the index.
>>>>>>
>>>>>> Jasha Joachimsthal
>>>>>>
>>>>>> [email protected] - [email protected]
>>>>>>
>>>>>> www.onehippo.com
>>>>>> Amsterdam - Hippo B.V. Oosteinde 11 1017 WT Amsterdam +31(0)20-5224466
>>>>>> San Francisco - Hippo USA Inc. 185 H Street, suite B, Petaluma CA
>>>>>> 94952 +1 (707) 7734646
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2009/6/11 Reinier van den Born <[email protected]>:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am trying to automatically update a collection of documents in a
>>>>>>> Hippo repository.
>>>>>>> Each document is kept in its own collection within a "main" collection:
>>>>>>> ../1/a.xml, ../2/b.xml, etc.
>>>>>>> Each update is independent of earlier ones: I don't need caching, JMS,
>>>>>>> or anything more.
>>>>>>>
>>>>>>> So I do a simple scan for old documents (fetchCollection), upload the
>>>>>>> new ones and delete the old ones.
>>>>>>> Very simple, so I was thinking I could use the Java Adapter directly...
>>>>>>>
>>>>>>> This works, except for getting the scan. Its function is similar to
>>>>>>> "ls .../*/*.xml".
>>>>>>> But my code+DASL gives me a weird response:
>>>>>>> - only documents that have recently been touched by the CMS show up
>>>>>>> (clicked on, not necessarily opened)
>>>>>>> - the documents I write appear repeatedly in the list (=duplicates;
>>>>>>> each write cycle adds one occurrence)
>>>>>>> - this duplication is reset when I change the DASL query (e.g. depth
>>>>>>> to 1, which returns no documents, and back to 2).
>>>>>>> - all documents are listed correctly by the CMS and DAVexplorer, no
>>>>>>> problem.
>>>>>>>
>>>>>>> I use my own plain WebdavServiceImpl, which I assume does no caching.
>>>>>>> Also, nothing changes when I restart my app (Tomcat), nor when I
>>>>>>> restart the repo.
>>>>>>>
>>>>>>> Anyway, any help is appreciated. See the code below.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Reinier
>>>>>>>
>>>>>>>
>>>>>>> ------------------
>>>>>>> Here the code I use:
>>>>>>>
>>>>>>> .....
>>>>>>> public void hippoInit (Properties props) {
>>>>>>>  try {
>>>>>>>     WebdavConfig webdavConfig = new WebdavConfig(props);
>>>>>>>     webdavService = new WebdavServiceImpl(webdavConfig);
>>>>>>>     rootPath      = webdavService.getBasePath();
>>>>>>>  }
>>>>>>>  catch (Exception e) {
>>>>>>>     error( "Error initializing Hippo repository connection: "+e.getMessage());
>>>>>>>  }
>>>>>>> }
>>>>>>>
>>>>>>> public HashMap hippoScanJobOpenings (String relPath) {
>>>>>>>  HashMap jobs = new HashMap();
>>>>>>>  jobs.put( "REPO.RELPATH", relPath );
>>>>>>>
>>>>>>>  String query = Interpolation.interpolate( jobsQuery, jobs );
>>>>>>>  try {
>>>>>>>     DocumentCollection coll = webdavService.fetchCollection( rootPath, query, false );
>>>>>>>     List docs = coll.getDocuments();
>>>>>>>
>>>>>>>     Iterator iter = docs.iterator();
>>>>>>>     while (iter.hasNext()) {
>>>>>>>         Document collDoc = (Document) iter.next();
>>>>>>>         String   dirPath = ((DocumentPath) collDoc.getPath()).getRelativePath();
>>>>>>>         message( "Found job: "+dirPath );
>>>>>>>     }
>>>>>>>  }
>>>>>>>  catch (Exception e) {
>>>>>>>     error( "Error getting existing job openings: "+e.getMessage());
>>>>>>>  }
>>>>>>>  return jobs;
>>>>>>> }
>>>>>>>
>>>>>>> The DASL query used is:
>>>>>>>
>>>>>>> <d:searchrequest xmlns:d="DAV:"
>>>>>>> xmlns:S="http://jakarta.apache.org/slide/"
>>>>>>> xmlns:h="http://hippo.nl/cms/1.0">
>>>>>>>  <d:basicsearch>
>>>>>>>  <d:select>
>>>>>>>   <d:prop>
>>>>>>>     <h:caption/>
>>>>>>>     <d:displayname/>
>>>>>>>     <h:type/>
>>>>>>>     <d:modificationdate/>
>>>>>>>   </d:prop>
>>>>>>>  </d:select>
>>>>>>>  <d:from>
>>>>>>>   <d:scope>
>>>>>>>     <d:href>${REPO.RELPATH}</d:href>
>>>>>>>     <d:depth>2</d:depth>
>>>>>>>   </d:scope>
>>>>>>>  </d:from>
>>>>>>>  <d:where>
>>>>>>>   <d:eq>
>>>>>>>     <d:prop><h:type/></d:prop>
>>>>>>>     <d:literal>jobopening</d:literal>
>>>>>>>   </d:eq>
>>>>>>>  </d:where>
>>>>>>>  <d:orderby>
>>>>>>>   <d:order>
>>>>>>>     <d:prop><h:modificationDate/></d:prop>
>>>>>>>     <d:ascending/>
>>>>>>>   </d:order>
>>>>>>>  </d:orderby>
>>>>>>>  </d:basicsearch>
>>>>>>> </d:searchrequest>
>>>>>>>
>>>>>>> Notes:
>>>>>>>
>>>>>>> - props contains the settings to initialise the WebdavConfig object as
>>>>>>> described in ...
>>>>>>> - relPath is the path from rootPath to the collection containing the
>>>>>>> documents.
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Reinier van den Born
>>>>>>> HintTech B.V.
>>>>>>>
>>>>>>> T: +31(0)88 268 25 00
>>>>>>> F: +31(0)88 268 25 01
>>>>>>> M: +31(0)6 494 171 36
>>>>>>>
>>>>>>> Delftechpark 37i | 2628 XJ Delft | The Netherlands
>>>>>>> www.hinttech.com
>>>>>>>
>>>>>>> HintTech is a specialist in eBusiness Technology ( .Net, Java
>>>>>>> platform,
>>>>>>> Tridion ) and IT-Projects.
>>>>>>> Chamber of Commerce The Hague nr. 27242282 | Sales Tax nr.
>>>>>>> NL8062.16.396.B01
>>>>>>>
>>>>>>>
>>>>>>> ********************************************
>>>>>>> Hippocms-dev: Hippo CMS development public mailinglist
>>>>>>>
>>>>>>> Searchable archives can be found at:
>>>>>>> MarkMail: http://hippocms-dev.markmail.org
>>>>>>> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
>>>>>>>
>>>>>>>
>>>>>>>
>