Hi Ard,
Not deleting is how I started.
This gave me the same problem, or at least one that looked similar.
I hoped to solve that by trying to delete first.
I changed my code back and now it seems to be working fine...
My conclusion:
The initial problem was caused by a bad index, as Jasha suggested.
I fixed that when I was already trapped in the deletion problem.
Anyway, I am happy now :-)
Thanks everybody!!
Have a good weekend,
Reinier
Ard Schrijvers wrote:
On Fri, Jun 12, 2009 at 3:16 PM, Reinier van den Born
<[email protected]> wrote:
Hi Ard,
To make sure I understand this correctly, I should do:
delete; wait; write.
Currently the cycle time is 5 seconds, which would make it a very slow
process.
Alternatively I could delete all first and then write,
That is what I meant.
but that would mean that all content would be gone for 5 seconds.
Yes, true
Makes it worthwhile to find a way to write only documents that have been
modified.
As I said, the problem does not occur when you just change an existing
document: why delete it and add it again? Just overwrite its contents;
it is faster and does not have the issue.
That's what I did in the beginning. It does have the issue.
So how does a tool like Dav2Disk handle this?
I don't know the tool. AFAIK, a tool like that is meant for initial
importing, not primarily for a repository in production.
That is certainly not waiting 5 seconds for each file to write.
Nor is it deleting everything before it writes, is it?
Or does it suffer from the same problem and I just never noticed it?
As explained, I don't know the tool. But here is my suggestion:
You shouldn't delete and add documents that haven't been changed: it
doesn't make sense.
How to avoid it:
1) compute a simple MD5 or some other hash of the document's text before
putting it in the repository
2) store the MD5 as a property
3) before deleting / adding a document, compute its MD5 and check whether
it exists in the repository (simple search)
4) modify changed documents instead of doing a delete/add cycle
I am confident this will solve your issue. You can test it first with
the 5 sec delay, if you want, to be sure.
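The four steps above could be sketched in Java roughly as follows. This is only a minimal illustration: the class name ChangeDetector, its method names, and the in-memory map are hypothetical. In a real setup the hash would be stored as a property on the document and looked up with a search, which is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class ChangeDetector {
    // Hypothetical stand-in for the hash property stored in the repository,
    // keyed by document path. A real version would read/write a property.
    private final Map<String, String> storedHashes = new HashMap<>();

    // Hex-encoded MD5 of the document text (steps 1 and 3).
    public static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns true if the document is new or changed and should be written
    // (step 4). Records the new hash (step 2) whenever it returns true.
    public boolean needsWrite(String path, String text) {
        String hash = md5Hex(text);
        if (hash.equals(storedHashes.get(path))) {
            return false; // unchanged: skip both delete and write
        }
        storedHashes.put(path, hash);
        return true;
    }
}
```

With something like this, the update loop would call needsWrite(path, text) for each document and only touch the repository when it returns true.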
Regards Ard
Reinier
Ard Schrijvers wrote:
Hello Reinier,
On Fri, Jun 12, 2009 at 1:59 PM, Reinier van den Born
<[email protected]> wrote:
Bart,
The version of the repository is 1.2.15.1.
Btw. I tried deleting before writing. It doesn't make a difference.
This is a known issue, and not easy to solve. You have two possible solutions:
1) instead of a deletion / add cycle, you modify an existing document
2) do the deletion of the old ones in a separate cycle, with a delay of
at least X seconds, where X is the value in the cron configuration of
your indexer.xml
I hope this isn't too much of a problem for you. At least, you can
check whether my proposed solution works
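Option 2 could look something like the Java sketch below: old documents are only remembered during the write cycle and deleted in a later cycle, once the delay has passed. The class DeferredDeleter and its methods are hypothetical names for illustration. The actual delete call against the repository is left to the caller, and the delay is assumed to be the X from the indexer configuration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeferredDeleter {
    private final long delayMillis;
    // path -> time (ms) at which the deletion was requested
    private final Map<String, Long> pending = new LinkedHashMap<>();

    public DeferredDeleter(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Write cycle: remember the old document instead of deleting it now.
    public void schedule(String path, long nowMillis) {
        pending.putIfAbsent(path, nowMillis);
    }

    // Later cycle: returns the paths whose delay has elapsed and removes
    // them from the pending set. The caller performs the actual deletes.
    public List<String> due(long nowMillis) {
        List<String> ready = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() >= delayMillis) {
                ready.add(e.getKey());
                it.remove();
            }
        }
        return ready;
    }
}
```

Passing the current time in as a parameter (for example System.currentTimeMillis()) keeps the sketch easy to test without real waiting.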
Regards Ard
Reinier
Bart van der Schans wrote:
Reinier,
Which version of the repository are you using?
Bart
On Fri, Jun 12, 2009 at 1:14 PM, Reinier van den Born
<[email protected]> wrote:
Hi Jasha,
Rebuilding the index fixed the problem of results not showing up.
The problem remains that if content is written twice, it shows up twice.
Maybe I should delete the existing document before I write it?
(at the moment I simply overwrite...)
Reinier
Jasha Joachimsthal wrote:
Hi Reinier,
It looks like your Lucene index contains some errors, since some results
appear twice and others don't appear at all. Try rebuilding the index.
Jasha Joachimsthal
[email protected] - [email protected]
www.onehippo.com
Amsterdam - Hippo B.V. Oosteinde 11 1017 WT Amsterdam +31(0)20-5224466
San Francisco - Hippo USA Inc. 185 H Street, suite B, Petaluma CA
94952 +1 (707) 7734646
2009/6/11 Reinier van den Born <[email protected]>:
Hello,
I am trying to automatically update a collection of documents in a Hippo
repository.
Each document is kept in its own collection within a "main" collection:
../1/a.xml, ../2/b.xml, etc.
Each update is independent of earlier ones: I don't need caching, JMS,
or anything else.
So I do a simple scan for old documents (fetchCollection), upload the
new ones and delete the old ones.
Very simple, so I was thinking I could use the Java Adapter directly...
This works, except for getting the scan. Its function is similar to
"ls .../*/*.xml".
But my code+DASL gives me a weird response:
- only documents that have recently been touched in the CMS (clicked on,
not necessarily opened) show up
- the documents I write appear repeatedly in the list (duplicates: each
write cycle adds one occurrence)
- this duplication is reset when I change the DASL query (e.g. set depth
to 1, which returns no documents, and back to 2)
- all documents are listed correctly by the CMS and DAVexplorer, no
problem.
I use my own plain WebdavServiceImpl, which I assume does no caching.
Also, nothing changes when I restart my app (Tomcat), nor when I restart
the repo.
Anyway, any help is appreciated. See code below.
Thanks,
Reinier
------------------
Here is the code I use:
.....
// Set up the WebDAV connection to the Hippo repository.
public void hippoInit(Properties props) {
    try {
        WebdavConfig webdavConfig = new WebdavConfig(props);
        webdavService = new WebdavServiceImpl(webdavConfig);
        rootPath = webdavService.getBasePath();
    } catch (Exception e) {
        error("Error initializing Hippo repository connection: " + e.getMessage());
    }
}

// Scan the collection at relPath for existing job openings using the DASL query.
public HashMap hippoScanJobOpenings(String relPath) {
    HashMap jobs = new HashMap();
    jobs.put("REPO.RELPATH", relPath);
    String query = Interpolation.interpolate(jobsQuery, jobs);
    try {
        DocumentCollection coll = webdavService.fetchCollection(rootPath, query, false);
        List docs = coll.getDocuments();
        Iterator iter = docs.iterator();
        while (iter.hasNext()) {
            Document collDoc = (Document) iter.next();
            String dirPath = ((DocumentPath) collDoc.getPath()).getRelativePath();
            message("Found job: " + dirPath);
        }
    } catch (Exception e) {
        error("Error getting existing job openings: " + e.getMessage());
    }
    // Note: at this point jobs only holds the REPO.RELPATH entry used for interpolation.
    return jobs;
}
The DASL query used is:
<d:searchrequest xmlns:d="DAV:"
                 xmlns:S="http://jakarta.apache.org/slide/"
                 xmlns:h="http://hippo.nl/cms/1.0">
  <d:basicsearch>
    <d:select>
      <d:prop>
        <h:caption/>
        <d:displayname/>
        <h:type/>
        <d:modificationdate/>
      </d:prop>
    </d:select>
    <d:from>
      <d:scope>
        <d:href>${REPO.RELPATH}</d:href>
        <d:depth>2</d:depth>
      </d:scope>
    </d:from>
    <d:where>
      <d:eq>
        <d:prop><h:type/></d:prop>
        <d:literal>jobopening</d:literal>
      </d:eq>
    </d:where>
    <d:orderby>
      <d:order>
        <d:prop><h:modificationDate/></d:prop>
        <d:ascending/>
      </d:order>
    </d:orderby>
  </d:basicsearch>
</d:searchrequest>
Notes:
- props contains the settings to initialise the WebdavConfig object as
  described in ...
- relPath is the path from rootPath to the collection containing the
  documents.
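For readers unfamiliar with the ${REPO.RELPATH} placeholder in the query above: the substitution step can be pictured with the small Java sketch below. This is not the Interpolation class used in the code, whose behaviour is not shown in this thread. PlaceholderSketch and its interpolate method are hypothetical, and leaving unknown keys untouched is an assumption made here.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSketch {
    // Matches ${KEY}, capturing KEY.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${KEY} with values.get(KEY); unknown keys are left as-is.
    public static String interpolate(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Called with the query text and a map containing REPO.RELPATH, this would yield the query with the relative path filled into the d:href element.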
--
Reinier van den Born
HintTech B.V.
T: +31(0)88 268 25 00
F: +31(0)88 268 25 01
M: +31(0)6 494 171 36
Delftechpark 37i | 2628 XJ Delft | The Netherlands
www.hinttech.com
HintTech is a specialist in eBusiness Technology (.Net, Java platform,
Tridion) and IT-Projects.
Chamber of Commerce The Hague nr. 27242282 | Sales Tax nr. NL8062.16.396.B01
********************************************
Hippocms-dev: Hippo CMS development public mailinglist
Searchable archives can be found at:
MarkMail: http://hippocms-dev.markmail.org
Nabble: http://www.nabble.com/Hippo-CMS-f26633.html