[htdig-dev] Status of defaults.xml
Well, it is close to ready - I now have it successfully generating:

  * htcommon/defaults.cc
  * htdocs/cf_byprog.html
  * htdocs/cf_byname.html
  * 95% of htdocs/attrs.html

I still need to bundle up the changes - I was thinking of creating a patch based on 3.2.0b4 and just posting that here.

At this stage, however, I have a particular question: what is the status of defaults.cc? How much has to be merged in? Will the two need to exist in parallel in CVS for a period? I have some code that would help here (bits of hacked-together C and Perl code) - I just want to know whether I need to include it in the bundle!

Regs

Brian

-
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: [EMAIL PROTECTED]

Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste

---
This sf.net email is sponsored by: viaVerio will pay you up to $1,000 for every account that you consolidate with us.
http://ad.doubleclick.net/clk;4749864;7604308;v?
http://www.viaverio.com/consolidator/osdn.cfm
___
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
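cf_byprog.html lists attributes per program. The DTD quoted later in this thread defines `programs` as a whitespace-separated list, so generating that page amounts to building an inverted index from program name to attribute names. The following is a minimal C++ sketch of that step. It assumes a hypothetical `AttributeInfo` record already filled in from defaults.xml and is not Brian's actual generator:

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record holding only the fields a by-program page needs.
struct AttributeInfo {
    std::string name;      // e.g. "no_title_text"
    std::string programs;  // whitespace-separated, e.g. "htdig htsearch"
};

// Map each program name to the sorted set of attribute names it uses.
std::map<std::string, std::set<std::string>>
indexByProgram(const std::vector<AttributeInfo>& attrs) {
    std::map<std::string, std::set<std::string>> index;
    for (const auto& attr : attrs) {
        std::istringstream words(attr.programs);
        std::string program;
        // operator>> skips any run of spaces, tabs or newlines between names.
        while (words >> program) {
            index[program].insert(attr.name);
        }
    }
    return index;
}
```

An attribute whose `programs` value is `htdig htsearch` ends up under both programs. Using `std::map` and `std::set` keeps both the program list and each program's attribute list sorted and free of duplicates.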
Re: [htdig-dev] Status of defaults.xml
On Wednesday, October 16, 2002, at 02:27 AM, Brian White wrote:

> * 95% of htdocs/attrs.html

I guess I'm not clear on what 95% means. Does this refer to the markup that you mentioned before?

> I still need to bundle up the changes - I was thinking of creating a patch based on 3.2.0b4 and just posting that here.

Yes, that's probably a good idea.

> the status of defaults.cc? How much has to be merged in? Will the two need to exist in parallel in CVS for a period?

They can't really exist in parallel in CVS--your code, after all, generates defaults.cc. Certainly some amount of merging will be needed for a while, but I don't think that barrier will be too high. Still, I think your patch will need to be checked fairly carefully for possible gotchas, and then we'll probably need to merge in Lachlan's proposed fixes.

> I have some code that would help here (bits of hacked-together C and Perl code) - I just want to know whether I need to include it in the bundle!

I'm assuming you mean code to help with the merging?

-Geoff
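Brian's code generates defaults.cc, so the generator's last step is turning each XML attribute into a C++ initializer. The following is a rough sketch. It assumes defaults.cc holds an array of string-valued structs with one brace-enclosed initializer per attribute. The `AttrRecord` struct, its field order and the function names are hypothetical and may not match htcommon/defaults.cc. Any such generator needs the escaping step, because descriptions contain quotes, backslashes and newlines:

```cpp
#include <string>

// Hypothetical subset of an attribute record read from defaults.xml.
struct AttrRecord {
    std::string name;
    std::string defaultValue;
    std::string type;
    std::string programs;
    std::string description;
};

// Escape text so it can sit between double quotes in C/C++ source.
std::string cStringLiteral(const std::string& text) {
    std::string out = "\"";
    for (char c : text) {
        switch (c) {
            case '\\': out += "\\\\"; break;  // backslash -> \\ .
            case '"':  out += "\\\""; break;  // quote -> \"
            case '\n': out += "\\n";  break;  // newline -> \n
            case '\t': out += "\\t";  break;  // tab -> \t
            default:   out += c;      break;  // everything else verbatim
        }
    }
    out += "\"";
    return out;
}

// Emit one brace-enclosed initializer for a hypothetical defaults table.
std::string emitInitializer(const AttrRecord& a) {
    return "{ " + cStringLiteral(a.name) + ", " +
           cStringLiteral(a.defaultValue) + ", " +
           cStringLiteral(a.type) + ", " +
           cStringLiteral(a.programs) + ", " +
           cStringLiteral(a.description) + " },";
}
```

Characters outside ASCII (such as é) pass through unchanged. The generated file therefore carries whatever encoding defaults.xml was read in.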
RE: [htdig-dev] Status of defaults.xml
> Well, it is close to ready - I now have it successfully generating

Well, first and foremost, this is the first time I've expressed my opinion on this solution, and I think it is really efficient and intelligent. Good on ya, mate Brian! :-)

Having said this, and acknowledging that I don't know how the XML file is structured, I want to raise the problem of translating the attributes' descriptions, uses, etc. into different languages. Any ideas?

Ciao and thanks,
-Gabriele
Re: [htdig-dev] Re: 3.2 Stability (was [htdig-members] reasons for objecting to LGPL change)
I'm going to separate two issues for the moment:

  1) What changes are needed for a solid 3.2.0 release.
  2) The mifluz merge (in a separate e-mail).

Please don't take any of my comments as overly critical or flaming. You're new to the project and attempting to take on some heavy lifting--so I'm trying to transfer some experience.

> ...experience, the idea of beta versions is to fix bugs; new features and major code rework are avoided if possible.

This is certainly the traditional definition. In practice, with ht://Dig development, this hasn't worked very well. Typically this happens because there simply hasn't been the manpower to tackle several large cleanups at the same time. In the 3.1 betas, people also came out of the woodwork to contribute their local changes.

We do not currently have anything resembling a traditional software development and engineering process. Largely this happens because there has never been a significant number of core developers who can concentrate significant amounts of time on ht://Dig. (I'm an excellent case in point.) At some time in the future, it would probably be good to move to a more traditional release scheme. It would also be good to have more component-level test suites.

In the meantime (i.e. for getting 3.2.0 out the door with an appropriate level of stability), I suggest you temporarily accept a more flexible definition of a beta release. The reality starts with the list I mentioned--we absolutely must do some code reworks or we'll be layering more duct tape over our problems. In particular, IMHO, we'll continue to have weird htsearch bugs until we toss the current parser system.

> My past experience in importing a lot of new code like this is that it's always harder than it seems and that there are lots of bugs.

I'm curious how much open-source development you've done. Remember that merging patches is quite typical for maintainers--Gilles and I do this quite often.
In the case of ht://Dig, while development resources are at a premium, we have often ported and merged patches. The typical beta process with ht://Dig has been quite flexible at the beginning; as a release like 3.1.0 firms up, fewer patches are accepted.

In answer to the question about 3.2.0 firming up, remember the maxim about development resources being at a premium. For example, I'd much rather switch to the new htsearch framework because it'll be easier to find bugs.

> ...a case can be made that not only would the code differ significantly from the previous 3.2 betas, it also has a load of new features.

Take a look at the release notes for the 3.1.0 betas and for previous 3.2.0 betas. As I said, we've had to take a rather flexible interpretation of a beta release. We currently don't have development or alpha releases. They would be nice, but I also have to be realistic about the pace of development and the number of active developers. Spinning a release, no matter what it's called, is a fair amount of work.

> Part of it is a morale thing. Sometimes when a release is floundering and taking too long, it's better to draw a line and say we're going to fix these bugs and get it out the door.

True. But pretty much every one of the points I mentioned in the previous e-mail goes directly to a bug-fix question. (So does the mifluz merge, but that's a separate e-mail.)

> ...substantial that the release needs to be called 4.0 just to give it enough credit ;-).

Avi Rappaport has said much the same thing. But:

  a) it's really an issue worthy of a vote on htdig-dev.
  b) it's not something to worry about until the final release is close to finished.

-Geoff
Re: [htdig-dev] Re: 3.2 Stability (was [htdig-members] reasons for objecting to LGPL change)
On Tuesday, October 15, 2002, at 01:37 PM, Neal Richter wrote:

> 2. The mifluz devel list is near death, and it doesn't look like anyone is actually using mifluz, or furthering development.

Fine, but that simply does not mean that prior releases were not made with active users, developers or testing. There has been much more significant testing (on my part included) on the mifluz framework than on the remainder of the ht://Dig codebase.

> Can you say that it has had as much testing as the average HtDig release? HtDig is MUCH more active than mifluz has ever been.

In terms of testing by the developers, component-level test suites and testing before releases--the answer is pretty much yes. Granted, the mifluz releases between 0.14 (currently in 3.2.0b4) and 0.23 have not necessarily received the same pounding from thousands of ht://Dig users. But the users who were active with mifluz poured gigabytes of data through it too. Remember also that we *are* mifluz. Take a look at the copyright designations.

> 4. How certain are we that these changes are going to make 3.2beta5 MORE stable than the current beta?

I'm certain. I put a lot of testing into the mifluz code and it's definitely more stable now than it was.

> 5. The current mifluz code merge has problems with constructors and destructors in a library (libhtdig) setting. I would rather help

No offense, but your argument applies here. Why should libhtdig be a feature criterion for 3.2.0b4?

> 6. It has performance problems. These seem like they're locking issues--it seems like the database is being locked and unlocked way too much. When we're indexing, it seems like the database should be locked in place as much as possible and then unlocked at the end.
>
> My experience with the current snapshots is very positive. I've had few problems and the indexing itself is pretty solid, especially with the new zlib WordDB compression.
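Item 6 suggests locking the database once for an indexing run and unlocking it at the end, instead of locking around every word. The following toy C++ sketch illustrates that idea. `WordStore` is a hypothetical in-memory stand-in, not the WordDB or Berkeley DB API. It counts lock acquisitions so that the difference between per-word and batched locking is visible:

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a word database; not the WordDB API.
class WordStore {
public:
    // Per-word locking: one lock/unlock round trip for every word.
    void insertOne(const std::string& word) {
        std::lock_guard<std::mutex> guard(mutex_);
        ++lockAcquisitions_;
        ++counts_[word];
    }

    // Batched locking: hold the lock for the whole batch, release at the end.
    void insertBatch(const std::vector<std::string>& words) {
        std::lock_guard<std::mutex> guard(mutex_);
        ++lockAcquisitions_;
        for (const auto& w : words) {
            ++counts_[w];
        }
    }

    // Unsynchronized readers; fine for this single-threaded illustration.
    std::size_t lockAcquisitions() const { return lockAcquisitions_; }
    std::size_t count(const std::string& word) const {
        auto it = counts_.find(word);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    std::mutex mutex_;
    std::size_t lockAcquisitions_ = 0;
    std::map<std::string, std::size_t> counts_;
};
```

The trade-off is that a batch holds the lock longer, so any concurrent reader waits for the whole batch. That is usually acceptable during a dedicated indexing run. The thread does not establish whether locking actually explains the reported performance problems.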
Sorry to sound dubious, but speaking of large code merges, you haven't submitted patches for me to merge into 3.2.0b4 either. As yet, I haven't tested your zlib WordDB compression or seen whether it has performance problems relative to 3.2.0b4. Can I claim that your code has seen as much user-level testing as the 3.2.0b4 snapshots?

I'm somewhat trying to play devil's advocate here. My gut feeling is that the mifluz merge should be aimed towards a 3.2.0b5 release and we *should* get 3.2.0b4 out the door, as stable as possible, in the near term. But I'm pretty sure that merging in the new mifluz code is an overall win.

-Geoff
Re: [htdig-dev] Re: 3.2 Stability (was [htdig-members] reasons for objecting to LGPL change)
On Wed, 16 Oct 2002, Geoff Hutchison wrote:

> On Tuesday, October 15, 2002, at 01:37 PM, Neal Richter wrote:
>
>> 2. The mifluz devel list is near death, and it doesn't look like anyone is actually using mifluz, or furthering development.
>
> Fine, but that simply does not mean that prior releases were not made with active users, developers or testing. There has been much more significant testing (on my part included) on the mifluz framework than on the remainder of the ht://Dig codebase.

I agree in theory. In practice, until the new code has been verified to be acceptable after a successful merge, it is suspect. We hope that it will fix all our problems, but it will be a while before we confirm this.

>> 5. The current mifluz code merge has problems with constructors and destructors in a library (libhtdig) setting. I would rather help
>
> No offense, but your argument applies here. Why should libhtdig be a feature criterion for 3.2.0b4?

I agree, it's not a criterion. I will maintain a separate branch for that.

>> My experience with the current snapshots is very positive. I've had few problems and the indexing itself is pretty solid, especially with the new zlib WordDB compression.
>
> Sorry to sound dubious, but speaking of large code merges, you haven't submitted patches for me to merge into 3.2.0b4 either. As yet, I haven't tested your zlib WordDB compression or seen whether it has performance problems relative to 3.2.0b4. Can I claim that your code has seen as much user-level testing as the 3.2.0b4 snapshots?

Heh. ;-) I'll get you those ASAP. Zlib is extremely well tested and the changes are a few lines of code. Offering this as a workaround to people who encounter the WordDB compression bug is a good alternative to hoping that it's fixed in a merged-mifluz codebase.

> I'm somewhat trying to play devil's advocate here. My gut feeling is that the mifluz merge should be aimed towards a 3.2.0b5 release and we *should* get 3.2.0b4 out the door, as stable as possible, in the near term. But I'm pretty sure that merging in the new mifluz code is an overall win.

I agree in theory. In practice, I am motivated to suggest we scale back to what is absolutely necessary in order to get users a new release faster. Gilles in particular has voiced frustration over the delay in the 3.2 release, and over the time he has wasted maintaining 3.1.x. I'd hate to keep adding to the pile and frustrate him further. If we were a company and were risking the speedy completion of a release by wanting to incorporate a huge chunk of third-party code that needs more work... we'd be in real danger of getting fired.

I guess I see these things:

  1. The 3.2 dev process is too open-ended at present.
  2. The 3.1.x users need a new release.
  3. The current 3.2beta4 code offers a significant release to users.
  4. We are in danger of being waist-deep in feature-creep quicksand.

If we delay the integration of mifluz and the larger items on your list, we'll have a tractable set of things to do to get a decent release out there. Basically I'm suggesting that, for morale purposes alone, we do this and set a goal of pushing a 3.2 release out the door by December. Next, we make a list and divide it between smaller changes and larger ones. Smaller ones go into 3.3 (release in March?) and the rest into 4.0. The development could be semi-parallel at this point.

You may disagree with the numbers game here, but I think it would be good for morale to establish a set of well-reasoned, conservative milestones and meet them in the short term. If we implement a strategy like this and six months later we look back and see that we've had 1-2 releases and are moving forward with integration of large new features/code, we'll feel much better than if we're still in feature-creep quicksand.

Here's a proposal:
http://ai.rightnow.com/htdig/proposed_schedule.html

Basically I included in the 3.2 schedule only the things that are necessary to fix or work around known bugs. Things like Quim's new search framework and the excellent XML config-file feature are in 3.3. More open-ended things like the mifluz merge, STL and Unicode are in 4.0 and 4.1. Also, the Zlib-WordDB change in 3.2 and the more efficient WordDB inverted index are straightforward and buy us time with the mifluz merge.

Anyway, I'm sure you won't agree with my thoughts on the mifluz merge, and this is certainly a conservative viewpoint on it. If we make good progress on the mifluz merge by the end of the year, I'll withdraw any further objections. Eh?

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
RE: [htdig-dev] Status of defaults.xml
At 23:25 16/10/2002, Gabriele Bartolini wrote:

>> Well, it is close to ready - I now have it successfully generating
>
> Well, first and foremost, this is the first time I've expressed my opinion on this solution, and I think it is really efficient and intelligent. Good on ya, mate Brian! :-)
>
> Having said this, and acknowledging that I don't know how the XML file is structured, I want to raise the problem of translating the attributes' descriptions, uses, etc. into different languages. Any ideas?

Yes. Let's start with the DTD as it stands:

    <!ELEMENT HtdigAttributes ( attribute+ )>

    <!--
      attribute:
        name     : Variable Name
        type     : Type of Variable
        programs : Whitespace separated list of programs/modules using this attribute
        block    : Configuration block this can be used in ( optional )
        version  : Version that introduced the attribute
        category : Attribute category (to split documentation)
    -->
    <!ELEMENT attribute ( default, ( nodocs | ( example+, description ) ) )>
    <!ATTLIST attribute
        name     CDATA                     #REQUIRED
        type     (string|integer|boolean)  "string"
        programs CDATA                     #REQUIRED
        block    CDATA                     #IMPLIED
        version  CDATA                     #REQUIRED
        category CDATA                     #REQUIRED>

    <!-- Default value of attribute - configmacro="true" would indicate
         the value is actually a macro ( eg BIN_DIR ) -->
    <!ELEMENT default (#PCDATA)>
    <!ATTLIST default configmacro (true|false) "false">

    <!-- Basically a flag that suppresses documentation -->
    <!ELEMENT nodocs EMPTY>

    <!-- An example value that goes into the documentation -->
    <!ELEMENT example (#PCDATA)>

    <!ENTITY % paratext "#PCDATA|em|strong|a|ref">
    <!ENTITY % text "%paratext;|table|p|br|ol|ul|dl|codeblock">

    <!ELEMENT description (%paratext;)*>

...plus all the elements for formatting the description.

The first thing to do, then, is to look at the items that might need translation:

  * description
  * block
  * category
  * example

Analysis:

  * description is the one that will always need it.
  * I think the values for block and category should be considered as 'keys' rather than the actual values - they should be translated by a lookup table.
  * examples will *sometimes* require translation.

To this end, I would suggest changing the following:

    <!ELEMENT attribute ( default, ( nodocs | ( example+, description ) ) )>

to:

    <!ELEMENT attribute ( default, ( nodocs | ( example*, docset+ ) ) )>

    <!-- lang would be the id of the language using a standard identifier,
         or set to "default" for the default language -->
    <!ELEMENT docset ( example*, description )>
    <!ATTLIST docset lang CDATA #REQUIRED>

As an example:

    <attribute name="no_title_text"
               type="string"
               programs="htsearch"
               version="3.1.0"
               category="Presentation:Text">
      <default>filename</default>
      <example>!?</example>
      <docset lang="default">
        <example>No Title Found</example>
        <description>
          This specifies the text to use in search results when no title
          is found in the document itself. If it is set to filename,
          htsearch will use the name of the file itself, enclosed in
          square brackets (e.g. [index.html]).
        </description>
      </docset>
      <docset lang="fr">
        <example>Aucun titre retrouvé</example>
        <description>
          Ceci spécifie le texte à utiliser dans les résultats d'une
          recherche lorsque aucun titre se trouve dedans le document.
          Si on le règle à filename, htsearch se servira du nom du
          fichier lui-même, inclus entre crochets (p.ex. [index.html]).
        </description>
      </docset>
    </attribute>

(And no - I don't speak French. I got a friend to do the translation for me.)

This would put the capability into the XML file. I would need to figure out how to handle characters like é - possibly as &eacute;.

As for:

  * how this might then be used to generate documentation, and
  * how the translated versions will be maintained,

those are different issues altogether!

Note that it isn't a big change, but I think we should leave it for version 2 of defaults.xml.
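A documentation generator could consume the proposed `docset` elements with a lookup keyed on `lang` that falls back to `default`. The same pattern covers treating `block` and `category` values as keys into a translation table. The following is a minimal C++ sketch. It assumes the XML has already been parsed into maps (parsing is out of scope), and all type and function names are hypothetical:

```cpp
#include <map>
#include <string>

// One <docset>: an optional translated example plus the description.
struct DocSet {
    std::string example;      // empty if the docset has no <example>
    std::string description;
};

// lang -> docset for a single attribute, e.g. "default", "fr".
using DocSets = std::map<std::string, DocSet>;

// Prefer the requested language; otherwise fall back to lang="default".
// Returns nullptr if neither exists (e.g. an attribute marked <nodocs/>).
const DocSet* selectDocSet(const DocSets& sets, const std::string& lang) {
    auto it = sets.find(lang);
    if (it == sets.end()) {
        it = sets.find("default");
    }
    return it == sets.end() ? nullptr : &it->second;
}

// block/category values treated as keys: lang -> (key -> display text).
using KeyTable = std::map<std::string, std::map<std::string, std::string>>;

// Look up a key's display text; with no entry, show the key itself.
std::string translateKey(const KeyTable& table, const std::string& lang,
                         const std::string& key) {
    auto langIt = table.find(lang);
    if (langIt != table.end()) {
        auto keyIt = langIt->second.find(key);
        if (keyIt != langIt->second.end()) {
            return keyIt->second;
        }
    }
    return key;
}
```

With this fallback, a French page for an attribute that has no `fr` docset yet shows the default-language text rather than nothing. That may help while translations are incomplete.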
> Ciao and thanks,
> -Gabriele

-
Brian White
Re: [htdig-dev] Status of defaults.xml
On Wednesday, October 16, 2002, at 07:41 PM, Brian White wrote:

> I can use that tool to take a merged version of defaults.cc to produce a version of defaults.xml. The problem is that a few of the descriptions will need to be reworked quite heavily by hand to produce valid XML.

OK, that makes sense, of course. I had forgotten that you wrote a tool to generate the defaults.xml file. I would guess that, with some care, we can separate out the new entries and only rework them if needed.

-Geoff
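Separating out the new entries can be sketched as a set difference on attribute names. Anything in the merged defaults.cc but not yet in defaults.xml needs converting (and possibly hand-fixing), and the rest can be left alone. This sketch assumes both name lists have already been extracted, for example by the C and Perl tooling Brian mentions, whose details are not shown here. `newEntries` is a hypothetical name:

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Names present in the merged defaults.cc but not yet in defaults.xml.
// std::set keeps both inputs sorted, which std::set_difference requires.
std::vector<std::string> newEntries(const std::set<std::string>& inDefaultsCc,
                                    const std::set<std::string>& inDefaultsXml) {
    std::vector<std::string> added;
    std::set_difference(inDefaultsCc.begin(), inDefaultsCc.end(),
                        inDefaultsXml.begin(), inDefaultsXml.end(),
                        std::back_inserter(added));
    return added;
}
```

Swapping the arguments lists attributes that were dropped from defaults.cc. Entries present in both files whose descriptions changed would still need a field-by-field comparison, which this sketch does not attempt.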