[htdig-dev] Status of defaults.xml

2002-10-16 Thread Brian White


Well, it is close to ready  - I now have it successfully generating
   * htcommon/defaults.cc
   * htdocs/cf_byprog.html
   * htdocs/cf_byname.html
   * 95% of htdocs/attrs.html

I still need to bundle up the changes - I was thinking of creating
a patch based on 3.2.0b4 and just posting that here.

At this stage, however, I have a particular question - what is
the status of defaults.cc? How much has to be merged in? Will
they need to exist in parallel in the CVS for a period?

I have some code that would help here ( bits of hacked together C and
Perl code ) - I just want to know whether I need to include it
in the bundle!

Regs

Brian


-
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web:   http://www.steptwo.com.au/
Email: [EMAIL PROTECTED]

Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste




---
This sf.net email is sponsored by: viaVerio will pay you up to
$1,000 for every account that you consolidate with us.
http://ad.doubleclick.net/clk;4749864;7604308;v?
http://www.viaverio.com/consolidator/osdn.cfm
___
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev



Re: [htdig-dev] Status of defaults.xml

2002-10-16 Thread Geoff Hutchison


On Wednesday, October 16, 2002, at 02:27  AM, Brian White wrote:

   * 95% of htdocs/attrs.html

I guess I'm not clear on what 95% means. Does this refer to the markup 
that you mentioned before?

 I still need to bundle up the changes - I was thinking of creating
 a patch based on 3.2.0b4 and just posting that here.

Yes, that's probably a good idea.

 the status of defaults.cc? How much has to be merged in? Will
 they need to exist in parallel in the CVS for a period?

They can't really exist in parallel in the CVS--your code, after all, 
generates defaults.cc. Certainly some amount of merging will be needed 
for a while, but I don't think that barrier will be too high. But
certainly I think your patch will need to be checked fairly carefully 
for possible gotchas and then we'll probably need to merge in 
Lachlan's proposed fixes.

 I have some code that would help here ( bits of hacked together C and
 Perl code ) - I just want to know whether I need to include it
 in the bundle!

I'm assuming you mean code to help with the merging?

-Geoff






RE: [htdig-dev] Status of defaults.xml

2002-10-16 Thread Gabriele Bartolini

 Well, it is close to ready  - I now have it successfully generating

Well, first and foremost, this is the first time I have expressed an opinion on
this solution, and I think it is really efficient and intelligent. Good on
ya, mate Brian! :-)

Having said this, and acknowledging that I don't know how the
XML file is structured, I want to raise the problem of translating
the attributes' descriptions, uses, etc., into different languages.

Any ideas?

Ciao and thanks,
-Gabriele






Re: [htdig-dev] Re: 3.2 Stability (was [htdig-members] reasons for objecting to LGPL change)

2002-10-16 Thread Geoff Hutchison

I'm going to take two separate issues and separate them for the moment:
1) What changes are needed for a solid 3.2.0 release.
2) The mifluz merge (in a separate e-mail).

Please don't take any of my comments as overly critical or flaming. 
You're new to the project and attempting to take on some heavy 
lifting--so I'm trying to transfer some experience.

   experience the idea of beta versions is to fix bugs, new features
   and major code rework is avoided if possible.

This is certainly the traditional definition. In practice with ht://Dig 
development, this hasn't worked very well. Typically this happens 
because there simply hasn't been the manpower to tackle several large 
cleanups at the same time. In the 3.1 betas, people also came out of 
the woodwork to contribute their local changes.

We do not currently have anything resembling a traditional software 
development and engineering process. Largely this happens because there 
has never been a significant number of core developers who can 
concentrate significant amounts of time on ht://Dig. (I'm an excellent
case in point.)

At some time in the future, it would probably be good to move to a more 
traditional release scheme. It would also be good to have more 
component-level test suites. In the meantime (i.e. for getting 3.2.0 out 
the door with an appropriate level of stability), I suggest you 
temporarily accept a more flexible definition of beta release. The 
reality starts with the list I mentioned--we absolutely must do some 
code reworks or we'll be layering more duct tape over our problems. In 
particular, IMHO, we'll continue to have weird htsearch bugs until we 
toss the current parser system.

 My past experience in importing a lot of new code like this is that it's
 always harder than it seems and that there are lots of bugs.

I'm curious how much open-source development you've done. Remember that 
merging patches is quite typical for maintainers--Gilles and I do this 
quite often. In the case of ht://Dig, while development resources are at 
a premium, we have often ported and merged patches.

The typical beta process with ht://Dig has been quite flexible towards 
the beginning and as a release like 3.1.0 firms up, fewer patches would 
be accepted. In answer to the question about 3.2.0 firming up, 
remember the maxim about development resources at a premium. For 
example, I'd much rather switch to the new htsearch framework because 
it'll be easier to find bugs.

 a case can be made that not only would the code differ significantly
 with the previous 3.2betas, it also has a load of new features.

Take a look at the release notes for 3.1.0 betas and for previous 3.2.0 
betas. As I said, we've had to take a rather flexible interpretation of 
a beta release. We currently don't have development or alpha 
releases. They would be nice, but I also have to be realistic about the 
pace of development and the number of active developers. Spinning a 
release, no matter what it's called, is a fair amount of work.

 Part of it is a morale thing.  Sometimes when a release is floundering and
 taking too long, it's better to draw a line and say we're going to fix
 these bugs and get it out the door.

True. But pretty much every one of the points I mentioned in the 
previous e-mail goes directly to a bug-fix question. (So does the mifluz 
merge, but that's a separate e-mail.)

 substantial that the release needs to be called 4.0 just to give it
 enough credit ;-).

Avi Rappaport has said much the same thing. But:
a) it's really an issue worthy of a vote on htdig-dev.
b) it's not something to worry about until the final release is close to 
finished.

-Geoff






Re: [htdig-dev] Re: 3.2 Stability (was [htdig-members] reasons for objecting to LGPL change)

2002-10-16 Thread Geoff Hutchison


On Tuesday, October 15, 2002, at 01:37  PM, Neal Richter wrote:

   2.  The mifluz devel list is near death, and it doesn't look like anyone
   is actually using mifluz, or furthering development.

Fine, but that simply does not mean that prior releases were not made 
with active users, developers or testing. There has been much more 
significant testing (on my part included) on the mifluz framework than 
the remainder of the ht://Dig codebase.

   Can you say that it has had as much as the average HtDig release?  HtDig
   is MUCH more active than mifluz has ever been.

In terms of testing by the developers, component-level testing suites 
and testing before releases--the answer is pretty much yes. Granted, the 
mifluz releases between 0.14 (currently in 3.2.0b4) and 0.23 have not 
necessarily received the same pounding as thousands of ht://Dig users. 
But the users who were active with mifluz poured gigabytes of data 
through it too.

Remember also that we *are* mifluz. Take a look at the copyright 
designations.

   4.  How certain are we that these changes are going to make 3.2beta5
   MORE stable than the current beta?

I'm certain. I put a lot of testing into the mifluz code and it's 
definitely more stable now than it was.

   5.  The current mifluz code merge has problems with constructors and
   destructors in a library (libhtdig) setting.  I would rather help

No offense, but your argument applies here. Why should libhtdig be a
feature criterion for 3.2.0b4?

   6.  It has performance problems.

These seem like they're locking issues--it seems like the database is 
being locked and unlocked way too much. When we're indexing, it seems 
like the database should be locked in place as much as possible and then 
unlocked at the end.
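
The difference between locking around every write and holding one lock for the whole indexing run can be sketched with a toy example. This is only an illustration under stated assumptions: `CountingLock`, `index_lock_per_word` and `index_lock_once` are hypothetical names, a plain dict stands in for the word database, and none of this is the Berkeley DB or mifluz locking API.

```python
import threading


class CountingLock:
    """Wraps threading.Lock and counts how many times it is acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def index_lock_per_word(words, db, lock):
    # Lock and unlock around every single update.
    for word in words:
        with lock:
            db[word] = db.get(word, 0) + 1


def index_lock_once(words, db, lock):
    # Take the lock once, do all updates, release at the end.
    with lock:
        for word in words:
            db[word] = db.get(word, 0) + 1


words = ["dig", "search", "dig", "index"]
per_word, once = CountingLock(), CountingLock()
db_a, db_b = {}, {}
index_lock_per_word(words, db_a, per_word)
index_lock_once(words, db_b, once)
print(per_word.acquisitions, once.acquisitions)
```

Both runs leave the same contents in the dict; only the number of lock round trips differs, which is the cost being suspected above.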

 My experience with the current snapshots is very positive.  I've had few
 problems and the indexing itself is pretty solid, especially with the new
 zlib WordDB compression.

Sorry to sound dubious, but speaking of large code merges, you haven't 
submitted patches for me to merge into 3.2.0b4 either. As of yet, I 
haven't tested your zlib WordDB compression or seen if it has 
performance problems relative to 3.2.0b4. Can I claim that your code has 
seen as much user-level testing as 3.2.0b4 snapshots?



I'm somewhat trying to play devil's advocate here. My gut feeling is 
that the mifluz merge should be aimed towards a 3.2.0b5 release and we 
*should* get 3.2.0b4 out the door as stable as possible in the 
near-term. But I'm pretty sure that merging in the new mifluz code is an 
overall win.

-Geoff






Re: [htdig-dev] Re: 3.2 Stability (was [htdig-members] reasons for objecting to LGPL change)

2002-10-16 Thread Neal Richter



On Wed, 16 Oct 2002, Geoff Hutchison wrote:


 On Tuesday, October 15, 2002, at 01:37  PM, Neal Richter wrote:

2.  The mifluz devel list is near death, and it doesn't look like anyone
is actually using mifluz, or furthering development.

 Fine, but that simply does not mean that prior releases were not made
 with active users, developers or testing. There has been much more
 significant testing (on my part included) on the mifluz framework than
 the remainder of the ht://Dig codebase.

  I agree in theory.  In practice until the new code has been verified to
be acceptable after a successful merge it is suspect.

  We hope that it will fix all our problems.. it will be a while before we
confirm this.


5.  The current mifluz code merge has problems with constructors and
destructors in a library (libhtdig) setting.  I would rather help

 No offense, but your argument applies here. Why should libhtdig be a
 feature criterion for 3.2.0b4?

  I agree, it's not a criterion.  I will maintain a separate branch for
that.

  My experience with the current snapshots is very positive.  I've had few
  problems and the indexing itself is pretty solid, especially with the new
  zlib WordDB compression.

 Sorry to sound dubious, but speaking of large code merges, you haven't
 submitted patches for me to merge into 3.2.0b4 either. As of yet, I
 haven't tested your zlib WordDB compression or seen if it has
 performance problems relative to 3.2.0b4. Can I claim that your code has
 seen as much user-level testing as 3.2.0b4 snapshots?

  Heh. ;-)  I'll get you those ASAP.

  Zlib is extremely well tested and the changes are a few lines of
code.  Giving this as a workaround to people who encounter the WordDB
compression bug is a good alternative to hoping that it's fixed in a
merged-mifluz codebase.

 I'm somewhat trying to play devil's advocate here. My gut feeling is
 that the mifluz merge should be aimed towards a 3.2.0b5 release and we
 *should* get 3.2.0b4 out the door as stable as possible in the
 near-term. But I'm pretty sure that merging in the new mifluz code is an
 overall win.


  I agree in theory.  In practice I am motivated to suggest we scale
back what is absolutely necessary in order to get users a new release
faster.

  Gilles in particular has voiced frustration over the delay in the 3.2
release, and over the waste of his time maintaining 3.1.x.  I'd hate to
continue adding to the pile and further frustrate him.

  If we were a company and were risking the speedy completion of a
release by wanting to incorporate a huge chunk of third party code that
needs more work... we'd be in real danger of getting fired.

  I guess I see these things:

  1.  The 3.2 dev process is too open-ended at present
  2.  The 3.1.x users need a new release
  3.  The current 3.2beta4 code offers a significant release to users
  4.  We are in danger of being waist deep in feature-creep quicksand.

  If we delay the integration of mifluz and the larger items on your list,
we'll have a tractable set of things to do to get a decent release out
there.

  Basically I'm suggesting that for morale purposes alone we do this and
set a goal of pushing a 3.2 release out the door by December.

  Next, we make a list and divide it between smaller changes and larger
ones.  Smaller ones go into 3.3 (release in March?) and the rest into 4.0.
The development could be semi-parallel at this point.

  You may disagree with the numbers game here, but I think it would be
good for morale to establish a set of well-reasoned conservative milestones
and meet them in the short term.

  If we implement a strategy like this and six months later we look back
and see that we've had 1-2 releases and are moving forward with integration of
large new features/code, we'll feel much better than if we were still in
feature-creep quicksand.

  Here's a proposal

  http://ai.rightnow.com/htdig/proposed_schedule.html

  Basically I included only things in the 3.2 schedule that are necessary to
fix or work around known bugs.  Things like Quim's new search framework
and the excellent XML-config file feature are in 3.3.

  More open-ended things like the mifluz merge, STL and Unicode are in 4.0
& 4.1.

  Also the Zlib-WordDB in 3.2 and the more efficient WordDB inverted index are
straightforward and buy us time with the mifluz merge.

  Anyway.. I'm sure you won't agree with my thoughts on the
mifluz merge, and this is certainly a conservative viewpoint on it.  If we
make good progress on the mifluz-merge by the end of the year I'll
withdraw any further objections.

 Eh?

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485






RE: [htdig-dev] Status of defaults.xml

2002-10-16 Thread Brian White

At 23:25 16/10/2002, Gabriele Bartolini wrote:
 Well, it is close to ready  - I now have it successfully generating

 Well, first and foremost, this is the first time I have expressed an opinion on
 this solution, and I think it is really efficient and intelligent. Good on
 ya, mate Brian! :-)

 Having said this, and acknowledging that I don't know how the
 XML file is structured, I want to raise the problem of translating
 the attributes' descriptions, uses, etc., into different languages.

 Any ideas?

Yes.

Let's start with the DTD as it stands:

<!ELEMENT HtdigAttributes ( attribute+ ) >

<!--  attribute:

   name     : Variable Name
   type     : Type of Variable
   programs : Whitespace separated list of programs/modules
              using this attribute
   block    : Configuration block this can be used in ( optional )
   version  : Version that introduced the attribute
   category : Attribute category (to split documentation)

   -->

<!ELEMENT attribute ( default, ( nodocs | ( example+, description ) ) ) >
<!ATTLIST attribute name     CDATA #REQUIRED
                    type     (string|integer|boolean) "string"
                    programs CDATA #REQUIRED
                    block    CDATA #IMPLIED
                    version  CDATA #REQUIRED
                    category CDATA #REQUIRED
                    >

<!-- Default value of attribute - configmacro=true would indicate the
  value is actually a macro ( eg BIN_DIR )
   -->
<!ELEMENT default (#PCDATA) >
<!ATTLIST default configmacro (true|false) "false" >

<!-- Basically a flag that suppresses documentation -->
<!ELEMENT nodocs EMPTY>

<!-- An example value that goes into the documentation -->
<!ELEMENT example (#PCDATA) >

<!ENTITY % paratext "#PCDATA|em|strong|a|ref" >
<!ENTITY % text "%paratext;|table|p|br|ol|ul|dl|codeblock" >

<!ELEMENT description (%paratext;)* >

... + all the elements for formatting the description
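
As a rough illustration of how a file following this DTD could drive code generation, here is a minimal sketch that reads attribute elements and prints one C initializer line per attribute. Everything about the output is an assumption: the two-field `{ name, default }` layout is hypothetical and is not the real defaults.cc format, the sample attribute values are made up, and `c_string` and `default_initializer` are illustrative helper names. Note that ElementTree does not read the DTD, so the configmacro default of "false" is applied by hand.

```python
import xml.etree.ElementTree as ET

# Made-up sample data shaped like the DTD above.
SAMPLE = """
<HtdigAttributes>
  <attribute name="no_title_text" type="string" programs="htsearch"
             version="3.1.0" category="Presentation:Text">
    <default>filename</default>
    <example>No Title Found</example>
    <description>Text to use when a document has no title.</description>
  </attribute>
  <attribute name="bin_dir" type="string" programs="all"
             version="all" category="File Layout">
    <default configmacro="true">BIN_DIR</default>
    <nodocs/>
  </attribute>
</HtdigAttributes>
"""


def c_string(text):
    """Quote text as a C string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def default_initializer(attr):
    """Return one hypothetical C initializer line for an <attribute>."""
    default = attr.find("default")
    value = default.text or ""
    # configmacro="true" means the value names a macro, so emit it unquoted.
    if default.get("configmacro", "false") == "true":
        rendered = value
    else:
        rendered = c_string(value)
    return "{ %s, %s }," % (c_string(attr.get("name")), rendered)


root = ET.fromstring(SAMPLE)
lines = [default_initializer(a) for a in root.findall("attribute")]
print("\n".join(lines))
```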

The first thing to do is then look at the items that might need
translation:

 * description
 * block
 * category
 * example

Analysis:
* description is the one that will always need it
* I think the values for block and category should be considered
  as 'keys' rather than the actual values - they should be translated
  by lookup table.
* examples will *sometimes* require translation
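
The lookup-table idea for block and category could look something like the sketch below. The table contents, the French label and the function name `category_label` are placeholders invented for illustration; the only point is that the XML keeps the key and each language supplies its own label, falling back to the key when no translation exists.

```python
# Hypothetical per-language tables, keyed by the category value stored in the XML.
CATEGORY_LABELS = {
    "fr": {"Presentation:Text": "Présentation : Texte"},
}


def category_label(key, lang):
    """Translate a category key; fall back to the key itself if untranslated."""
    return CATEGORY_LABELS.get(lang, {}).get(key, key)


print(category_label("Presentation:Text", "fr"))
print(category_label("Presentation:Text", "de"))
```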

To this end, I would suggest changing the following

<!ELEMENT attribute ( default, ( nodocs | ( example+, description ) ) ) >

to

<!ELEMENT attribute ( default, ( nodocs | ( example*, docset+ ) ) ) >

<!-- lang would be the id of the language using a standard identifier, or
 set to "default" for the default language -->

<!ELEMENT docset  ( example*, description ) >
<!ATTLIST docset  lang CDATA #REQUIRED >

As an example:

<attribute name="no_title_text"
   type="string"
   programs="htsearch"
   version="3.1.0"
   category="Presentation:Text" >
  <default>filename</default>
  <example>!?</example>
  <docset lang="default" >
     <example>No Title Found</example>
     <description>This specifies the text to use in search results when no
     title is found in the document itself. If it is set to
     filename, htsearch will use the name of the file itself,
     enclosed in square brackets (e.g. [index.html]).
     </description>
  </docset>
  <docset lang="fr" >
     <example>Aucun titre retrouvé</example>
     <description>Ceci spécifie le texte à utiliser dans les résultats
      d'une recherche lorsque aucun titre se trouve dedans le document.
      Si on le règle à filename, htsearch se servira du nom du fichier
      lui-même, inclus entre crochets (p.ex. [index.html]).
     </description>
  </docset>

</attribute>

( And no - I don't speak French. I got a friend to do the translation
   for me )
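
A documentation generator consuming the docset layout would need to pick the right docset for a language and fall back to the one marked default. The sketch below shows one way that could work; `pick_docset` is a hypothetical helper, and the descriptions in the sample are shortened placeholders rather than the real text.

```python
import xml.etree.ElementTree as ET

# Sample shaped like the docset proposal; descriptions are shortened placeholders.
SAMPLE = """
<attribute name="no_title_text" type="string" programs="htsearch"
           version="3.1.0" category="Presentation:Text">
  <default>filename</default>
  <docset lang="default">
    <example>No Title Found</example>
    <description>Text to use when no title is found.</description>
  </docset>
  <docset lang="fr">
    <example>Aucun titre retrouvé</example>
    <description>(French description goes here)</description>
  </docset>
</attribute>
"""


def pick_docset(attribute, lang):
    """Return the docset for lang, or the lang="default" one if there is none."""
    by_lang = {d.get("lang"): d for d in attribute.findall("docset")}
    if lang in by_lang:
        return by_lang[lang]
    return by_lang.get("default")


attr = ET.fromstring(SAMPLE)
print(pick_docset(attr, "fr").findtext("example"))
print(pick_docset(attr, "de").findtext("example"))
```

Requesting a language with no docset ("de" here) quietly returns the default-language text, so untranslated attributes would still get documented.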

This would then put the capability into the XML file. I would need to
figure out how to do characters like é - possibly as &eacute;. As for

   * how this might then be used to generate documentation
   * how the translated versions will be maintained

those are different issues altogether!
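
On the question of characters like é: &eacute; is not one of the five entities XML predefines (lt, gt, amp, apos, quot), so it only parses if the DTD declares it, whereas a numeric character reference such as &#233; needs no declaration. The short check below illustrates this with Python's standard library; it is a sketch of the behaviour, not a statement about how the defaults.xml tooling handles it.

```python
import xml.etree.ElementTree as ET

text = "Aucun titre retrouvé"

# Replace every non-ASCII character with a numeric character reference.
ascii_xml = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_xml)

# The numeric reference round-trips through a parser with no DTD involved.
parsed = ET.fromstring("<example>%s</example>" % ascii_xml).text

# The named entity is rejected unless something declares it.
try:
    ET.fromstring("<example>retrouv&eacute;</example>")
    named_ok = True
except ET.ParseError:
    named_ok = False
print(parsed == text, named_ok)
```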

Note that it isn't a big change, but I think we should leave
it for version 2 of defaults.xml.




 Ciao and thanks,
 -Gabriele

-
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web:   http://www.steptwo.com.au/
Email: [EMAIL PROTECTED]

Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste





Re: [htdig-dev] Status of defaults.xml

2002-10-16 Thread Geoff Hutchison


On Wednesday, October 16, 2002, at 07:41  PM, Brian White wrote:

 I can use that tool to take a merged version of defaults.cc to produce
 a version of defaults.xml. The problem is that a few of the descriptions
 will need to be reworked quite heavily by hand to produce valid XML.

OK, that makes sense of course. I had forgotten that you wrote a tool to 
generate the defaults.xml file. I would guess with some care, we can 
separate the new entries and only rework them if needed.
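
One way to separate the entries that need hand rework from those that do not is to test whether each converted description is already well-formed XML. The sketch below does that with the standard library; `needs_rework` and the sample descriptions are hypothetical, and well-formedness is a weaker check than validity against the DTD.

```python
import xml.etree.ElementTree as ET


def needs_rework(description):
    """True if the fragment is not well-formed once wrapped in <description>."""
    try:
        ET.fromstring("<description>%s</description>" % description)
    except ET.ParseError:
        return True
    return False


# Made-up descriptions standing in for text pulled out of defaults.cc.
descriptions = {
    "no_title_text": "Text to use when <em>no title</em> is found.",
    "hypothetical_attr": "Old HTML with an unclosed <br> and a bare & sign.",
}
to_rework = sorted(name for name, d in descriptions.items() if needs_rework(d))
print(to_rework)
```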

-Geoff


