Hello,

I investigated the idea of including some W3C DTDs in nutch SVN to allow
building nutch without an Internet connection.
Before preparing a formal patch I want to share my findings to make sure
the whole idea makes sense.

There are 4 files to include:

1. http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

(Copyright (c) 1998-2002 W3C (MIT, INRIA, Keio), All Rights Reserved.)
In my opinion it is OK to include this file in SVN, as it is under the
MIT/W3C license.

2. http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
3. http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
4. http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

All 3 of these files carry the following license notice:
     Portions (C) International Organization for Standardization 1986:
     Permission to copy in any form is granted for use with
     conforming SGML systems and applications as defined in
     ISO 8879, provided this notice is included in all copies.

As these files are used by an MIT/W3C copyrighted document, I assume it
is safe to include them in nutch SVN - am I correct?


So if all legal issues are resolved, I would like to propose adding an
xmlcatalog directory in src, containing all 4 files. Then the following
modifications would be required in build.xml:

1) Definition of xmlcatalog:

 <xmlcatalog id="docDTDs">
   <dtd publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
        location="${basedir}/xmlcatalog/xhtml1-transitional.dtd"/>
 </xmlcatalog>

2) Modification of xslt task to use xmlcatalog:
 <xslt in="${docs.src}/include/${doc.locale}/header.xml"
       out="${build.docs}/${doc.locale}/include/header.html"
       style="${docs.src}/style/nutch-header.xsl">
   <xmlcatalog refid="docDTDs"/>
 </xslt>
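
The three .ent files are referenced from xhtml1-transitional.dtd by
relative system identifiers, so they will probably be found next to the
local DTD without any extra entries. If that turns out not to be the
case, one possible variant (a sketch only, not tested against the nutch
build) would be to list them explicitly with entity elements. The public
identifiers below are the ones declared in the XHTML 1.0 DTD; the
location paths simply assume the same directory as above.

```xml
<!-- Sketch: catalog covering the DTD and its three entity sets.
     The locations are assumptions that follow the ${basedir}/xmlcatalog
     layout used in the xmlcatalog definition above. -->
<xmlcatalog id="docDTDs">
  <dtd publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
       location="${basedir}/xmlcatalog/xhtml1-transitional.dtd"/>
  <entity publicId="-//W3C//ENTITIES Latin 1 for XHTML//EN"
          location="${basedir}/xmlcatalog/xhtml-lat1.ent"/>
  <entity publicId="-//W3C//ENTITIES Symbols for XHTML//EN"
          location="${basedir}/xmlcatalog/xhtml-symbol.ent"/>
  <entity publicId="-//W3C//ENTITIES Special for XHTML//EN"
          location="${basedir}/xmlcatalog/xhtml-special.ent"/>
</xmlcatalog>
```

The xslt task would reference this catalog by the same refid, so no
further change to the task itself should be needed.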

If no objections are raised, I will prepare a patch next week.
Regards,
Piotr

Andrzej Bialecki wrote:
Piotr Kosiorowski wrote:

Hello,
During war file generation the xslt task needs access to DTDs from the
Internet. So you have to have a direct Internet connection (no
proxies) to perform a build - I have modified the ant task to use local
versions of the DTDs but do not know if it can/should be integrated with
the standard nutch build. It will require adding 4 files (DTD and related)
to SVN.
If some committers agree with this change I can provide a patch
for it.
Regards
Piotr


Assuming these DTDs and schemas are covered by ASL-compatible licenses,
I think this might be a good idea.



