Hello,
I investigated the idea of including some W3C DTDs in Nutch SVN to allow building Nutch without an Internet connection. Before preparing a formal patch, I want to share my findings to make sure the whole idea makes sense.
There are 4 files to include:
1. http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
(Copyright (c) 1998-2002 W3C (MIT, INRIA, Keio), All Rights Reserved.) In my opinion it is OK to include it in SVN, as it is covered by the MIT/W3C license.
2. http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
3. http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
4. http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
Files 2-4 all carry the following license notice:
Portions (C) International Organization for Standardization 1986:
Permission to copy in any form is granted for use with
conforming SGML systems and applications as defined in
ISO 8879, provided this notice is included in all copies.

As these files are used by the MIT/W3C-copyrighted DTD, I assume it is safe to include them in Nutch SVN. Am I correct?
If all legal issues are resolved, I would like to propose adding an xmlcatalog directory in src containing all 4 files. Then the following modifications would be required in build.xml:
1) Definition of xmlcatalog:
<xmlcatalog id="docDTDs">
  <dtd publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
       location="${basedir}/src/xmlcatalog/xhtml1-transitional.dtd"/>
</xmlcatalog>

2) Modification of xslt task to use xmlcatalog:
<xslt in="${docs.src}/include/${doc.locale}/header.xml"
      out="${build.docs}/${doc.locale}/include/header.html"
      style="${docs.src}/style/nutch-header.xsl">
  <xmlcatalog refid="docDTDs" />
</xslt>

If no objections are raised, I will prepare a patch next week.
Regards,
Piotr
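
A possible extension, sketched under assumptions: the catalog above maps only the DTD's public ID, while the DTD itself pulls in the three .ent files through relative system IDs. Whether those resolve locally without extra entries depends on the resolver in use, so one option is to register them explicitly. The public IDs below are the ones declared inside xhtml1-transitional.dtd; the src/xmlcatalog location follows the directory proposed in this message and is an assumption, not an existing path.

```xml
<!-- Sketch only: the docDTDs catalog with the three entity sets added. -->
<!-- Assumes all four files live in ${basedir}/src/xmlcatalog.          -->
<xmlcatalog id="docDTDs">
  <dtd publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
       location="${basedir}/src/xmlcatalog/xhtml1-transitional.dtd"/>
  <!-- Entity sets referenced from within the DTD -->
  <entity publicId="-//W3C//ENTITIES Latin 1 for XHTML//EN"
          location="${basedir}/src/xmlcatalog/xhtml-lat1.ent"/>
  <entity publicId="-//W3C//ENTITIES Symbols for XHTML//EN"
          location="${basedir}/src/xmlcatalog/xhtml-symbol.ent"/>
  <entity publicId="-//W3C//ENTITIES Special for XHTML//EN"
          location="${basedir}/src/xmlcatalog/xhtml-special.ent"/>
</xmlcatalog>
```

The xslt task would reference this catalog in the same way, through a nested xmlcatalog element with refid="docDTDs".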
Andrzej Bialecki wrote:
for it.

Piotr Kosiorowski wrote:
Hello,
During war file generation, the xslt task needs access to DTDs from
the Internet. So you have to have a direct Internet connection (no
proxies) to perform a build. I have modified the ant task to use local
versions of the DTDs, but I do not know if it can/should be integrated
with the standard nutch build. It will require adding 4 files (the DTD
and related) to SVN.
If some committers agree with this change, I can provide a patch.
Regards,
Piotr
Assuming these DTDs and schemas are covered by ASL-compatible licenses, I think this might be a good idea.
