For what it's worth, I confirmed that Heritrix (Internet Archive's crawling tool) produces WARC files without the angle brackets for WARC-Target-URI.
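
Since both forms appear in the wild, a reader could accept the URI with or without the brackets. The following is only a minimal sketch in Python, assuming the field value has already been separated from the "WARC-Target-URI:" name; `normalize_target_uri` is a hypothetical helper for illustration, not a function from Wget, Heritrix, or any WARC library.

```python
def normalize_target_uri(value: str) -> str:
    """Return the URI from a WARC-Target-URI field value.

    Hypothetical helper: accepts both the bracketed form from the BNF
    grammar ("<http://example.com/>") and the bare form used in the
    standard's examples ("http://example.com/").
    """
    value = value.strip()
    # Strip the brackets only when both are present, so a stray "<" or ">"
    # is left untouched rather than silently altered.
    if len(value) >= 2 and value.startswith("<") and value.endswith(">"):
        return value[1:-1]
    return value


if __name__ == "__main__":
    print(normalize_target_uri("<http://example.com/>"))
    print(normalize_target_uri("http://example.com/"))
```

Both calls return the same bare URI, so downstream code would not need to care which convention the writing tool followed.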

Best regards,
William Prescott

On Tue, Nov 14, 2017 at 11:45 PM, William Prescott <[email protected]> wrote:
> Hello,
>
> It seems that there may be some ambiguity in the WARC standard
> regarding the usage of angle brackets surrounding the URI given for a
> WARC-Target-URI field.
> In short, while the BNF grammar includes the brackets, the examples
> presented in the standard do not. It would appear that tools have been
> built to assume the lack of brackets, and may have issues when they
> are present (this is how I learned about this.)
>
> There is some discussion about this here:
> https://github.com/iipc/warc-specifications/issues/23
> https://github.com/iipc/warc-specifications/pull/24
>
> One commenter states that the brackets have been removed in a newer
> draft. I see that a new standard (ISO 28500:2017) was published in
> August, but I don't have access to confirm if it says anything about
> this.
>
> A Wget bug report ( http://savannah.gnu.org/bugs/?47281 ) had been
> submitted which resulted in the addition of the brackets in commit
> 100da11312a1781a3d5aa38760ce0e8bd9384659. An additional commit
> (63c2aea2557b84640272629c7dc0caccab66ab6d) expanded the usage of
> brackets to more block types which contained WARC-Target-URI -- this
> was mentioned in
> http://lists.gnu.org/archive/html/bug-wget/2017-03/msg00006.html . The
> specification PDF referenced in the report and mailing list post
> contains the error (see "uri" in the grammar on page 5, and
> "WARC-Target-URI" in the example on page 22 [C.2])
>
>
> Given the unclear nature of this aspect of the standard, I don't know
> exactly what action to suggest, but I did want to point it out.
>
> Best regards,
> William Prescott
