[ https://issues.apache.org/jira/browse/TIKA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725977#comment-17725977 ]
Tim Allison commented on TIKA-4004:
-----------------------------------
[^000000.warc] is the result of {{curl -r 52967301-53010202
https://data.commoncrawl.org/crawl-data/CC-MAIN-2023-06/segments/1674764499831.97/warc/CC-MAIN-20230130232547-20230131022547-00296.warc.gz
-o 000000.warc.gz}}
So, yes, CC (Common Crawl) fetched different bytes than I now get when
refetching from the source sites.
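
For anyone who wants to repeat the range fetch without curl, here is a minimal Python sketch of the same request. The function names are hypothetical, and it assumes the byte range covers exactly one gzip member (a single WARC record), so the slice can be decompressed on its own. The decompression step is also an assumption about how the {{.warc.gz}} download became the attached {{.warc}}.

```python
import gzip
import urllib.request


def range_header(start: int, end: int) -> str:
    """Build an HTTP Range header value; both offsets are inclusive, as with curl -r."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return f"bytes={start}-{end}"


def range_length(start: int, end: int) -> int:
    """Number of bytes an inclusive range covers."""
    return end - start + 1


def fetch_record(url: str, start: int, end: int, timeout: float = 60.0) -> bytes:
    """Fetch one byte range and gunzip it (hypothetical helper, not part of Tika)."""
    request = urllib.request.Request(url, headers={"Range": range_header(start, end)})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # 206 Partial Content means the server honored the Range header.
        if response.status != 206:
            raise RuntimeError(f"expected HTTP 206, got {response.status}")
        compressed = response.read()
    # Assumption: the slice is a complete gzip member, so it decompresses standalone.
    return gzip.decompress(compressed)


if __name__ == "__main__":
    url = (
        "https://data.commoncrawl.org/crawl-data/CC-MAIN-2023-06/segments/"
        "1674764499831.97/warc/CC-MAIN-20230130232547-20230131022547-00296.warc.gz"
    )
    record = fetch_record(url, 52967301, 53010202)
    with open("000000.warc", "wb") as out:
        out.write(record)
```

Checking for a 206 response guards against a server that ignores the Range header and returns the whole file.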
> font/otf application/vnd.ms-opentype
> ------------------------------------
>
> Key: TIKA-4004
> URL: https://issues.apache.org/jira/browse/TIKA-4004
> Project: Tika
> Issue Type: Sub-task
> Reporter: Tim Allison
> Priority: Major
> Attachments: 000000.warc, aller-bold.eot, aller-light.eot,
> fleurons.eot, index.html_id=45_and_type=eot, index.html_id=67_and_type=eot,
> index.html_id=75_and_type=eot, index.html_id=77_and_type=eot,
> index.html_id=80_and_type=eot, index.html_id=83_and_type=eot,
> index.html_id=84_and_type=eot
>
>