[ https://issues.apache.org/jira/browse/TIKA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725974#comment-17725974 ]
Tim Allison commented on TIKA-4004:
-----------------------------------
The files from Common Crawl are odd. When I refetch them, I get either plain
HTML pages or files that start with HTML and then contain some binary font data
(e.g. https://schrift.mono.lt/font/download/id/217 or
http://www.awebfont.ir/fonts?cat_id=13&file_id=1821&file_type=otf). We can try
to refetch the bytes to see if CC actually pulled a font file along the lines
of what [[email protected]] supplied.
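
A rough way to triage refetched bytes is to look at the leading signature. The
sketch below is only an illustration: the helper name classify and its return
labels are hypothetical, the magic values are the commonly documented
sfnt/WOFF tags rather than Tika's actual detection rules, and the search for
"OTTO" after an HTML prefix is a heuristic that can give false positives.

```python
# Hypothetical triage helper; not Tika's detection logic.
# Leading 4-byte tags commonly documented for font containers.
FONT_MAGICS = {
    b"OTTO",              # OpenType with CFF outlines
    b"\x00\x01\x00\x00",  # TrueType / OpenType with glyf outlines
    b"true",              # Apple TrueType
    b"ttcf",              # font collection
    b"wOFF",              # WOFF
    b"wOF2",              # WOFF2
}

HTML_PREFIXES = (b"<!doctype html", b"<html")


def classify(data: bytes) -> str:
    """Return 'font', 'html', 'html+font', or 'unknown' for a byte string."""
    if data[:4] in FONT_MAGICS:
        return "font"
    # Ignore leading whitespace and case when checking for an HTML prefix.
    head = data.lstrip()[:1024].lower()
    if head.startswith(HTML_PREFIXES):
        # Heuristic: an 'OTTO' tag later in the body suggests font bytes
        # were appended after the markup.
        return "html+font" if b"OTTO" in data else "html"
    return "unknown"
```

Running classify over the raw bytes of each refetched file (or each record
payload pulled from the crawl) would separate real fonts from the HTML and
mixed cases described above.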
> font/otf application/vnd.ms-opentype
> ------------------------------------
>
> Key: TIKA-4004
> URL: https://issues.apache.org/jira/browse/TIKA-4004
> Project: Tika
> Issue Type: Sub-task
> Reporter: Tim Allison
> Priority: Major
> Attachments: aller-bold.eot, aller-light.eot, fleurons.eot
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)