[ 
https://issues.apache.org/jira/browse/TIKA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725974#comment-17725974
 ] 

Tim Allison commented on TIKA-4004:
-----------------------------------

The files from Common Crawl are weird.  When I refetch the files, I get either 
HTML pages or files that start with HTML and then have some binary font info 
(e.g. https://schrift.mono.lt/font/download/id/217 or 
http://www.awebfont.ir/fonts?cat_id=13&file_id=1821&file_type=otf).  We can try 
to refetch the bytes to see if CC actually pulled a font file along the lines 
of what [[email protected]] supplied.
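A rough way to triage refetched payloads is to look at their leading bytes. The sketch below is only an illustration and is not Tika's detection logic: the function name `sniff_leading_bytes` and its return labels are made up here, and it only checks the standard sfnt version tags (`OTTO`, `0x00010000`, `true`, `ttcf`) plus a crude HTML prefix test. It does not handle a byte-order mark or EOT files.

```python
def sniff_leading_bytes(data: bytes) -> str:
    """Roughly classify a fetched payload by its first bytes.

    Hypothetical helper for illustration; the labels returned are
    arbitrary strings, not MIME types.
    """
    head = data[:512]
    tag = head[:4]

    # sfnt version tags that open a real font file
    if tag == b"OTTO":
        return "opentype-cff"
    if tag in (b"\x00\x01\x00\x00", b"true"):
        return "truetype"
    if tag == b"ttcf":
        return "font-collection"

    # Crude HTML check: skip leading ASCII whitespace, compare case-insensitively
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype html", b"<html")):
        # A NUL byte anywhere suggests binary data was appended to the markup
        if b"\x00" in data:
            return "html-then-binary"
        return "html"

    return "unknown"
```

Running this over the refetched bytes would separate plain HTML responses from the ones that start as HTML and carry binary data afterwards, which are the two cases described above.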

> font/otf application/vnd.ms-opentype
> ------------------------------------
>
>                 Key: TIKA-4004
>                 URL: https://issues.apache.org/jira/browse/TIKA-4004
>             Project: Tika
>          Issue Type: Sub-task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: aller-bold.eot, aller-light.eot, fleurons.eot
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
