[ 
https://issues.apache.org/jira/browse/TIKA-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725896#comment-17725896
 ] 

Gregory Lepore commented on TIKA-4005:
--------------------------------------

Mostly correct. It looks like the signature wants 00 08 at offset 0 (B:0 seq 
"\x00\b") plus some other bytes before and after the text strings, but just 
looking for RSFTSTYL or ENDNENFT at offset 8 is probably sufficient.

 

The only issue I've encountered with string/offset identifications is that a 
text file with either of those strings at offset 8 will also match. Requiring 
the 00 08 at offset 0 as well would possibly fix that.
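
A minimal Python sketch of the two checks described above, for illustration 
only. The function names are hypothetical and not part of Tika, and it assumes 
that files carrying ENDNENFT also start with 00 08, which the comment suggests 
but does not confirm.

```python
# Magic strings expected at offset 8, as named in the comment above.
MAGIC_STRINGS = (b"RSFTSTYL", b"ENDNENFT")


def matches_string_only(header: bytes) -> bool:
    """Loose check: one of the magic strings sits at offset 8."""
    return header[8:16] in MAGIC_STRINGS


def matches_with_prefix(header: bytes) -> bool:
    """Stricter check: additionally require 00 08 at offset 0 (assumption
    for ENDNENFT files; only confirmed for RSFTSTYL by the signature)."""
    return header[:2] == b"\x00\x08" and matches_string_only(header)


# A plain text file that happens to have the string at offset 8
# passes the loose check but fails the stricter one.
text_file = b"12345678RSFTSTYL and more text"
print(matches_string_only(text_file))  # True
print(matches_with_prefix(text_file))  # False
```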

 

The PRONOM entry at: 
https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=1069&strPageToDisplay=signatures

has the full wildcard signature:

0008(00|FF)(FF|00)0000(00|10)(10|00)525346545354594C(00100100|10000001)

 

translates to:

 

0008, then either 00 or FF, then either FF or 00, then 0000, then either 00 or 
10, then either 10 or 00, then 525346545354594C ("RSFTSTYL" in ASCII), then 
either 00100100 or 10000001.
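
As a rough illustration, the wildcard signature can be transcribed into a byte 
regular expression. This is a Python sketch under the assumption that the 
signature is anchored at offset 0; the names are hypothetical and this is not 
how Tika itself expresses magic. Note that this signature only covers the 
RSFTSTYL string, not ENDNENFT.

```python
import re

# Transcription of the PRONOM wildcard signature quoted above (20 bytes).
ENDNOTE_STYLE_SIGNATURE = re.compile(
    rb"\x00\x08"                # 0008 at offset 0
    rb"[\x00\xff][\xff\x00]"    # (00|FF)(FF|00)
    rb"\x00\x00"                # 0000
    rb"[\x00\x10][\x10\x00]"    # (00|10)(10|00)
    rb"RSFTSTYL"                # 525346545354594C, lands at offset 8
    rb"(?:\x00\x10\x01\x00|\x10\x00\x00\x01)"  # (00100100|10000001)
)


def matches_full_signature(header: bytes) -> bool:
    """Return True if the start of the file matches the full signature."""
    # re.match anchors at the beginning of the input, i.e. offset 0.
    return ENDNOTE_STYLE_SIGNATURE.match(header) is not None


sample = bytes.fromhex("000800FF00000010525346545354594C00100100")
print(matches_full_signature(sample))  # True
```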

> application/x-endnote-style
> ---------------------------
>
>                 Key: TIKA-4005
>                 URL: https://issues.apache.org/jira/browse/TIKA-4005
>             Project: Tika
>          Issue Type: Sub-task
>            Reporter: Tim Allison
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
