[
https://issues.apache.org/jira/browse/TIKA-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725896#comment-17725896
]
Gregory Lepore commented on TIKA-4005:
--------------------------------------
Mostly correct. It looks like the signature wants 00 08 at offset 0 (B:0 seq
"\x00\b") plus some other bytes before and after the text strings, but just
looking for RSFTSTYL or ENDNENFT at offset 8 is probably sufficient.
The only issue I've encountered with string/offset identifications is that a
text file with either of those strings at offset 8 will also match. Requiring
the 00 08 at offset 0 as well will possibly fix that.
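As a rough illustration of the check described above (not Tika's actual
detector), here is a minimal Python sketch. The function name
matches_offset8_magic and the constant names are hypothetical; the only facts
taken from this comment are the 00 08 bytes at offset 0 and the two strings at
offset 8.

```python
# Hypothetical names; only the byte values come from the comment above.
LEADING_BYTES = b"\x00\x08"                 # expected at offset 0
MAGIC_STRINGS = (b"RSFTSTYL", b"ENDNENFT")  # expected at offset 8


def matches_offset8_magic(data: bytes, require_leading: bool = True) -> bool:
    """Return True if one of the magic strings sits at offset 8.

    With require_leading=True the 00 08 prefix at offset 0 is also required,
    which keeps a plain text file that merely contains the string at offset 8
    from matching.
    """
    if data[8:16] not in MAGIC_STRINGS:
        return False
    if require_leading and data[:2] != LEADING_BYTES:
        return False
    return True
```

With require_leading=False, a text file starting with "abcdefghRSFTSTYL" would
match; with the default it would not.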
The PRONOM entry at:
https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=1069&strPageToDisplay=signatures
has the full wildcard signature:
0008(00|FF)(FF|00)0000(00|10)(10|00)525346545354594C(00100100|10000001)
translates to:
0008, then either 00 or FF, then either FF or 00, then 0000, then either 00 or
10, then either 10 or 00, then 525346545354594C (ASCII "RSFTSTYL"), then either
00100100 or 10000001.
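To make the wildcard signature concrete, here is a sketch of it as a Python
bytes regex. This is only an illustration, not how Tika or PRONOM evaluate
signatures. The names PRONOM_SIGNATURE and matches_pronom_signature are
hypothetical, and anchoring at the start of the file is an assumption based on
the "00 08 at offset 0" reading above.

```python
import re

# Each line mirrors one element of the wildcard signature, in order.
PRONOM_SIGNATURE = re.compile(
    rb"\x00\x08"                               # 0008
    rb"(?:\x00|\xFF)"                          # (00|FF)
    rb"(?:\xFF|\x00)"                          # (FF|00)
    rb"\x00\x00"                               # 0000
    rb"(?:\x00|\x10)"                          # (00|10)
    rb"(?:\x10|\x00)"                          # (10|00)
    rb"RSFTSTYL"                               # 525346545354594C
    rb"(?:\x00\x10\x01\x00|\x10\x00\x00\x01)"  # (00100100|10000001)
)


def matches_pronom_signature(data: bytes) -> bool:
    """Return True if data starts with the signature (assumed offset 0)."""
    return PRONOM_SIGNATURE.match(data) is not None
```

The eight bytes before RSFTSTYL are why the string lands at offset 8, so a text
file that only has RSFTSTYL at offset 8 fails on the leading bytes.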
> application/x-endnote-style
> ---------------------------
>
> Key: TIKA-4005
> URL: https://issues.apache.org/jira/browse/TIKA-4005
> Project: Tika
> Issue Type: Sub-task
> Reporter: Tim Allison
> Priority: Major
>