[
https://issues.apache.org/jira/browse/TIKA-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409192#comment-16409192
]
Andreas Meier commented on TIKA-2609:
-------------------------------------
Test files for Emacs 18 and earlier can be found at
https://github.com/larsbrinkhoff/emacs-16.56
(the .elc files there are from Emacs 16, but the structure of Emacs 16 and
Emacs 18 files should be the same)
> Refine Emacs Lisp file recognition (.elc)
> -----------------------------------------
>
> Key: TIKA-2609
> URL: https://issues.apache.org/jira/browse/TIKA-2609
> Project: Tika
> Issue Type: Improvement
> Components: core
> Reporter: Andreas Meier
> Priority: Minor
>
> Some newer .elc files are not recognized properly by the current matcher.
> (Tested with Emacs 24.4 files from
> [https://github.com/jwiegley/emacs-release/tree/master/lisp].)
> I created a regex that should handle these files similarly to the Linux magic entries:
> {code:java}
> # Emacs 18 - this is always correct, but not very magical.
> 0 string \012( Emacs v18 byte-compiled Lisp data
> !:mime application/x-elc
> # Emacs 19+ - ver. recognition added by Ian Springer
> # Also applies to XEmacs 19+ .elc files; could tell them apart with regexs
> # - Chris Chittleborough
> 0 string ;ELC
> >4 byte >18
> >4 byte <32 Emacs/XEmacs v%d byte-compiled Lisp data
> !:mime application/x-elc{code}
> {code:xml}
> <mime-type type="application/x-elc">
> <_comment>Emacs Lisp bytecode</_comment>
> <magic priority="50">
> <!-- Emacs 18 -->
> <match value="\012(" type="string" offset="0" />
> <!-- Emacs 19+ -->
> <match value=";ELC" type="string" offset="0" >
> <match value="[\\x13-\\x1F]" type="regex" offset="4"/>
> </match>
> </magic>
> <glob pattern="*.elc"/>
> </mime-type>
> {code}
> Please verify the hex values before committing.
>
> Regards
>
> Andreas
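
To make the hex values easier to check, here is a minimal Java sketch of the byte test that the magic entries and the proposed XML describe. It is only an illustration and not Tika's detector code: the class name ElcMagicSketch and the method looksLikeElc are hypothetical. It assumes that "byte >18" and "byte <32" at offset 4 correspond to the range 0x13 to 0x1F, and that the octal escape \012 is the byte 0x0A.

```java
// Hypothetical helper, not part of Tika: mirrors the two magic patterns above.
final class ElcMagicSketch {

    private ElcMagicSketch() {
    }

    /** Returns true if the leading bytes match either pattern from the issue. */
    static boolean looksLikeElc(byte[] header) {
        // Emacs 18 and earlier: newline (octal 012 = 0x0A) followed by '('.
        if (header.length >= 2 && header[0] == 0x0A && header[1] == '(') {
            return true;
        }
        // Emacs/XEmacs 19+: ";ELC" at offset 0, then a version byte at offset 4.
        if (header.length >= 5
                && header[0] == ';' && header[1] == 'E'
                && header[2] == 'L' && header[3] == 'C') {
            int version = header[4] & 0xFF; // read the byte as unsigned
            // "byte >18" and "byte <32" means 19..31, i.e. 0x13..0x1F.
            return version > 18 && version < 32;
        }
        return false;
    }
}
```

With this sketch, a header of ";ELC" followed by 0x13 or 0x1F matches, while 0x12 and 0x20 do not, which agrees with the character class [\x13-\x1F] in the proposed regex.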