[ https://issues.apache.org/jira/browse/TIKA-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gregory Lepore updated TIKA-4083:
---------------------------------
Description:
The ClamAV CDIFF format appears 1,582 times in the latest Common Crawl dataset.
No known MIME type.
The magic is 436C616D41562D44696666 at offset 0 (ClamAV-Diff in ASCII).
File extension is .cdiff.
[https://blog.clamav.net/2021/03/clamav-cvds-cdiffs-and-magic-behind.html]
was:
The ClamAV CDIFF format appears 1,582 times in the latest Common Crawl dataset.
No known mime type.
The magic is 436C616D41562D44696666 at offset 0 (ClamAV-Diff in ASCII).
https://blog.clamav.net/2021/03/clamav-cvds-cdiffs-and-magic-behind.html
> Add magic for ClamAV CDiff files
> --------------------------------
>
> Key: TIKA-4083
> URL: https://issues.apache.org/jira/browse/TIKA-4083
> Project: Tika
> Issue Type: Sub-task
> Reporter: Gregory Lepore
> Priority: Minor
> Attachments:
> 0a0f28d9d03c84aaa97a996719c97663c1de40b7d0d710140fd47676a89cfcaa,
> 0a55b4b748ff9d0f542f2a8fb4ee9462d7ff299063a5ff9c653b74882a510d35,
> 0a8b3069b57c0069d99149c5296fc001d9fa8c8f88e0f865940dccc99c1af5c1
>
>
> The ClamAV CDIFF format appears 1,582 times in the latest Common Crawl
> dataset. No known MIME type.
> The magic is 436C616D41562D44696666 at offset 0 (ClamAV-Diff in ASCII).
> File extension is .cdiff.
>
> [https://blog.clamav.net/2021/03/clamav-cvds-cdiffs-and-magic-behind.html]
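
For illustration only, the magic described above can be checked with a few lines of Python. This is a minimal sketch, not Tika's detector: the names CDIFF_MAGIC and looks_like_cdiff are hypothetical, and the only facts taken from the issue are the hex string and the offset of 0.

```python
# Magic bytes as given in the issue: hex 436C616D41562D44696666 at offset 0.
CDIFF_MAGIC = bytes.fromhex("436C616D41562D44696666")


def looks_like_cdiff(data: bytes) -> bool:
    """Return True if the buffer starts with the ClamAV-Diff magic.

    Hypothetical helper: it only compares the leading bytes and does not
    validate anything else about the file.
    """
    return data.startswith(CDIFF_MAGIC)


if __name__ == "__main__":
    # The hex string decodes to the ASCII text quoted in the issue.
    print(CDIFF_MAGIC.decode("ascii"))
    # A buffer that begins with the magic matches; other prefixes do not.
    print(looks_like_cdiff(CDIFF_MAGIC + b"\x00example"))
    print(looks_like_cdiff(b"not a cdiff"))
```

A prefix comparison is enough here because the issue places the magic at offset 0; the .cdiff extension would be a separate, weaker hint.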