[ https://issues.apache.org/jira/browse/TIKA-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886977#comment-15886977 ]
Tim Allison edited comment on TIKA-2272 at 2/28/17 1:30 AM:
------------------------------------------------------------
bq. n.b. The CRC ought to be of the content segment of the file only. It should
not include the metadata block.
We're currently digesting the input stream before the parse, i.e. the raw
bytes. As you know, given the variety of file formats we parse, there isn't a
consistent "content" vs. "metadata" block within files. Are you thinking of
HTML, perhaps?
Or, are you asking for a CRC32 on the extracted text? If so, that would be
better implemented as a handler.
Can you give more details of what, exactly, you need? Thank you.
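For illustration only, here is a minimal sketch of the two readings above: a CRC32 over the raw bytes of the stream versus a CRC32 over already-extracted text. It uses only java.util.zip from the JDK. The names Crc32Sketch, crc32OfStream and crc32OfText are hypothetical and not part of Tika, the UTF-8 encoding of the text is an assumption, and this is not how DigestingParser itself is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical helper class, for illustration only; not a Tika API.
class Crc32Sketch {

    // Reading 1: checksum of the raw bytes, metadata and all,
    // updated as the stream is consumed.
    public static long crc32OfStream(InputStream in) throws IOException {
        try (CheckedInputStream checked = new CheckedInputStream(in, new CRC32())) {
            byte[] buf = new byte[8192];
            while (checked.read(buf) != -1) {
                // Each read updates the running checksum; nothing else to do.
            }
            return checked.getChecksum().getValue();
        }
    }

    // Reading 2: checksum of text that has already been extracted.
    // Encoding the text as UTF-8 is an assumption made for this sketch.
    public static long crc32OfText(String extractedText) {
        CRC32 crc = new CRC32();
        crc.update(extractedText.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "123456789".getBytes(StandardCharsets.UTF_8);
        System.out.println(Long.toHexString(crc32OfStream(new ByteArrayInputStream(raw))));
        System.out.println(Long.toHexString(crc32OfText("123456789")));
    }
}
```

For a plain-text input the two values coincide. For a format with embedded metadata or markup they would generally differ, which is why the distinction matters here.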
was (Author: [email protected]):
bq. n.b. The CRC ought to be of the content segment of the file only. It should
not include the metadata block.
We're currently digesting the inputstream before the parse...e.g. the raw
bytes. As you know, given the variety of file formats we parse, there isn't a
consistent "content" vs. "metadata" block with files.
Or, are you asking for a CRC32 on the extracted text? If so, that would be
better handled as a handler.
Can you give more details of what, exactly, you need? Thank you.
> Add CRC32 option to DigestingParser
> -----------------------------------
>
> Key: TIKA-2272
> URL: https://issues.apache.org/jira/browse/TIKA-2272
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.14
> Reporter: Jason (at Wshrdryr)
> Priority: Minor
> Fix For: 1.14
>
>
> DigestingParser currently supports a half dozen kinds of hash generation as a
> configurable option.
> Please add CRC32 to the list.
> n.b. The CRC ought to be of the content segment of the file only. It should
> not include the metadata block.