[jira] [Commented] (TIKA-1351) Parser implementations should accept null content handlers

Alexey Pismenskiy (Jira) Mon, 18 Sep 2023 16:23:05 -0700


    [ 
https://issues.apache.org/jira/browse/TIKA-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766606#comment-17766606
 ]


Alexey Pismenskiy commented on TIKA-1351:
-----------------------------------------

Any update on this ticket? 

A previous comment mentions that there is a dummy ContentHandler. What is the 
name of this class?

Still, it would be nice to be able to extract ONLY the metadata and not waste 
resources parsing the content in some cases. 

> Parser implementations should accept null content handlers
> ----------------------------------------------------------
>
>                 Key: TIKA-1351
>                 URL: https://issues.apache.org/jira/browse/TIKA-1351
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Sergey Beryozkin
>            Priority: Minor
>
> Applications which want to let users search documents based only on their 
> metadata do not need to get the content parsed. 
> The only workaround I've found so far is to pass a no-op content handler 
> that ignores the content events, but it does not stop a parser such as 
> PDFParser from parsing the content.
> Proposal: update the parser API docs to let implementers know that 
> ContentHandler can be null, and update the shipped implementations to parse 
> only the metadata if ContentHandler is null.
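
For illustration, here is a minimal sketch of the workaround and the proposal 
described in the issue. It uses only the JDK: org.xml.sax.helpers.DefaultHandler 
is a real ContentHandler whose callbacks do nothing, so it can serve as a no-op 
handler (whether it is the class the earlier comment had in mind is an 
assumption). SketchParser, its parse signature, and the contentPasses counter 
are hypothetical stand-ins invented for this example; they are not Tika's 
Parser API.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import java.util.LinkedHashMap;
import java.util.Map;

public class NullHandlerSketch {

    /** Hypothetical stand-in for a parser implementation; NOT Tika's Parser interface. */
    static class SketchParser {
        /** Counts how many times the (potentially expensive) content pass ran. */
        int contentPasses = 0;

        Map<String, String> parse(String title, String body, ContentHandler handler)
                throws SAXException {
            // Metadata step: assumed cheap, always performed.
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("title", title);

            // Proposed behaviour: a null handler means "metadata only".
            if (handler == null) {
                return metadata;
            }

            // Content step: runs even when the handler discards every event.
            contentPasses++;
            char[] chars = body.toCharArray();
            handler.startDocument();
            handler.characters(chars, 0, chars.length);
            handler.endDocument();
            return metadata;
        }
    }

    public static void main(String[] args) throws SAXException {
        SketchParser parser = new SketchParser();

        // Workaround: the no-op handler ignores the events, but the content pass still runs.
        Map<String, String> viaNoOp = parser.parse("Report", "long body text", new DefaultHandler());
        System.out.println(viaNoOp + ", content passes: " + parser.contentPasses);

        // Proposal: with a null handler the content pass is skipped entirely.
        Map<String, String> viaNull = parser.parse("Report", "long body text", null);
        System.out.println(viaNull + ", content passes: " + parser.contentPasses);
    }
}
```

In this sketch both calls return the same metadata, but the counter only 
increases for the call that passes DefaultHandler, which is the cost the 
issue wants to avoid.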



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
