[ 
https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013087#comment-16013087
 ] 

Tim Allison commented on TIKA-2360:
-----------------------------------

> Thanks Tim, appreciate it.
Of course!  I'm sorry for moving ahead on this without allowing enough time 
for feedback!

To my mind, 1. would be great.

For 2., I'm happy to leave the SentimentParser as a parser for Tika 1.x as long 
as users are required to turn it on.  Or, is this a sticking point for you?

For Tika 2.0 we should come up with an interface/common way of handling 
post-processing after "text" has been extracted.  We currently have the NER 
parser and the Sentiment parser that require text, but we've also put this 
post-processing functionality into handlers for other things -- the old 
Language id handler and the phone # extractor.
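
As a rough illustration of what a common post-processing interface might look 
like, here is a minimal Java sketch.  Everything in it is an assumption made 
for illustration: TextPostProcessor, PhoneProcessor, KeywordSentimentProcessor 
and runAll are hypothetical names, not Tika APIs, and a plain Map stands in for 
Tika's Metadata so that the example is self-contained.  The two toy processors 
only stand in for the phone # extractor and the Sentiment parser; they do not 
reflect how those components are actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostProcessingSketch {

    /** Hypothetical interface (not a Tika API): consumes already-extracted text. */
    interface TextPostProcessor {
        void process(String text, Map<String, List<String>> metadata);
    }

    /** Toy stand-in for a phone number extractor; the pattern is deliberately simplistic. */
    static class PhoneProcessor implements TextPostProcessor {
        private static final Pattern PHONE = Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b");

        @Override
        public void process(String text, Map<String, List<String>> metadata) {
            Matcher m = PHONE.matcher(text);
            while (m.find()) {
                metadata.computeIfAbsent("phones", k -> new ArrayList<>()).add(m.group());
            }
        }
    }

    /** Toy stand-in for a sentiment step; a real one would load a model, not match keywords. */
    static class KeywordSentimentProcessor implements TextPostProcessor {
        @Override
        public void process(String text, Map<String, List<String>> metadata) {
            String lower = text.toLowerCase();
            String label = lower.contains("great") ? "positive"
                    : lower.contains("terrible") ? "negative" : "neutral";
            metadata.computeIfAbsent("sentiment", k -> new ArrayList<>()).add(label);
        }
    }

    /** Runs every configured processor over the same extracted text, in order. */
    static Map<String, List<String>> runAll(String text, List<TextPostProcessor> processors) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (TextPostProcessor p : processors) {
            p.process(text, metadata);
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, List<String>> md = runAll(
                "Call 555-123-4567, great service",
                Arrays.asList(new PhoneProcessor(), new KeywordSentimentProcessor()));
        System.out.println(md);
    }
}
```

The point of the sketch is only that text-dependent steps share one entry point 
and run after extraction, so users could turn each one on or off through 
configuration rather than having some live in parsers and others in handlers.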

As for the ObjectRecogniser, I think we might want to consider turning that 
into a Parser (at some point) because it handles raw bytes, just like OCR or 
the JPEG parser.  The output could populate Metadata instead of returning a 
list of recognized objects...however, I realize, here, we get back into the 
challenge of arbitrary metadata (TIKA-1607)...because we do want to group the 
object bits together for each object.  In Tika 2.x, this would allow users to 
configure a composite image parser composed of three parsers: metadata 
extraction, OCR, and image recognition.  Yes, it might take 2 minutes per 
image, but the capability would be there...




> Handle SentimentParser resource failure more robustly
> -----------------------------------------------------
>
>                 Key: TIKA-2360
>                 URL: https://issues.apache.org/jira/browse/TIKA-2360
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Blocker
>             Fix For: 1.15
>
>
> The SentimentParser tests currently require a network call to GitHub.  For 
> those working behind a proxy or who would prefer that Tika not make 
> unexpected network calls, can we please turn this off by default?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
