[ 
https://issues.apache.org/jira/browse/TIKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705085#comment-14705085
 ] 

Yaniv Kunda commented on TIKA-1710:
-----------------------------------

As much as I like Guava (the library, not the fruit), its only use was its 
com.google.common.base.Charsets class, which contains constants for the Charset 
instances of the standard charsets - the same as Java's StandardCharsets.
Once I replaced this with static imports of StandardCharsets, there was no 
use left for it.
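
To illustrate the kind of replacement involved, here is a minimal sketch using 
only the JDK. The class and method names (CharsetExample, encodeByName, 
encodeByConstant) are hypothetical and are not taken from the patch; the 
by-name variant stands in for any code that looked a charset up by its String 
name or through a third-party constant.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical example class, not part of Tika or of the attached patch.
class CharsetExample {

    // Older style: the charset is named by a String, so the compiler forces
    // handling of a checked exception that cannot occur for UTF-8.
    static byte[] encodeByName(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Java 7+ style: the JDK constant needs no lookup and no exception handling.
    static byte[] encodeByConstant(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "Tika \u00e9";
        // Both variants produce the same bytes.
        System.out.println(Arrays.equals(encodeByName(text), encodeByConstant(text)));
    }
}
```

Guava's Charsets.UTF_8 and StandardCharsets.UTF_8 both denote the UTF-8 
Charset, so swapping one constant for the other should not change behaviour.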

Regarding TaggedInputStream, I wasn't sure what to do - this wrap/cast method 
was a modification of the original commons-io code, and it was used only once - 
in RFC822Parser.
I think it's a nice-to-have optimization helper method but nothing more - as it 
only saves the cost of a new TaggedInputStream when the source InputStream is 
already a TaggedInputStream: the checked tag will behave the same way in the 
same wrap-try-catch flow.
The only other usage of TaggedInputStream in Tika (besides TikaInputStream) 
is in RTFParser, which calls the constructor directly, and it is actually an 
empty usage: the TaggedInputStream is constructed and checked in the catch 
clause, but it is not used in the try block at all - the underlying stream is!

Since almost all of Tika uses TikaInputStream (which has a more advanced 
version of this helper that also ensures buffering), my opinion is to refrain 
from adding a helper method and simply use the constructor directly, for 
simplicity. 
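
The wrap-try-catch flow discussed above can be sketched roughly as follows. 
This is a simplified stand-in written against the JDK only, not the commons-io 
or Tika implementation: Tagged, TaggedException, Failing, get and the parse 
methods are all hypothetical names chosen for illustration.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the tagged-stream pattern; not the real TaggedInputStream.
class TaggedStreamSketch {

    // IOException that remembers which wrapper rethrew it.
    static class TaggedException extends IOException {
        final Object tag;
        TaggedException(IOException cause, Object tag) {
            super(cause.getMessage(), cause);
            this.tag = tag;
        }
    }

    // Wrapper that tags every IOException thrown while reading through it.
    static class Tagged extends FilterInputStream {
        Tagged(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            try { return in.read(); }
            catch (IOException e) { throw new TaggedException(e, this); }
        }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            try { return in.read(b, off, len); }
            catch (IOException e) { throw new TaggedException(e, this); }
        }

        boolean isCauseOf(IOException e) {
            return e instanceof TaggedException && ((TaggedException) e).tag == this;
        }
    }

    // Source stream that always fails, to exercise the catch clause.
    static class Failing extends InputStream {
        @Override public int read() throws IOException {
            throw new IOException("broken source");
        }
    }

    // The wrap/cast helper: reuse an existing wrapper instead of allocating one.
    static Tagged get(InputStream in) {
        return in instanceof Tagged ? (Tagged) in : new Tagged(in);
    }

    // Reads through the wrapper, so a source failure is recognised by its tag.
    static String parse(InputStream source, boolean useHelper) {
        Tagged tagged = useHelper ? get(source) : new Tagged(source);
        try {
            tagged.read();
            return "ok";
        } catch (IOException e) {
            return tagged.isCauseOf(e) ? "stream failure" : "other failure";
        }
    }

    // The "empty usage": the wrapper is built but the source is read directly,
    // so the exception is never tagged and the check cannot match.
    static String parseBypassingWrapper(InputStream source) {
        Tagged tagged = new Tagged(source);
        try {
            source.read();
            return "ok";
        } catch (IOException e) {
            return tagged.isCauseOf(e) ? "stream failure" : "other failure";
        }
    }

    public static void main(String[] args) {
        InputStream alreadyTagged = new Tagged(new Failing());
        System.out.println(parse(alreadyTagged, true));
        System.out.println(parse(alreadyTagged, false));
        System.out.println(parseBypassingWrapper(new Failing()));
    }
}
```

In this sketch, parse gives the same answer for an already-tagged source 
whether it goes through get or through the constructor; the helper only avoids 
one extra wrapper object. parseBypassingWrapper shows why a wrapper that is 
never read from cannot recognise the failure.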

> Replace usages of classes in org.apache.tika.io with current alternatives
> -------------------------------------------------------------------------
>
>                 Key: TIKA-1710
>                 URL: https://issues.apache.org/jira/browse/TIKA-1710
>             Project: Tika
>          Issue Type: Improvement
>          Components: batch, cli, core, example, gui, parser, server, 
> translation
>            Reporter: Yaniv Kunda
>            Priority: Minor
>             Fix For: 1.11
>
>         Attachments: TIKA-1710.patch
>
>
> Many of the classes in org.apache.tika.io were inlined from commons-io in 
> TIKA-249, but these days most components use commons-io anyway, so in order 
> to clean up the dependencies on org.apache.tika.io in preparation for adding 
> commons-io to tika-core, the following can be done:
> - Replace usages of classes in org.apache.tika.io within non-core components 
> with the corresponding classes in commons-io
> - Replace usages of org.apache.tika.io.IOUtils.UTF_8 with 
> java.nio.charset.StandardCharsets.UTF_8 (in all components, including 
> tika-core)
> - Replace other uses of String encoding names of standard charsets with their 
> corresponding Charset instances from StandardCharsets (this is logically 
> related to IOUtils, as before Java 7 these constants should have been there 
> just as UTF_8 was)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
