[
https://issues.apache.org/jira/browse/TIKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705085#comment-14705085
]
Yaniv Kunda commented on TIKA-1710:
-----------------------------------
As much as I like Guava (the library, not the fruit), its only use was its
com.google.common.base.Charsets class, which contains constants for the Charset
instances of the standard charsets - the same as Java's StandardCharsets.
Once I replaced these with static imports from StandardCharsets, no use of
Guava was left.
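
A minimal sketch of that kind of replacement, assuming a call site that simply
encodes and decodes UTF-8 text. The class and method names are hypothetical and
are not taken from the patch:

```java
import static java.nio.charset.StandardCharsets.UTF_8;

// Hypothetical call site, for illustration only.
class CharsetExample {

    // Before (Guava):       new String(bytes, com.google.common.base.Charsets.UTF_8)
    // Before (name string): new String(bytes, "UTF-8"), which forces callers to
    //                       handle the checked UnsupportedEncodingException
    static String decode(byte[] bytes) {
        return new String(bytes, UTF_8);
    }

    static byte[] encode(String text) {
        return text.getBytes(UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode("Tika")));
    }
}
```

With the JDK constant there is no extra dependency and no checked exception to
handle at the call site.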
Regarding TaggedInputStream, I wasn't sure what to do - this wrap/cast method
was a modification of the original commons-io code, and it was used only once -
in RFC822Parser.
I think it's a nice-to-have optimization helper but nothing more, as it only
saves the cost of a new TaggedInputStream when the source InputStream is
already a TaggedInputStream: the tag check behaves the same way in the same
wrap-try-catch flow whether or not the stream was wrapped again.
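
The trade-off can be sketched roughly as follows. To keep the example
self-contained, Tagged is a hypothetical stand-in for commons-io's
TaggedInputStream, and castOrWrap / alwaysWrap are illustrative names rather
than the actual Tika or commons-io methods:

```java
import java.io.FilterInputStream;
import java.io.InputStream;

class TaggedSketch {

    // Hypothetical stand-in for TaggedInputStream so the sketch compiles
    // without commons-io on the classpath.
    static class Tagged extends FilterInputStream {
        Tagged(InputStream in) {
            super(in);
        }
    }

    // The helper under discussion: reuse the stream if it is already tagged,
    // otherwise wrap it. Saves one allocation in the already-tagged case.
    static Tagged castOrWrap(InputStream in) {
        return (in instanceof Tagged) ? (Tagged) in : new Tagged(in);
    }

    // The simpler alternative: always call the constructor directly.
    static Tagged alwaysWrap(InputStream in) {
        return new Tagged(in);
    }
}
```

Both variants hand back a tagged stream that reads from the same underlying
data; the only difference is the extra wrapper object in the second one.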
The only other usage of TaggedInputStream in Tika (besides TikaInputStream)
is in RTFParser, which uses the constructor directly. That is actually an empty
usage: the TaggedInputStream is constructed and checked in the catch clause,
but it is not used in the try block at all; the underlying stream is used
instead!
Since almost all of Tika uses TikaInputStream (which has an advanced version of
this helper that also ensures buffering), my opinion is to refrain from adding
a helper method and simply use the constructor directly, for simplicity.
> Replace usages of classes in org.apache.tika.io with current alternatives
> -------------------------------------------------------------------------
>
> Key: TIKA-1710
> URL: https://issues.apache.org/jira/browse/TIKA-1710
> Project: Tika
> Issue Type: Improvement
> Components: batch, cli, core, example, gui, parser, server,
> translation
> Reporter: Yaniv Kunda
> Priority: Minor
> Fix For: 1.11
>
> Attachments: TIKA-1710.patch
>
>
> Many of the classes in org.apache.tika.io were inlined from commons-io in
> TIKA-249, but these days most components use commons-io anyway, so in order
> to clean the dependencies on org.apache.tika.io in preparation for adding
> commons-io to tika-core, the following can be done:
> - Replace usages of classes in org.apache.tika.io within non-core components
> with the corresponding classes in commons-io
> - Replace usages of org.apache.tika.io.IOUtils.UTF_8 with
> java.nio.charset.StandardCharsets.UTF_8 (in all components, including
> tika-core)
> - Replace other uses of String encoding names of standard charsets with their
> corresponding Charset instances from StandardCharsets (this is logically
> related to IOUtils, as before Java 7 these constants would have belonged
> there alongside UTF_8)