Hi Tim,
Absolutely, see TIKA-2244. This PR primarily helps by detecting when a
CloseShieldInputStream already supports mark. The previous mechanism checked
whether the InputStream's class was one of a couple of classes known to support
it, but trusting markSupported() seems to work quite well and avoids the false
negatives that were causing unnecessary BufferedInputStream allocations. While
I was looking, I also found similar issues in PackageParser and
CompressorParser, so I went ahead and used the same logic in those places.
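To illustrate the difference, here is a minimal JDK-only sketch. It assumes a close-shield wrapper that forwards markSupported() to the stream it wraps, the way java.io.FilterInputStream does (commons-io's ProxyInputStream, which CloseShieldInputStream extends, also forwards it). ShieldStream, knownToSupportMark and ensureMarkSupported are hypothetical names, and the class list in knownToSupportMark is made up; this is not the actual Tika code from the PR.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

public class MarkSupportDemo {

    // Hypothetical stand-in for a close-shield wrapper. Like FilterInputStream,
    // it forwards markSupported()/mark()/reset() to the wrapped stream;
    // it only suppresses close().
    static final class ShieldStream extends FilterInputStream {
        ShieldStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Deliberately a no-op: the caller keeps ownership of the stream.
        }
    }

    // Old-style check (hypothetical class list): trust only known classes.
    // A wrapper around a markable stream is missed, i.e. a false negative.
    static boolean knownToSupportMark(InputStream in) {
        return in instanceof BufferedInputStream || in instanceof ByteArrayInputStream;
    }

    // New-style check: ask the stream itself, and allocate a
    // BufferedInputStream (and its buffer) only when mark is really missing.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) {
        InputStream shielded = new ShieldStream(
                new BufferedInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3})));

        System.out.println(knownToSupportMark(shielded));              // false
        System.out.println(shielded.markSupported());                  // true
        System.out.println(ensureMarkSupported(shielded) == shielded); // true
    }
}
```

With the class-based check, each level of a nested package can add another BufferedInputStream and its buffer even though the shielded stream could already mark; asking markSupported() lets the stream be reused as-is.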
Thanks,
Josh
> On Jan 18, 2017, at 1:51 PM, Allison, Timothy B. <[email protected]> wrote:
>
> Josh,
> Thank you for this PR. Would you be able to open an issue on our JIRA as
> well? Can you explain in a bit more detail how this patch helps?
> Thank you, again.
>
> Best,
>
> Tim
>
> -----Original Message-----
> From: joshbooks [mailto:[email protected]]
> Sent: Wednesday, January 18, 2017 4:15 PM
> To: [email protected]
> Subject: [GitHub] tika pull request #148: be more parsimonious wrapping streams
>
> GitHub user joshbooks opened a pull request:
>
> https://github.com/apache/tika/pull/148
>
> be more parsimonious wrapping streams
>
> It looks like a bunch of streams were getting wrapped in
> BufferedInputStreams just to make extra double sure that mark was supported,
> but this is not as harmless as it might otherwise seem when you run into big
> nested package files.
>
> You can merge this pull request into a Git repository by running:
>
> $ git pull https://github.com/joshbooks/tika master
>
> Alternatively you can review and apply these changes as the patch at:
>
> https://github.com/apache/tika/pull/148.patch
>
> To close this pull request, make a commit to your master/trunk branch with
> (at least) the following in the commit message:
>
> This closes #148
>
> ----
> commit 896c46a0c652de436da0e4f25bfa53a7d83ae02f
> Author: Joshua Hight <[email protected]>
> Date: 2017-01-18T21:10:03Z
>
> be more parsimonious wrapping streams
>
> It looks like a bunch of streams were getting wrapped in
> BufferedInputStream just to make extra double sure that mark was
> supported, but this is not as harmless as it might otherwise seem when
> you run into big nested package files.
>
> commit 9477d03e10149a8ec6b5d6889e2fd2317d2ed5f5
> Author: Joshua Hight <[email protected]>
> Date: 2017-01-18T21:13:05Z
>
> Merge remote-tracking branch 'apache/master'
>
> ----