Yes, this makes sense.
Detector detector = TikaConfig.getDefaultConfig().getDetector();
File file = new File("testPDFVarious.pdf");
try (FileInputStream is = new FileInputStream(file)) {
    try (InputStream tis = TikaInputStream.get(is)) {
        System.out.println("length: " + file.length());
        System.out.println("avail before: " + tis.available());
        System.out.println("DETECTED: " + detector.detect(tis, new Metadata()));
        System.out.println("avail after tis: " + tis.available());
        System.out.println("avail after is: " + is.available());
    }
}
length: 205491
avail before: 205491
DETECTED: application/pdf
avail after tis: 205491
avail after is: 139955
The original FileInputStream is not buffered, so there is no way to reset it.
And yes, the detector has to read quite a few bytes to do detection, which is
why is.available() drops after the detect call.
Note, though, that the TikaInputStream, or even a BufferedInputStream, will be
correctly reset and will have all bytes available.
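The same effect can be reproduced without Tika at all. What follows is only a
minimal sketch: the class name MarkResetDemo, the method peekAndReset, and the
1000-byte temp file are made up for illustration, and the partial read merely
stands in for what a detector does (it is not Tika's actual code).

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Tika.
class MarkResetDemo {

    /**
     * Marks the buffered wrapper, reads peekBytes from it, then resets it.
     * Returns { wrapper.available(), raw.available() } measured after the reset.
     */
    static int[] peekAndReset(Path path, int peekBytes) throws IOException {
        try (FileInputStream raw = new FileInputStream(path.toFile());
             InputStream buffered = new BufferedInputStream(raw)) {
            buffered.mark(peekBytes);          // remember the current position
            byte[] head = new byte[peekBytes];
            buffered.read(head);               // stand-in for the detector's read
            buffered.reset();                  // rewind the wrapper to the mark
            return new int[] { buffered.available(), raw.available() };
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mark-reset-demo", ".bin");
        try {
            Files.write(tmp, new byte[1000]);
            int[] r = peekAndReset(tmp, 512);
            System.out.println("avail after buffered: " + r[0]);
            System.out.println("avail after raw: " + r[1]);
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The wrapper reports the full file length again after reset(), while the raw
FileInputStream reports fewer bytes because the wrapper has already pulled
them into its buffer. So if you need the byte count, ask the wrapping stream
(or the file), not the stream underneath it.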
Btw, it is better to call TikaInputStream.get() directly on the file. If a
parser needs to copy the original InputStream to a temp file, it can avoid that
copy if you've created your TikaInputStream directly from the file:
TikaInputStream tis = TikaInputStream.get(file);
-----Original Message-----
From: Chris Mattmann [mailto:[email protected]]
Sent: Friday, July 8, 2016 10:14 AM
To: [email protected]; [email protected]
Subject: Re: TIKA-1164
Hi Samuel,
I myself haven’t had a chance to look into this yet - maybe someone else on the
dev list?
Cheers,
Chris
On 7/8/16, 5:33 AM, "[email protected]" <[email protected]> wrote:
>Hi,
>
>Sorry for the follow-up mail, but have you seen my problem?
>
>Regards,
>
>Samuel Catherine
>
>
>
>From: Samuel CATHERINE/Monaco-Gouvernement/MC
>To: "Mattmann, Chris A (3980)" <[email protected]>@MCGOUV
>Cc: "[email protected]" <[email protected]>
>Date: 05/07/2016 10:31
>Subject: Re: TIKA-1164
>
>________________________________________
>
>
>Hi Chris,
>
>Ok thanks for the forward.
>To help you: when I work only with an InputStream (as in a REST service), I
>don't have the problem.
>The problem occurs when I use a File converted to a FileInputStream.
>
>FileInputStream content = new FileInputStream(file);
>
>content.available()
>// is OK after the definition but is wrong after
>// detector.detect(TikaInputStream.get(content), md)
>
>Regards,
>
>Samuel Catherine
>
>
>
>
>From: "Mattmann, Chris A (3980)" <[email protected]>
>To: "[email protected]" <[email protected]>
>Cc: "[email protected]" <[email protected]>
>Date: 04/07/2016 17:45
>Subject: Re: TIKA-1164
>
>________________________________________
>
>
>
>Hi Samuel I am forwarding your email to [email protected] and moving
>[email protected] to BCC.
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Chief Architect
>Instrument Software and Science Data Systems Section (398) NASA Jet
>Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 168-519, Mailstop: 168-527
>Email: [email protected]
>WWW: http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Director, Information Retrieval and Data Science Group (IRDS) Adjunct
>Associate Professor, Computer Science Department University of Southern
>California, Los Angeles, CA 90089 USA
>WWW: http://irds.usc.edu/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>On 7/4/16, 8:41 AM, "[email protected]" <[email protected]> wrote:
>
>>Hi,
>>
>>I use Tika to detect the MediaType and I have the same problem as
>>JIRA TIKA-1164:
>>https://issues.apache.org/jira/browse/TIKA-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>But I use version 1.13. How can I solve this problem, please?
>>
>>MediaType mediaType = null;
>>Metadata md = new Metadata();
>>md.set(Metadata.RESOURCE_NAME_KEY, fileName);
>>Detector detector = TikaConfig.getDefaultConfig().getDetector();
>>
>>try {
>>    mediaType = detector.detect(TikaInputStream.get(content), md);
>>} catch (IOException e) {
>>    mediaType = null;
>>}
>>
>>The content size (content.available()) changes between before and after the
>>detect call.
>>
>>Regards,
>>
>>Samuel Catherine
>>
>>
>