Right.  Use Path instead of File: TikaInputStream.get(Path) rather than the
deprecated TikaInputStream.get(File).

From: [email protected] [mailto:[email protected]]
Sent: Monday, July 11, 2016 3:42 AM
To: Allison, Timothy B. <[email protected]>
Cc: [email protected]
Subject: RE: TIKA-1164


Hi Timothy,

Thanks

When I call TikaInputStream.get() directly on the file, it works, but this 
method is deprecated in Tika 1.13 and seems to be removed in Tika 2.0.

Regards

Samuel Catherine
Contractor acting on behalf of the IT Department (Direction Informatique)
[email protected]
+377 98 98 48 93



From: "Allison, Timothy B." <[email protected]>
To: [email protected], [email protected]
Date: 08/07/2016 21:26
Subject: RE: TIKA-1164

________________________________



Yes, this makes sense.

       Detector detector = TikaConfig.getDefaultConfig().getDetector();
       File file = new File("testPDFVarious.pdf");
       try (FileInputStream is = new FileInputStream(file)) {
           try (InputStream tis = TikaInputStream.get(is)) {
               System.out.println("length: " + file.length());
               System.out.println("avail before: " + tis.available());
               System.out.println("DETECTED: " + detector.detect(tis, new Metadata()));
               System.out.println("avail after tis: " + tis.available());
               System.out.println("avail after is: " + is.available());
           }
       }

length: 205491
avail before: 205491
DETECTED: application/pdf
avail after tis: 205491
avail after is: 139955

The original input stream is not buffered, so there is no way to reset it. And 
yes, the detector has to read quite a few bytes to do detection: here 65,536 
bytes (205491 - 139955) were consumed from the underlying stream.

Note, though, that the TikaInputStream or even a BufferedInputStream will be 
correctly reset and will have all bytes available.
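
A minimal sketch of that difference, using only the JDK and no Tika classes. 
The names MarkResetSketch, peekLikeADetector and availableAfterPeek are 
hypothetical, and the "detector" is just a stand-in that marks the stream, 
reads a few bytes and resets when the stream supports it. It is meant to 
illustrate the mark/reset behaviour, not how Tika's detectors are implemented.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class MarkResetSketch {

    // Hypothetical stand-in for a detector: peek at up to `peek` bytes,
    // then rewind, but only if the stream supports mark/reset.
    static void peekLikeADetector(InputStream in, int peek) throws IOException {
        boolean canRewind = in.markSupported();
        if (canRewind) {
            in.mark(peek);
        }
        byte[] buf = new byte[peek];
        int total = 0;
        while (total < peek) {
            int r = in.read(buf, total, peek - total);
            if (r < 0) {
                break; // end of stream
            }
            total += r;
        }
        if (canRewind) {
            in.reset();
        }
    }

    // Writes a temp file of `fileSize` zero bytes, peeks at it through either a
    // raw FileInputStream or a BufferedInputStream, and returns available()
    // as seen through that same stream afterwards.
    static int availableAfterPeek(boolean buffered, int fileSize, int peek) {
        try {
            File tmp = File.createTempFile("mark-reset-sketch", ".bin");
            try {
                Files.write(tmp.toPath(), new byte[fileSize]);
                try (InputStream raw = new FileInputStream(tmp);
                     InputStream in = buffered ? new BufferedInputStream(raw) : raw) {
                    peekLikeADetector(in, peek);
                    return in.available();
                }
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + availableAfterPeek(false, 100_000, 1024));
        System.out.println("buffered:   " + availableAfterPeek(true, 100_000, 1024));
    }
}
```

On a regular local file, the unbuffered case should report 98976 (the peeked 
bytes are gone for good), while the buffered case should report the full 
100000 because the wrapper can rewind.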

Btw, it is better to call TikaInputStream.get() directly on the file.  If a 
parser needs to copy the original input stream to a temp file, it can avoid 
that copy if you've created your TikaInputStream directly from the file.

TikaInputStream tis = TikaInputStream.get(file);

-----Original Message-----
From: Chris Mattmann [mailto:[email protected]]
Sent: Friday, July 8, 2016 10:14 AM
To: [email protected]; [email protected]
Subject: Re: TIKA-1164

Hi Samuel,

I myself haven’t had a chance to look into this yet - maybe someone else on the 
dev list?

Cheers,
Chris




On 7/8/16, 5:33 AM, "[email protected]" <[email protected]> wrote:

>Hi,
>
>Sorry for this extra email, but have you seen my problem?
>
>Regards,
>
>Samuel Catherine
>
>
>
>
>From: Samuel CATHERINE/Monaco-Gouvernement/MC
>To: "Mattmann, Chris A (3980)" <[email protected]>@MCGOUV
>Cc: [email protected]
>Date: 05/07/2016 10:31
>Subject: Re: TIKA-1164
>
>________________________________________
>
>
>Hi Chris,
>
>Ok thanks for the forward.
>To help you: when I work only with an InputStream (as in a REST service), I 
>don't have the problem.
>The problem occurs when I use a File converted to a FileInputStream.
>
>FileInputStream content = new FileInputStream(file);
>
>content.available()
>// is OK right after construction, but wrong after the call to
>// detector.detect(TikaInputStream.get(content), md)
>
>Regards,
>
>Samuel Catherine
>
>
>
>
>
>From: "Mattmann, Chris A (3980)" <[email protected]>
>To: [email protected]
>Cc: [email protected]
>Date: 04/07/2016 17:45
>Subject: Re: TIKA-1164
>________________________________________
>
>
>
>Hi Samuel, I am forwarding your email to [email protected] and moving 
>[email protected] to BCC.
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Chief Architect
>Instrument Software and Science Data Systems Section (398)
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 168-519, Mailstop: 168-527
>Email: [email protected]
>WWW:  http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Director, Information Retrieval and Data Science Group (IRDS)
>Adjunct Associate Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>WWW: http://irds.usc.edu/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>On 7/4/16, 8:41 AM, "[email protected]" <[email protected]> wrote:
>
>>Hi,
>>
>>I use Tika to detect the MediaType and I have the same problem as JIRA
>>issue TIKA-1164:
>>https://issues.apache.org/jira/browse/TIKA-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>But I am using version 1.13. How can I solve this problem, please?
>>
>>MediaType mediaType = null;
>>Metadata md = new Metadata();
>>md.set(Metadata.RESOURCE_NAME_KEY, fileName);
>>Detector detector = TikaConfig.getDefaultConfig().getDetector();
>>
>>try {
>>    mediaType = detector.detect(TikaInputStream.get(content), md);
>>} catch (IOException e) {
>>    mediaType = null;
>>}
>>
>>The content size (content.available()) changes between before and after the 
>>detect call.
>>
>>Regards,
>>
>>Samuel Catherine
>>
>>
>
