Right. Use Path instead of File.

From: [email protected] [mailto:[email protected]]
Sent: Monday, July 11, 2016 3:42 AM
To: Allison, Timothy B. <[email protected]>
Cc: [email protected]
Subject: RE: TIKA-1164

Hi Timothy,

Thanks. When I call TikaInputStream.get() directly on the file, it works, but this method is deprecated in Tika 1.13 and it seems to be removed in Tika 2.0.

Regards,

Samuel Catherine
Contractor acting on behalf of the IT Department
[email protected]
+377 98 98 48 93

From: "Allison, Timothy B." <[email protected]>
To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Date: 08/07/2016 21:26
Subject: RE: TIKA-1164
________________________________

Y, this makes sense.

    Detector detector = TikaConfig.getDefaultConfig().getDetector();
    File file = new File("testPDFVarious.pdf");
    try (FileInputStream is = new FileInputStream(file)) {
        try (InputStream tis = TikaInputStream.get(is)) {
            System.out.println("length: " + file.length());
            System.out.println("avail before: " + tis.available());
            System.out.println("DETECTED: " + detector.detect(tis, new Metadata()));
            System.out.println("avail after tis: " + tis.available());
            System.out.println("avail after is: " + is.available());
        }
    }

    length: 205491
    avail before: 205491
    DETECTED: application/pdf
    avail after tis: 205491
    avail after is: 139955

The original input stream is not buffered, so there is no way to reset it, and the detector has to read quite a few bytes to do detection. Note, though, that the TikaInputStream, or even a BufferedInputStream, will be correctly reset and will have all bytes available.

Btw, it is better to call TikaInputStream.get() directly on the file.
If a parser needs to copy the original InputStream to a temp file, it can avoid that copy if you've created your TikaInputStream directly from the file:

    TikaInputStream tis = TikaInputStream.get(file);

-----Original Message-----
From: Chris Mattmann [mailto:[email protected]]
Sent: Friday, July 8, 2016 10:14 AM
To: [email protected]; [email protected]
Subject: Re: TIKA-1164

Hi Samuel,

I myself haven’t had a chance to look into this yet - maybe someone else on the dev list?

Cheers,
Chris

On 7/8/16, 5:33 AM, "[email protected]" <[email protected]> wrote:

>Hi,
>
>Sorry for this mail, but have you seen my problem?
>
>Regards,
>
>Samuel Catherine
>
>From: Samuel CATHERINE/Monaco-Gouvernement/MC
>To: "Mattmann, Chris A (3980)" <[email protected]>@MCGOUV
>Cc: "[email protected]" <[email protected]>
>Date: 05/07/2016 10:31
>Subject: Re: TIKA-1164
>________________________________________
>
>Hi Chris,
>
>Ok, thanks for the forward.
>To help you: when I work only with an InputStream (as in a REST service), I don't
>have the problem.
>The problem appears when I use a File converted to a FileInputStream.
>
>FileInputStream content = new FileInputStream(file);
>
>content.available()
>// is OK after the definition but is KO after
>// detector.detect(TikaInputStream.get(content), md)
>
>Regards,
>
>Samuel Catherine
>
>From: "Mattmann, Chris A (3980)" <[email protected]>
>To: "[email protected]" <[email protected]>
>Cc: "[email protected]" <[email protected]>
>Date: 04/07/2016 17:45
>Subject: Re: TIKA-1164
>________________________________________
>
>Hi Samuel,
>
>I am forwarding your email to [email protected] and moving [email protected] to BCC.
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Chief Architect
>Instrument Software and Science Data Systems Section (398)
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 168-519, Mailstop: 168-527
>Email: [email protected]
>WWW: http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Director, Information Retrieval and Data Science Group (IRDS)
>Adjunct Associate Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>WWW: http://irds.usc.edu/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>On 7/4/16, 8:41 AM, "[email protected]" <[email protected]> wrote:
>
>>Hi,
>>
>>I use Tika to detect the MediaType and I have the same problem as in
>>JIRA TIKA-1164:
>>https://issues.apache.org/jira/browse/TIKA-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>But I use version 1.13. How can I solve this problem, please?
>>
>>MediaType mediaType = null;
>>Metadata md = new Metadata();
>>md.set(Metadata.RESOURCE_NAME_KEY, fileName);
>>Detector detector = TikaConfig.getDefaultConfig().getDetector();
>>
>>try {
>>    mediaType = detector.detect(TikaInputStream.get(content), md);
>>} catch (IOException e) {
>>    mediaType = null;
>>}
>>
>>The content size (content.available()) changes between before and after the
>>detect call.
>>
>>Regards,
>>
>>Samuel Catherine
>>
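
Editor's note on the thread above: the behavior Tim describes (a plain FileInputStream cannot be rewound after detection, while a buffered wrapper can) can be reproduced with the JDK alone. What follows is only a minimal sketch under stated assumptions, not Tika's implementation. The names PeekDemo, peek and sniff are hypothetical, the sample file content is made up, and sniff is a toy stand-in for a detector that recognizes nothing but the "%PDF" magic bytes.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical demo class; not part of Tika.
class PeekDemo {

    // Reads up to n leading bytes, then rewinds the stream with mark/reset.
    // Only works on streams that support mark/reset (e.g. BufferedInputStream).
    static byte[] peek(InputStream in, int n) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        try {
            in.mark(n);
            byte[] buf = new byte[n];
            int total = 0;
            while (total < n) {
                int r = in.read(buf, total, n - total);
                if (r < 0) {
                    break; // end of stream reached before n bytes
                }
                total += r;
            }
            in.reset(); // rewind so the caller sees the stream from its marked position
            return Arrays.copyOf(buf, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Toy stand-in for a detector (assumption: only the PDF magic bytes are checked).
    static String sniff(byte[] head) {
        String s = new String(head, StandardCharsets.US_ASCII);
        return s.startsWith("%PDF") ? "application/pdf" : "application/octet-stream";
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("peek-demo", ".pdf");
        try {
            // Made-up sample content, just enough to carry the magic bytes.
            Files.write(tmp, "%PDF-1.4 hypothetical sample body".getBytes(StandardCharsets.US_ASCII));
            try (FileInputStream raw = new FileInputStream(tmp.toFile());
                 InputStream buffered = new BufferedInputStream(raw)) {
                System.out.println("raw supports mark/reset: " + raw.markSupported());
                System.out.println("avail before: " + buffered.available());
                System.out.println("DETECTED: " + sniff(peek(buffered, 4)));
                System.out.println("avail after buffered: " + buffered.available());
                System.out.println("avail after raw: " + raw.available());
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

In the Tika snippet from the thread, TikaInputStream plays the role of the buffered wrapper here, which is consistent with tis.available() staying unchanged while is.available() on the underlying FileInputStream drops.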
