Thanks Konstantin and Gabriele! Please feel free to email any other
questions or open an issue on the Tika JIRA.

Have a good day!
Tyler
On Jan 29, 2015 11:43 AM, "Gabriele Guidi" <[email protected]> wrote:

> Ok, thank you for your support
>
> Best regards
>
> 2015-01-29 15:14 GMT+01:00 Konstantin Gribov <[email protected]>:
>
> > Hi, Gabriele.
> >
> > If you're using an InputStream that doesn't support mark/reset, the Tika
> > facade (org.apache.tika.Tika) wraps it in a BufferedInputStream, which
> > consumes up to 8k of the original InputStream by default, so the Tika
> > mime type detector can't find the pdf magic after the first call.
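
The first case described above can be reproduced without Tika. The following is a minimal sketch, assuming a detector that only looks for the "%PDF-" magic bytes: sniff(), noMarkStream() and fakePdf() are hypothetical helpers written for this illustration and are not Tika's implementation. Each call wraps the raw stream in a fresh BufferedInputStream, so the wrapper's first read pulls up to 8192 bytes out of the underlying stream and the header is gone by the second call.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ConsumedStreamDemo {

    /** A stream over a byte array that does not support mark/reset (the InputStream default). */
    static InputStream noMarkStream(byte[] data) {
        return new InputStream() {
            private int pos = 0;

            @Override
            public int read() {
                return pos < data.length ? (data[pos++] & 0xFF) : -1;
            }
        };
    }

    /** Hypothetical stand-in for a magic-based detector; this is NOT Tika's code. */
    static String sniff(InputStream in) throws IOException {
        // Assumption: like the facade described above, wrap streams without mark support.
        InputStream s = in.markSupported() ? in : new BufferedInputStream(in);
        s.mark(16);
        byte[] head = new byte[5];
        int n = s.read(head, 0, head.length);
        s.reset(); // rewinds the wrapper only; the underlying stream stays advanced
        boolean pdf = n == 5 && new String(head, StandardCharsets.US_ASCII).equals("%PDF-");
        return pdf ? "application/pdf" : "application/octet-stream";
    }

    /** 20000 bytes that start with a PDF header, padded with spaces. */
    static byte[] fakePdf() {
        byte[] data = new byte[20000];
        Arrays.fill(data, (byte) ' ');
        byte[] magic = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(magic, 0, data, 0, magic.length);
        return data;
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = noMarkStream(fakePdf());
        System.out.println("1 mime : " + sniff(raw)); // header still at the front
        System.out.println("2 mime : " + sniff(raw)); // first 8192 bytes already consumed
    }
}
```

Run as written, this prints application/pdf for the first call and application/octet-stream for the second, the same pattern as in the report.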
> >
> > The second case (with copying to byte[]) is similar. If you do this copy
> > before calling tika.detect, you consume that input stream, and subsequent
> > calls on that stream return application/octet-stream as the default
> > mime type. But everything works fine with the bytes, since they hold a
> > full copy of the original stream.
> >
> > If you call tika.detect on the input stream before copying it to bytes,
> > you are back in the first case: you copy the input stream without its
> > first 8k, so the pdf magic is dropped.
> >
> > You have to recreate the input stream, copy it to some temporary resource
> > (such as a byte[] or a temp file), or wrap it in a BufferedInputStream
> > once and pass that same wrapper to every tika.detect call.
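
Two of the suggested fixes can be sketched with plain java.io. This is an illustration under assumptions, not Tika's code: sniff() is a hypothetical detector that only checks for the "%PDF-" magic bytes, and noMarkStream(), fakePdf() and readAll() are helpers invented for the example. The point is that the same mark-capable object is handed to every call, so reset() really rewinds the data being inspected.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReusableStreamDemo {

    /** A stream over a byte array that does not support mark/reset (the InputStream default). */
    static InputStream noMarkStream(byte[] data) {
        return new InputStream() {
            private int pos = 0;

            @Override
            public int read() {
                return pos < data.length ? (data[pos++] & 0xFF) : -1;
            }
        };
    }

    /** Hypothetical stand-in for a magic-based detector; this is NOT Tika's code. */
    static String sniff(InputStream in) throws IOException {
        InputStream s = in.markSupported() ? in : new BufferedInputStream(in);
        s.mark(16);
        byte[] head = new byte[5];
        int n = s.read(head, 0, head.length);
        s.reset();
        boolean pdf = n == 5 && new String(head, StandardCharsets.US_ASCII).equals("%PDF-");
        return pdf ? "application/pdf" : "application/octet-stream";
    }

    /** 20000 bytes that start with a PDF header, padded with spaces. */
    static byte[] fakePdf() {
        byte[] data = new byte[20000];
        Arrays.fill(data, (byte) ' ');
        byte[] magic = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(magic, 0, data, 0, magic.length);
        return data;
    }

    /** Drains a stream into a byte array. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Option 1: wrap once and reuse the SAME wrapper for every call.
        InputStream buffered = new BufferedInputStream(noMarkStream(fakePdf()));
        System.out.println("1 mime : " + sniff(buffered));
        System.out.println("2 mime : " + sniff(buffered));
        // The wrapper was reset each time, so a later copy is still complete.
        System.out.println("bytes copied : " + readAll(buffered).length);

        // Option 2: copy to byte[] first, then detect on the copy.
        byte[] bytes = readAll(noMarkStream(fakePdf()));
        System.out.println("3 mime : " + sniff(new ByteArrayInputStream(bytes)));
    }
}
```

In this sketch all three detections report application/pdf, and the copy taken after detection still contains the full 20000 bytes.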
> >
> > --
> > Best regards,
> > Konstantin Gribov
> >
> > Thu Jan 29 2015 at 16:07:12, Gabriele Guidi <[email protected]>:
> >
> > Hi
> >>
> >> No, I checked with the markSupported()
> >> <http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#markSupported()>
> >> method and it says "NO".
> >> The stream is not recreated between calls.
> >>
> >> The test code is very simple:
> >>
> >> InputStream inputsbust = content.getContentStream();
> >> System.out.println(" mark and reset inputStream ? " + (inputsbust.markSupported() ? "YES" : "NO"));
> >> System.out.println(" 1 mime : " + tika.detect(inputsbust));
> >> System.out.println(" 2 mime : " + tika.detect(inputsbust));
> >> byte[] bytes = IOUtils.toByteArray(inputsbust);
> >> System.out.println(" 3 mime : " + tika.detect(bytes));
> >> System.out.println(" 3.2 mime : " + tika.detect(bytes));
> >>
> >>
> >> The result:
> >>
> >> mark and reset of inputStream ? NO
> >>
> >>  1 mime : application/pdf
> >>  2 mime : application/octet-stream
> >>  3 mime : application/octet-stream
> >>  3.2 mime : application/octet-stream
> >>
> >>
> >> If I move the 5th line ("byte[] bytes = IOUtils.toByteArray(inputsbust);")
> >> up to become the second line, the result is:
> >>
> >> mark and reset of inputStream ? NO
> >>
> >>  1 mime : application/octet-stream
> >>  2 mime : application/octet-stream
> >>  3 mime : application/pdf
> >>  3.2 mime : application/pdf
> >>
> >>
> >> I hope it helps
> >>
> >> Thanks
> >>
> >>
> >> 2015-01-29 10:49 GMT+01:00 Konstantin Gribov <[email protected]>:
> >>
> >>> Hi,
> >>>
> >>> Does this InputStream support mark/reset functionality? Is the
> >>> InputStream recreated before each subsequent call to tika.detect, or is
> >>> detect called on a partially consumed stream (in case mark isn't
> >>> supported)?
> >>>
> >>> --
> >>> Best regards,
> >>> Konstantin Gribov
> >>>
> >>> Thu Jan 29 2015 at 9:25:28, Mattmann, Chris A (3980) <
> >>> [email protected]>:
> >>>
> >>> Dear Gabriele,
> >>>>
> >>>> Thanks for your question. It should be sent to [email protected]
> >>>> (moving [email protected] to BCC).
> >>>>
> >>>> I’ll take a look tomorrow if someone else hasn’t answered yet.
> >>>>
> >>>> Cheers,
> >>>> Chris
> >>>>
> >>>>
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> Chris Mattmann, Ph.D.
> >>>> Chief Architect
> >>>> Instrument Software and Science Data Systems Section (398)
> >>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>>> Office: 168-519, Mailstop: 168-527
> >>>> Email: [email protected]
> >>>> WWW:  http://sunset.usc.edu/~mattmann/
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> Adjunct Associate Professor, Computer Science Department
> >>>> University of Southern California, Los Angeles, CA 90089 USA
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Gabriele Guidi <[email protected]>
> >>>> Date: Wednesday, January 28, 2015 at 5:25 AM
> >>>> To: "[email protected]" <[email protected]>
> >>>> Subject: multiple detect call -> different results (tika 1.7)
> >>>>
> >>>> >
> >>>> >
> >>>> >Hi,
> >>>> >
> >>>> >
> >>>> >I found a strange behavior. I have a p7m file and I extract the file
> >>>> >inside the signed one; after that I use Tika to discover the mime type.
> >>>> >The first call gives me "application/pdf" (that's correct), BUT every
> >>>> >subsequent call to Tika's detect method on the same inputStream gives
> >>>> >me "application/octet-stream". ...why?
> >>>> >I cannot understand this behavior or find a solution.
> >>>> >
> >>>> >
> >>>> >Just a snippet of code:
> >>>> >
> >>>> >
> >>>> >
> >>>> >InputStream inputsbust = content.getContentStream();
> >>>> >
> >>>> >System.out.println(" 1 mime " + filepath + " : " + tika.detect(inputsbust));
> >>>> >System.out.println(" 2 mime " + filepath + " : " + tika.detect(inputsbust));
> >>>> >System.out.println(" 3 mime " + filepath + " : " + tika.detect(inputsbust));
> >>>> >
> >>>> >
> >>>> >
> >>>> >Result:
> >>>> >
> >>>> > 1 mime /home/gguidi/01_file.pdf : application/pdf
> >>>> > 2 mime /home/gguidi/01_file.pdf : application/octet-stream
> >>>> > 3 mime /home/gguidi/01_file.pdf : application/octet-stream
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >Thanks
> >>>> >
> >>>> >
> >>>> >--
> >>>> >
> >>>> >
> >>>> >Gabriele Guidi
> >>>> >Direzione Pubblica Amministrazione
> >>>> >[email protected]
> >>>> >
> >>>> >Engineering Ingegneria Informatica spa
> >>>> >Via Marconi, 10 - 40122, Bologna
> >>>> >Tel. +39-051.0435135
> >>>> >www.eng.it <http://www.eng.it>
> >>>> >
> >>>> >
> >>>> >Rispetta l'ambiente. Non stampare questa e-mail se non necessario.
> >>>> >Respect the environment. Please don't print this e-mail unless you
> >>>> really
> >>>> >need to.
> >>>> >Le informazioni trasmesse sono destinate esclusivamente alla persona
> o
> >>>> >alla società in indirizzo e sono da intendersi confidenziali e
> >>>> riservate.
> >>>> >Ogni trasmissione, inoltro, diffusione o altro uso
> >>>> > di queste informazioni a persone o società differenti dal
> >>>> destinatario è
> >>>> >proibita. Se ricevete questa comunicazione per errore, contattate il
> >>>> >mittente e cancellate le informazioni da ogni computer.
> >>>> >The information transmitted is intended only for the person or entity
> >>>> to
> >>>> >which it is addressed and may contain confidential and/or privileged
> >>>> >material. Any review, retransmission, dissemination or other use of,
> or
> >>>> >taking of any action in reliance upon, this
> >>>> > information by persons or entities other than the intended recipient
> >>>> is
> >>>> >prohibited. If you received this in error, please contact the sender
> >>>> and
> >>>> >delete the material from any computer.
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>>
> >>>>
> >>
> >>
> >
>
>
>
