Re: multiple detect call - different results (tika 1.7)

2015-01-29 Thread Gabriele Guidi
Ok, thank you for your support. Best regards 2015-01-29 15:14 GMT+01:00 Konstantin Gribov gros...@gmail.com: Hi, Gabriele. If you're using an InputStream that doesn't support mark/reset, the Tika facade (org.apache.Tika) creates a BufferedInputStream, which consumes up to 8k of the original inputStream

Re: multiple detect call - different results (tika 1.7)

2015-01-29 Thread Tyler Palsulich
Thanks Konstantin and Gabriele! Please feel free to email any other questions or open an issue on the Tika JIRA. Have a good day! Tyler On Jan 29, 2015 11:43 AM, Gabriele Guidi gabriele.gu...@eng.it wrote: Ok, thank you for your support. Best regards 2015-01-29 15:14 GMT+01:00 Konstantin

Re: multiple detect call - different results (tika 1.7)

2015-01-29 Thread Gabriele Guidi
Thanks for your answer. I had the same behaviour with Tika 1.6 and 1.5. I found a workaround: the problem seems to happen only with InputStream, so now I use byte[] and it's OK. Thanks again. On 29 Jan 2015 07:24, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote: Dear Gabriele,
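The byte[] workaround can be sketched with the JDK alone: drain the original stream into memory once, then give every detection call its own fresh stream over the same bytes. This is a minimal sketch, not Gabriele's code or Tika's API. NonMarkableInputStream (a stand-in for a stream such as content.getContentStream() that reports no mark/reset support), readAll, looksLikePdf and PDF_MAGIC are hypothetical names; looksLikePdf only checks the "%PDF-" header signature and stands in for a real detector call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteArrayWorkaround {

    // Hypothetical stand-in for a stream that does not support mark/reset.
    public static class NonMarkableInputStream extends FilterInputStream {
        public NonMarkableInputStream(InputStream in) { super(in); }
        @Override public boolean markSupported() { return false; }
        @Override public synchronized void mark(int readlimit) { /* not supported: no-op */ }
        @Override public synchronized void reset() throws IOException {
            throw new IOException("mark/reset not supported");
        }
    }

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Drains the stream into memory once, so later checks can each start from byte 0.
    public static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int r;
            while ((r = in.read(chunk)) != -1) {
                out.write(chunk, 0, r);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical detector stand-in: consumes the first bytes and compares them to "%PDF-".
    public static boolean looksLikePdf(InputStream in) {
        byte[] head = new byte[PDF_MAGIC.length];
        try {
            int n = 0;
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) break;
                n += r;
            }
            return n == head.length && Arrays.equals(head, PDF_MAGIC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "%PDF-1.4\n% sample body".getBytes(StandardCharsets.US_ASCII);
        byte[] copy = readAll(new NonMarkableInputStream(new ByteArrayInputStream(doc)));
        // A fresh ByteArrayInputStream per call means no call ever sees a consumed stream.
        System.out.println(looksLikePdf(new ByteArrayInputStream(copy))); // true
        System.out.println(looksLikePdf(new ByteArrayInputStream(copy))); // true
    }
}
```

The trade-off is memory: the whole document is held in the byte[], which is fine for small files but wasteful when detection only needs the first few bytes.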

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-01-29 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297194#comment-14297194 ] Lewis John McGibbney commented on TIKA-1423: [~gostep] can you please review

[jira] [Commented] (TIKA-1532) DIF Parser

2015-01-29 Thread Aakarsh Medleri Hire Math (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296616#comment-14296616 ] Aakarsh Medleri Hire Math commented on TIKA-1532: - Hi Nick, The MIME type

Re: multiple detect call - different results (tika 1.7)

2015-01-29 Thread Konstantin Gribov
Hi, Does this InputStream support mark/reset functionality? Is the InputStream recreated before each subsequent call to tika.detect, or is it called on a partially consumed stream (in case mark isn't supported)? -- Best regards, Konstantin Gribov Thu Jan 29 2015 at 9:25:28, Mattmann, Chris A (3980)

Re: multiple detect call - different results (tika 1.7)

2015-01-29 Thread Gabriele Guidi
Hi, No. I checked with the markSupported() function (http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#markSupported()) and it says no. No recreation. The test code is very simple: InputStream inputsbust = content.getContentStream(); System.out.println( mark and reset inputStream ?

Re: multiple detect call - different results (tika 1.7)

2015-01-29 Thread Konstantin Gribov
Hi, Gabriele. If you're using an InputStream that doesn't support mark/reset, the Tika facade (org.apache.Tika) creates a BufferedInputStream, which by default consumes up to 8k of the original inputStream, so the Tika MIME type detector can't find the PDF magic after the first call. Second case (with copying to byte[])
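The failure Konstantin describes can be reproduced without Tika. The sketch below is a JDK-only simulation, not Tika's source: detectPdfLikeFacade is a hypothetical stand-in that follows the described pattern (wrap a non-markable stream in a new BufferedInputStream, mark, read the header, reset), and NonMarkableInputStream is a hypothetical stream that reports no mark/reset support. The first call succeeds; the second sees an already-drained stream because the first wrapper pulled the bytes into its own buffer and was then thrown away.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ConsumedStreamDemo {

    // Hypothetical stand-in for a stream that does not support mark/reset.
    public static class NonMarkableInputStream extends FilterInputStream {
        public NonMarkableInputStream(InputStream in) { super(in); }
        @Override public boolean markSupported() { return false; }
        @Override public synchronized void mark(int readlimit) { /* not supported: no-op */ }
        @Override public synchronized void reset() throws IOException {
            throw new IOException("mark/reset not supported");
        }
    }

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical detector stand-in. A non-markable stream gets a NEW BufferedInputStream
    // on every call; that wrapper's first fill pulls up to 8192 bytes (the JDK default
    // buffer size) out of the original stream, and those bytes vanish with the wrapper.
    public static boolean detectPdfLikeFacade(InputStream in) {
        InputStream markable = in.markSupported() ? in : new BufferedInputStream(in);
        byte[] head = new byte[PDF_MAGIC.length];
        try {
            markable.mark(head.length);
            int n = 0;
            while (n < head.length) {
                int r = markable.read(head, n, head.length - n);
                if (r < 0) break;
                n += r;
            }
            // Rewinds `markable`; if it is a throwaway wrapper, the original stream stays consumed.
            markable.reset();
            return n == head.length && Arrays.equals(head, PDF_MAGIC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "%PDF-1.4\n% sample body".getBytes(StandardCharsets.US_ASCII);
        InputStream raw = new NonMarkableInputStream(new ByteArrayInputStream(doc));
        System.out.println(detectPdfLikeFacade(raw)); // true
        System.out.println(detectPdfLikeFacade(raw)); // false: the first wrapper drained the header
    }
}
```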

Re: multiple detect call - different results (tika 1.7)

2015-01-29 Thread Konstantin Gribov
Hi, It's the usual behavior for any function that consumes a stream (it's not Tika-related). In this case you can wrap it with a BufferedInputStream, since detection needs only the beginning of the file. The code snippet is: InputStream bis = new BufferedInputStream(inputsbust); tika.detect(bis); Tika resets
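Konstantin's suggestion (wrap once, then pass the same wrapper to every call) can be checked with the same kind of JDK-only simulation. In this sketch, detectPdfLikeFacade and NonMarkableInputStream are hypothetical stand-ins, not Tika classes; the real call in the thread is tika.detect(bis). Because the caller's BufferedInputStream supports mark/reset, the helper marks and rewinds that same object, so repeated checks all see the header.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WrapOnceDemo {

    // Hypothetical stand-in for a stream that does not support mark/reset.
    public static class NonMarkableInputStream extends FilterInputStream {
        public NonMarkableInputStream(InputStream in) { super(in); }
        @Override public boolean markSupported() { return false; }
        @Override public synchronized void mark(int readlimit) { /* not supported: no-op */ }
        @Override public synchronized void reset() throws IOException {
            throw new IOException("mark/reset not supported");
        }
    }

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical detector stand-in: wraps only non-markable input, then mark/read/reset.
    public static boolean detectPdfLikeFacade(InputStream in) {
        InputStream markable = in.markSupported() ? in : new BufferedInputStream(in);
        byte[] head = new byte[PDF_MAGIC.length];
        try {
            markable.mark(head.length);
            int n = 0;
            while (n < head.length) {
                int r = markable.read(head, n, head.length - n);
                if (r < 0) break;
                n += r;
            }
            markable.reset(); // here `markable` is the caller's wrapper, so it really rewinds
            return n == head.length && Arrays.equals(head, PDF_MAGIC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "%PDF-1.4\n% sample body".getBytes(StandardCharsets.US_ASCII);
        InputStream raw = new NonMarkableInputStream(new ByteArrayInputStream(doc));
        // Wrap once, mirroring: InputStream bis = new BufferedInputStream(inputsbust);
        InputStream bis = new BufferedInputStream(raw);
        System.out.println(detectPdfLikeFacade(bis)); // true
        System.out.println(detectPdfLikeFacade(bis)); // true: same wrapper, rewound each time
    }
}
```

Unlike the byte[] copy, this keeps only what the wrapper buffers in memory, which is enough here because, as noted above, detection needs only the beginning of the file.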

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-01-29 Thread Giuseppe Totaro (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297612#comment-14297612 ] Giuseppe Totaro commented on TIKA-1423: --- [~lewismc] your patch matches perfectly the

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-01-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297632#comment-14297632 ] Chris A. Mattmann commented on TIKA-1423: - +1 please commit. Thanks to all who

Rackspace VM and Standing up Tika Server

2015-01-29 Thread Lewis John Mcgibbney
Hi Tim, Can you please fill us in on the current status of the Tika + Rackspace effort? I have neglected this, so apologies. I want to document what is available on the Tika wiki so we do not lose it again. I want to get a Tika Server running on Rackspace and we can redirect tika-vm.apache.org

[jira] [Commented] (TIKA-1518) Docker with Tika Server

2015-01-29 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297614#comment-14297614 ] Tyler Palsulich commented on TIKA-1518: --- 2. Sent a message. Andrew Bayer responded

[jira] [Resolved] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-01-29 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved TIKA-1423. Resolution: Fixed. Committed @revision 1655873 in Tika trunk. Good work team. Really

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-01-29 Thread Vineet Ghatge (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297709#comment-14297709 ] Vineet Ghatge commented on TIKA-1423: - [~lewismc] I just tested it on my end as well.

tika-trunk-jdk1.6 - Build # 444 - Failure

2015-01-29 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-trunk-jdk1.6 (build #444) Status: Failure Check console output at https://builds.apache.org/job/tika-trunk-jdk1.6/444/ to view the results.

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-01-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297822#comment-14297822 ] Hudson commented on TIKA-1423: -- FAILURE: Integrated in tika-trunk-jdk1.6 #444 (See

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-01-29 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297624#comment-14297624 ] Lewis John McGibbney commented on TIKA-1423: OK folks, I would like to commit

[jira] [Commented] (TIKA-1518) Docker with Tika Server

2015-01-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297623#comment-14297623 ] Chris A. Mattmann commented on TIKA-1518: - [~tpalsulich] talk to

RE: Rackspace VM and Standing up Tika Server

2015-01-29 Thread Allison, Timothy B.
Hi Lewis, et al., For TIKA-1302, the server is up and running but hasn’t “gone live”. I’m waiting on the lawyers before I turn anything on. I also want to fix the PDF permissions issue before publishing anything. More work remains on the summary statistics generation, but progress is

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-01-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297890#comment-14297890 ] Hudson commented on TIKA-1423: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #460 (See

TIKA-1423 Build a parser to extract data from GRIB formats not good with Java 6

2015-01-29 Thread Lewis John Mcgibbney
Hi Folks, Having committed TIKA-1423, it has become apparent to me that the libraries being pulled in as dependencies are not compatible with JDK 1.6, as indicated by our Jenkins 1.6 build. Do we want to move towards dropping support for Java 1.6? Oracle made an announcement some time ago so this is

Re: Rackspace VM and Standing up Tika Server

2015-01-29 Thread Lewis John Mcgibbney
Hi Tim, On Thu, Jan 29, 2015 at 5:00 PM, Allison, Timothy B. talli...@mitre.org wrote: For TIKA-1302, the server is up and running but hasn’t “gone live”. I’m waiting on the lawyers before I turn anything on. I also want to fix the PDF permissions issue before publishing anything. More

[jira] [Updated] (TIKA-1535) Inheritance modification for the class MIMETypes

2015-01-29 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1535: -- Description: The Class MIMETypes does not currently allow for inheritance. There are a couple of methods in

Re: TIKA-1423 Build a parser to extract data from GRIB formats not good with Java 6

2015-01-29 Thread Mattmann, Chris A (3980)
+1 move to 1.7 Sent from my iPhone On Jan 29, 2015, at 5:04 PM, Allison, Timothy B. talli...@mitre.org wrote: +1 to dropping 1.6...let's move to 1.8 and beyond! :) -Original Message- From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] Sent: Thursday, January 29, 2015

Re: TIKA-1423 Build a parser to extract data from GRIB formats not good with Java 6

2015-01-29 Thread Tyler Palsulich
+1 Tyler On Jan 29, 2015 9:52 PM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote: +1 move to 1.7 Sent from my iPhone On Jan 29, 2015, at 5:04 PM, Allison, Timothy B. talli...@mitre.org wrote: +1 to dropping 1.6...let's move to 1.8 and beyond! :) -Original

[jira] [Assigned] (TIKA-1301) Establish TikaServer on Apache hosted VM

2015-01-29 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned TIKA-1301: -- Assignee: Lewis John McGibbney Establish TikaServer on Apache hosted VM

[jira] [Created] (TIKA-1536) Upgrade compiler definition in pom's to Java 7

2015-01-29 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created TIKA-1536: -- Summary: Upgrade compiler definition in pom's to Java 7 Key: TIKA-1536 URL: https://issues.apache.org/jira/browse/TIKA-1536 Project: Tika Issue

[jira] [Updated] (TIKA-1536) Upgrade compiler definition in pom's to Java 7

2015-01-29 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-1536: --- Attachment: TIKA-1536.patch Trivial patch for trunk. Test verified to pass. Upgrade

[jira] [Commented] (TIKA-1518) Docker with Tika Server

2015-01-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298002#comment-14298002 ] Tim Allison commented on TIKA-1518: --- [~tpalsulich], y, the server was initially intended

RE: TIKA-1423 Build a parser to extract data from GRIB formats not good with Java 6

2015-01-29 Thread Allison, Timothy B.
+1 to dropping 1.6...let's move to 1.8 and beyond! :) -Original Message- From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] Sent: Thursday, January 29, 2015 6:51 PM To: dev@tika.apache.org Subject: TIKA-1423 Build a parser to extract data from GRIB formats not good with Java