[ https://issues.apache.org/jira/browse/TIKA-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875956#comment-13875956 ]

Sergey Beryozkin edited comment on TIKA-1198 at 1/19/14 6:20 PM:
-----------------------------------------------------------------

Dave, I missed your comment with the exception trace, sorry about that.

After seeing a comment from Jeremy I've tested the JAX-RS server and I can 
confirm all works as expected.

Note that "curl -T somefile targetURI" does not set Content-Type, which explains
the exception you are seeing. TikaServer has two resource methods accepting PUT
payloads on the same path: one specifically for multipart/form-data payloads,
and another for all other payload types, which uses a wildcard to match all
possible types. Thus the method with the more specific JAX-RS Consumes value
(multipart/form-data) is chosen when no Content-Type is available. The error
actually mentions an octet-stream: this is the default content type assigned to
an individual multipart/form-data part.

Two fixes are possible:

1. Use the -H curl parameter. For example, I started a server (using the newly
added -Pserver profile) and posted a pom.xml to it, adding '-H "Content-Type:
text/xml"', and all worked fine. So the actual 'fix' is to update the docs and
recommend setting Content-Type when no multiparts are used.

2. Have the TikaServer resource method that accepts multiparts listen on a
unique path, say "http://localhost:9998/tika/form".
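
To make the two options concrete, here is a rough sketch of what the client
side might look like with curl. The base URL is the one quoted above; the
function names, the pom.xml file and the "file" part name are only
illustrative assumptions, and the "/form" path does not exist unless option 2
is actually implemented.

```shell
# Sketch only: TIKA_URL defaults to the address quoted above; the function
# names and the "file" part name are illustrative assumptions.
TIKA_URL="${TIKA_URL:-http://localhost:9998/tika}"

# Option 1: PUT a raw payload with an explicit Content-Type.
# Without -H, "curl -T" sends no Content-Type header at all.
tika_put() {
    # $1 = file to upload, $2 = media type of that file
    curl -T "$1" -H "Content-Type: $2" "$TIKA_URL"
}

# Option 2 (hypothetical until "/form" exists): PUT a multipart/form-data
# payload to a dedicated path. -F makes curl build the multipart body and
# set the multipart Content-Type itself; -X PUT overrides the default POST.
tika_put_form() {
    # $1 = file to upload as a form part named "file"
    curl -X PUT -F "file=@$1" "$TIKA_URL/form"
}

# Example: tika_put pom.xml text/xml
```

With option 1 the existing path keeps working for everyone who sets a
Content-Type; with option 2 only the multipart clients would have to change
the URL they use.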

Option 2 is less 'disruptive', but option 1 is marginally cleaner IMHO, as
clients PUT-ing something to the server are expected to set Content-Type.

I'm fine with implementing Option 2 too, though. Perhaps it can be done
regardless, but users should still be encouraged to set content types: this can
optimize the parsing, i.e. avoid doing the detection at the parser level and
optionally use the supplied Content-Type.

So, shall we add "/form" to the resource method accepting multipart/form-data,
or keep things as is?

Cheers, Sergey
 


> Consider optionally utilizing CXF JAX-RS Attachment support
> -----------------------------------------------------------
>
>                 Key: TIKA-1198
>                 URL: https://issues.apache.org/jira/browse/TIKA-1198
>             Project: Tika
>          Issue Type: Wish
>          Components: server
>            Reporter: Sergey Beryozkin
>            Priority: Minor
>
> CXF offers a fairly extensive support for multiparts:
> http://cxf.apache.org/docs/jax-rs-multiparts.html
> Perhaps some of that can help with the server offering more options to do 
> with uploading/downloading files



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
