[
https://issues.apache.org/jira/browse/TIKA-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875956#comment-13875956
]
Sergey Beryozkin edited comment on TIKA-1198 at 1/19/14 6:20 PM:
-----------------------------------------------------------------
Dave, I missed your comment with the exception trace; sorry about that.
After seeing Jeremy's comment, I tested the JAX-RS server and can confirm
that everything works as expected.
Note that "curl -T somefile targetURI" does not set a Content-Type, which
explains the exception you are seeing. TikaServer has two resource methods
accepting PUT payloads on the same path: one specifically for
multipart/form-data payloads, and another for all other payload types, which
uses a wildcard to match all possible types. When no Content-Type is
available, the method with the more specific JAX-RS Consumes value
(multipart/form-data) is therefore chosen. That is why the error mentions
application/octet-stream: it is the default content type assigned to an
individual multipart/form-data part.
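To make the selection rule concrete, here is a toy Java model of it. It is not CXF's actual selection code; it assumes that a request without a Content-Type is accepted by every Consumes value and that, among accepting methods, the one with the fewest wildcards wins. The method names putMultipart and putAnything are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Toy model (not CXF's real algorithm) of choosing between two PUT resource
// methods on the same path. Assumptions: a missing Content-Type is accepted by
// every Consumes value; the Consumes value with the fewest wildcards wins.
public class ConsumesSelection {

    // Hypothetical resource methods: a name plus the media type they consume.
    record ResourceMethod(String name, String consumes) {}

    // "*/*" -> 2, "text/*" -> 1, "multipart/form-data" -> 0
    static int wildcards(String mediaType) {
        String[] parts = mediaType.split("/", 2);
        int count = parts[0].equals("*") ? 1 : 0;
        if (parts.length < 2 || parts[1].equals("*")) {
            count++;
        }
        return count;
    }

    // requestType == null models a request with no Content-Type header.
    static boolean accepts(String consumes, String requestType) {
        if (requestType == null) {
            return true;
        }
        String[] c = consumes.split("/", 2);
        // Drop parameters such as "; boundary=..." before comparing.
        String[] r = requestType.split(";", 2)[0].trim().split("/", 2);
        boolean typeOk = c[0].equals("*") || c[0].equalsIgnoreCase(r[0]);
        boolean subtypeOk = c[1].equals("*") || (r.length > 1 && c[1].equalsIgnoreCase(r[1]));
        return typeOk && subtypeOk;
    }

    // Among the methods that accept the request, pick the most specific one.
    static Optional<ResourceMethod> select(List<ResourceMethod> methods, String requestType) {
        return methods.stream()
                .filter(m -> accepts(m.consumes(), requestType))
                .min(Comparator.comparingInt(m -> wildcards(m.consumes())));
    }

    public static void main(String[] args) {
        List<ResourceMethod> methods = List.of(
                new ResourceMethod("putMultipart", "multipart/form-data"),
                new ResourceMethod("putAnything", "*/*"));
        // No Content-Type (as with plain "curl -T"): the multipart method is picked.
        System.out.println(select(methods, null).map(ResourceMethod::name).orElse("none"));
        // "Content-Type: text/xml": only the wildcard method accepts it.
        System.out.println(select(methods, "text/xml").map(ResourceMethod::name).orElse("none"));
    }
}
```

Running main prints putMultipart and then putAnything, which is the behaviour described: without a Content-Type the more specific multipart method wins the tie.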
Two fixes are possible:
1. Use the curl -H parameter. For example, I started a server (using the newly
added -Pserver profile) and uploaded a pom.xml to it with
'-H "Content-Type: text/xml"' added, and everything worked fine. So the actual
'fix' is to update the docs and recommend setting Content-Type when multiparts
are not used.
2. Have the TikaServer resource method that accepts multiparts listen on a
unique path, say "http://localhost:9998/tika/form".
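For clients other than curl, option 1 amounts to always declaring the Content-Type on the PUT. A minimal sketch with the Java 11+ java.net.http client follows; it assumes a locally running server on port 9998 exposing the /tika resource, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Roughly the Java equivalent of:
//   curl -T pom.xml -H "Content-Type: text/xml" http://localhost:9998/tika
public class PutWithContentType {

    // Assumption: a local TikaServer on port 9998 with the /tika resource.
    static final URI TIKA = URI.create("http://localhost:9998/tika");

    // Builds a PUT that declares its Content-Type explicitly, so the server
    // does not fall back to the multipart/form-data method.
    static HttpRequest buildPut(URI target, byte[] body, String contentType) {
        return HttpRequest.newBuilder(target)
                .header("Content-Type", contentType)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path file = Path.of(args.length > 0 ? args[0] : "pom.xml");
        HttpRequest request = buildPut(TIKA, Files.readAllBytes(file), "text/xml");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```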
Option 2 is less 'disruptive', but option 1 is marginally cleaner IMHO, as
clients PUT-ing something to the server are expected to set Content-Type.
I'm fine with implementing option 2 too; perhaps it should be done anyway. But
users should still be encouraged to set content types, since this can optimize
the parsing: the server can use the supplied Content-Type instead of running
detection at the parser level.
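The optimization can be illustrated with a small "declared type first" sketch: trust the client's Content-Type and only sniff the bytes when none was sent. The detector here is a deliberately naive, hypothetical stand-in; it is not Tika's Detector API, and real detection is far more thorough.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Sketch: use the declared Content-Type when present, otherwise detect.
// The detector parameter is a hypothetical stand-in, not Tika's API.
public class DeclaredTypeFirst {

    static String effectiveType(String declaredType, byte[] body, Function<byte[], String> detector) {
        if (declaredType != null && !declaredType.isBlank()) {
            return declaredType; // the client told us: detection is skipped entirely
        }
        return detector.apply(body); // no Content-Type: fall back to sniffing the bytes
    }

    // Deliberately naive detector, for illustration only.
    static String naiveDetect(byte[] body) {
        String head = new String(body, StandardCharsets.US_ASCII);
        return head.startsWith("<?xml") ? "application/xml" : "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] pom = "<?xml version=\"1.0\"?><project/>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(effectiveType("text/xml", pom, DeclaredTypeFirst::naiveDetect)); // text/xml
        System.out.println(effectiveType(null, pom, DeclaredTypeFirst::naiveDetect));       // application/xml
    }
}
```

The point of the design is that the cost of detection is paid only by clients that did not declare a type.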
So, shall we add "/form" to the path of the resource method accepting
multipart/form-data, or keep things as they are?
Cheers, Sergey
> Consider optionally utilizing CXF JAX-RS Attachment support
> -----------------------------------------------------------
>
> Key: TIKA-1198
> URL: https://issues.apache.org/jira/browse/TIKA-1198
> Project: Tika
> Issue Type: Wish
> Components: server
> Reporter: Sergey Beryozkin
> Priority: Minor
>
> CXF offers fairly extensive support for multiparts:
> http://cxf.apache.org/docs/jax-rs-multiparts.html
> Perhaps some of it can help the server offer more options for
> uploading/downloading files
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)