Hi guys,
I was investigating the use of the stream.type parameter, which we can
pass to the Solr connector as an argument.

From the wiki: "Tika will automatically attempt to determine the input
document type (word, pdf, etc.) and extract the content appropriately. If
you want, you can explicitly specify a MIME type for Tika with the
stream.type parameter".

So I was expecting that, if we specify stream.type, the connector would
check each document's type against it before sending a request to Solr,
so that we avoid sending requests for documents whose type is not the
wanted one.

But in org.apache.manifoldcf.agents.output.solr.HttpPoster, when we add
the content to the ContentStreamUpdateRequest, we don't check the type at
all:

    contentStreamUpdateRequest.addContentStream(new RepositoryDocumentStream(is, length, contentType, contentName));

So, if we pass the parameter stream.type=text/plain and one document is
video/mp4, we would expect that document not to be sent (it may be 1 GB
long and cause problems).
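A minimal sketch of the type check being described, assuming the connector compares the configured stream.type with each document's contentType before building the request. `StreamTypeFilter`, `baseType` and `matches` are hypothetical names, not part of ManifoldCF. Parameters such as "; charset=UTF-8" are stripped and case is ignored, since MIME types are case-insensitive; treating an unset stream.type as "send everything" is also an assumption, chosen to keep the current behaviour.

```java
import java.util.Locale;

// Hypothetical helper, not part of ManifoldCF: decides whether a document's
// content type matches the stream.type configured on the Solr connector.
final class StreamTypeFilter {
    private StreamTypeFilter() {}

    // Reduce e.g. "Text/Plain; charset=UTF-8" to "text/plain".
    static String baseType(String mimeType) {
        if (mimeType == null) {
            return "";
        }
        int semicolon = mimeType.indexOf(';');
        String base = semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Assumption: no stream.type configured means every document is sent.
    // Otherwise only documents whose base type equals the configured one pass.
    static boolean matches(String configuredStreamType, String documentContentType) {
        String wanted = baseType(configuredStreamType);
        if (wanted.isEmpty()) {
            return true;
        }
        return wanted.equals(baseType(documentContentType));
    }
}
```

With stream.type=text/plain, `matches("text/plain", "video/mp4")` is false, so the video would be skipped, while `matches("text/plain", "text/plain; charset=UTF-8")` is true.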

What do you think? Should we add a check on the type before sending the
content?
Am I missing something?
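One possible shape for the control asked about here, sketched under assumptions: the guard runs just before the call corresponding to addContentStream, so a rejected document (such as the large video/mp4 file) is never streamed to Solr. `ContentSink`, `DocInfo` and `TypeGuardedPoster` are hypothetical stand-ins, not ManifoldCF or SolrJ classes; the real code would wrap the existing addContentStream call in HttpPoster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for ContentStreamUpdateRequest.addContentStream(...);
// the real SolrJ/ManifoldCF types are deliberately not used here.
interface ContentSink {
    void addContentStream(String contentName, String contentType, long length);
}

// Hypothetical document descriptor mirroring the contentName, contentType
// and length values passed to RepositoryDocumentStream in HttpPoster.
final class DocInfo {
    final String contentName;
    final String contentType;
    final long length;

    DocInfo(String contentName, String contentType, long length) {
        this.contentName = contentName;
        this.contentType = contentType;
        this.length = length;
    }
}

final class TypeGuardedPoster {
    private final String streamType; // value of the stream.type argument, or null
    private final ContentSink sink;
    final List<String> skipped = new ArrayList<>();

    TypeGuardedPoster(String streamType, ContentSink sink) {
        this.streamType = streamType;
        this.sink = sink;
    }

    // Lower-cased MIME type without parameters ("; charset=...").
    private static String base(String mime) {
        if (mime == null) {
            return "";
        }
        int i = mime.indexOf(';');
        return (i >= 0 ? mime.substring(0, i) : mime).trim().toLowerCase(Locale.ROOT);
    }

    // Returns true if the document was handed to the sink, false if it was
    // rejected before any content was streamed.
    boolean post(DocInfo doc) {
        String wanted = base(streamType);
        if (!wanted.isEmpty() && !wanted.equals(base(doc.contentType))) {
            skipped.add(doc.contentName);
            return false;
        }
        sink.addContentStream(doc.contentName, doc.contentType, doc.length);
        return true;
    }
}
```

Keeping the skipped names in a list is only for illustration; a real connector would more likely report the rejection through its activity log.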



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
