Hi guys, I was investigating the use of the stream.type parameter that we can pass to a Solr connector as an argument.
From the wiki: "Tika will automatically attempt to determine the input document type (word, pdf, etc.) and extract the content appropriately. If you want, you can explicitly specify a MIME type for Tika with the stream.type parameter."

So I was expecting that, if we specify stream.type, we would check each document's type before sending a request to Solr, so that we avoid sending requests for documents that are not of the wanted type.

But in org.apache.manifoldcf.agents.output.solr.HttpPoster, when we add the content to the ContentStreamUpdateRequest, we don't check the type at all:

contentStreamUpdateRequest.addContentStream(new RepositoryDocumentStream(is,length,contentType,contentName));

So, if we pass the parameter stream.type=text/plain and we have a document whose content is video/mp4, I would expect it not to be sent (it may be 1 GB long and cause problems).

What do you think? Should we add a check on the type before sending the content? Am I missing something?

-- 
--------------------------
Benedetti Alessandro
Visiting card: http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
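
P.S. To make the suggestion concrete, here is a rough sketch of the kind of check I have in mind. It is only an illustration: StreamTypeFilter, baseType and shouldSend are hypothetical names that do not exist in ManifoldCF, and I am assuming that the configured stream.type value and the document's content type are both available as plain strings at the point where HttpPoster builds the request.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of ManifoldCF. */
final class StreamTypeFilter {
    private StreamTypeFilter() {}

    /** Drops parameters such as "; charset=UTF-8" and lower-cases the type. */
    static String baseType(String mimeType) {
        if (mimeType == null) {
            return "";
        }
        int semicolon = mimeType.indexOf(';');
        String base = semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /**
     * Returns true if the document should be sent to Solr.
     * Assumption: when no stream.type is configured, nothing is filtered,
     * so the current behaviour is preserved.
     */
    static boolean shouldSend(String configuredStreamType, String documentContentType) {
        String wanted = baseType(configuredStreamType);
        if (wanted.isEmpty()) {
            return true;
        }
        return wanted.equals(baseType(documentContentType));
    }
}
```

With something like this, the call to addContentStream could be guarded by shouldSend(streamType, contentType), so that a video/mp4 document would be skipped when stream.type=text/plain. Whether skipping is the right reaction (rather than, say, logging and rejecting the document) is a separate question.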
