Hmm, but if I specify a stream type in the output connector, I expect it to have some effect before the document is sent. Consider this use case: I have a repository connector that extracts documents of every MIME type, and four different output connectors, one for each MIME type. In that case I want to specify the stream type for each connector and have it receive only that type, without wasting time sending a useless mp4 to the text/plain connector.
Do you agree?

On 16 Dec 2013 18:02, "Karl Wright" <[email protected]> wrote:

> "So I was expecting that if we express the stream.type, we check this type
> before sending a Request to Solr."
>
> Actually, the desired mime types selected by the output connection are
> queried by the repository connection, so that document filtering can take
> place before the document is even fetched. See
> IOutputConnector.checkMimeTypeIndexable .
>
> Karl
>
> On Mon, Dec 16, 2013 at 12:11 PM, Alessandro Benedetti <
> [email protected]> wrote:
>
> > Hi guys,
> > I was investigating the use of the stream.type parameter that we can
> > pass to a Solr Connector as an argument.
> >
> > From the wiki: "Tika will automatically attempt to determine the input
> > document type (word, pdf, etc.) and extract the content appropriately. If
> > you want, you can explicitly specify a MIME type for Tika with the
> > stream.type parameter".
> >
> > So I was expecting that if we express the stream.type, we check this type
> > before sending a Request to Solr, so that we avoid sending Requests for
> > types that are not the wanted one.
> >
> > But in org.apache.manifoldcf.agents.output.solr.HttpPoster, when we add
> > the content to the ContentStreamUpdateRequest, we don't check the type at
> > all:
> >
> > contentStreamUpdateRequest.addContentStream(new
> > RepositoryDocumentStream(is,length,contentType,contentName));
> >
> > So, if we pass the parameter stream.type=text/plain, and we have one
> > content that is video/mp4, we expect not to send it (maybe it is 1 GB long
> > and can cause problems).
> >
> > What do you think? Should we put a check on the type before sending the
> > content?
> > Am I missing something?
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
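
To make the check under discussion concrete, here is a minimal sketch of a MIME type filter that could be consulted before a content stream is added to a request. It is only an illustration: StreamTypeFilter and its methods are hypothetical names, not part of ManifoldCF or SolrJ, and it assumes that a plain comparison against the configured type (ignoring parameters such as charset) is what is wanted.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT an existing ManifoldCF or SolrJ class.
// It decides whether a document's content type matches the type(s)
// configured for an output connection.
final class StreamTypeFilter {
    private final Set<String> allowed = new HashSet<>();

    StreamTypeFilter(String... allowedMimeTypes) {
        for (String type : allowedMimeTypes) {
            allowed.add(normalize(type));
        }
    }

    // Strip parameters such as "; charset=UTF-8", trim, and lowercase.
    static String normalize(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // True only if the document's base MIME type is one of the configured types.
    boolean accepts(String contentType) {
        return allowed.contains(normalize(contentType));
    }

    public static void main(String[] args) {
        StreamTypeFilter filter = new StreamTypeFilter("text/plain");
        System.out.println(filter.accepts("text/plain; charset=UTF-8"));
        System.out.println(filter.accepts("video/mp4"));
    }
}
```

With something like this, the text/plain connection would skip the mp4 before any bytes are sent. Whether such a check belongs in the poster or in the checkMimeTypeIndexable path Karl mentions is exactly the open question in this thread.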
