OK Karl, tomorrow I will test the proposed scenario.

Cheers

On 16 Dec 2013 20:38, "Karl Wright" <[email protected]> wrote:
> Hi Alessandro,
>
> The scenario you describe would require four different jobs, since there is
> only one output connection per job. The repository connection asks the
> output connection what mime types it accepts on a per-job basis, so it all
> works just fine right now. If your output connection does not accept mp4,
> then just don't include it in the configuration for that output connection,
> and all is well.
>
> Karl
>
> On Mon, Dec 16, 2013 at 2:33 PM, Alessandro Benedetti <
> [email protected]> wrote:
>
> > Hmm, but if I specify a stream type in the output connector, I expect
> > it to have some influence before the document is sent. Think of this use
> > case: I have a repository connector that extracts all the mime types. Then
> > I have 4 different output connectors, one for each mime type. In that case
> > I want to specify the stream type for each one, and I want only that type,
> > without losing time sending a useless mp4 to the text/plain connector.
> >
> > Do you agree?
> > On 16 Dec 2013 18:02, "Karl Wright" <[email protected]> wrote:
> >
> > > "So I was expecting that if we express the stream.type, we check this
> > > type before sending a Request to Solr."
> > >
> > > Actually, the desired mime types selected by the output connection are
> > > queried by the repository connection, so that document filtering can
> > > take place before the document is even fetched. See
> > > IOutputConnector.checkMimeTypeIndexable.
> > >
> > > Karl
> > >
> > > On Mon, Dec 16, 2013 at 12:11 PM, Alessandro Benedetti <
> > > [email protected]> wrote:
> > >
> > > > Hi guys,
> > > > I was investigating the use of the stream.type parameter that we can
> > > > pass to a Solr Connector as an argument.
> > > >
> > > > From the wiki: "Tika will automatically attempt to determine the input
> > > > document type (word, pdf, etc.) and extract the content appropriately.
> > > > If you want, you can explicitly specify a MIME type for Tika with the
> > > > stream.type parameter".
> > > >
> > > > So I was expecting that if we express the stream.type, we check this
> > > > type before sending a Request to Solr.
> > > > That way we would avoid sending requests for types other than the
> > > > wanted one.
> > > >
> > > > But in org.apache.manifoldcf.agents.output.solr.HttpPoster, when we
> > > > add the content to the ContentStreamUpdateRequest, we don't check the
> > > > type at all:
> > > >
> > > > contentStreamUpdateRequest.addContentStream(new
> > > > RepositoryDocumentStream(is,length,contentType,contentName));
> > > >
> > > > So, if we pass the parameter stream.type=text/plain and we have some
> > > > content that is video/mp4, we expect not to send it (it may be 1 GB
> > > > long and could cause problems).
> > > >
> > > > What do you think? Should we add a check on the type before sending
> > > > the content?
> > > > Am I missing something?
> > > >
> > > > --
> > > > --------------------------
> > > >
> > > > Benedetti Alessandro
> > > > Visiting card : http://about.me/alessandro_benedetti
> > > >
> > > > "Tyger, tyger burning bright
> > > > In the forests of the night,
> > > > What immortal hand or eye
> > > > Could frame thy fearful symmetry?"
> > > >
> > > > William Blake - Songs of Experience -1794 England
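
For readers following the thread, here is a rough sketch of the behaviour Karl describes: each job has exactly one output connection, and that connection is asked whether it accepts a MIME type before the document is fetched. Every name below (MimeFilterSketch, OutputConnection, ConfiguredOutput, Job, acceptsMimeType, process) is hypothetical and made up for illustration. None of it is ManifoldCF code; the real hook is IOutputConnector.checkMimeTypeIndexable, whose actual signature is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch only; these types are NOT part of ManifoldCF. */
class MimeFilterSketch {

    /** Stand-in for an output connection that can be asked about MIME types. */
    interface OutputConnection {
        boolean acceptsMimeType(String mimeType);
    }

    /** Output connection configured with an explicit list of accepted MIME types. */
    static final class ConfiguredOutput implements OutputConnection {
        private final Set<String> accepted = new HashSet<>();

        ConfiguredOutput(String... mimeTypes) {
            for (String type : mimeTypes) {
                accepted.add(type.toLowerCase(Locale.ROOT));
            }
        }

        @Override
        public boolean acceptsMimeType(String mimeType) {
            return mimeType != null
                && accepted.contains(mimeType.toLowerCase(Locale.ROOT));
        }
    }

    /** A job has exactly one output connection, as in the thread above. */
    static final class Job {
        private final OutputConnection output;
        private final List<String> fetched = new ArrayList<>();

        Job(OutputConnection output) {
            this.output = output;
        }

        /** Asks the output connection first; the document is "fetched" only if accepted. */
        boolean process(String documentName, String mimeType) {
            if (!output.acceptsMimeType(mimeType)) {
                return false; // skipped before any content is fetched or sent
            }
            fetched.add(documentName);
            return true;
        }

        List<String> fetched() {
            return fetched;
        }
    }

    public static void main(String[] args) {
        // One job per output connection; this one only wants text/plain.
        Job textJob = new Job(new ConfiguredOutput("text/plain"));
        for (String[] doc : Arrays.asList(
                new String[] {"notes.txt", "text/plain"},
                new String[] {"movie.mp4", "video/mp4"})) {
            textJob.process(doc[0], doc[1]);
        }
        System.out.println(textJob.fetched());
    }
}
```

Under this assumed model, the four-connector use case becomes four Job instances, each built with a ConfiguredOutput that lists a single MIME type, so a large video/mp4 document is never handed to the text/plain job.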
