Hi Alessandro,

The scenario you describe would require four different jobs, since there is
only one output connection per job.  The repository connection asks the
output connection which MIME types it accepts on a per-job basis, so it all
works just fine right now.  If your output connection does not accept mp4,
then just don't include it in the configuration for that output connection,
and all is well.
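
As a rough illustration of that per-job check, here is a minimal sketch. It is not the actual ManifoldCF code: MimeTypeFilter, Doc, accepting and selectForFetch are hypothetical names made up for this example, and the real check goes through IOutputConnector.checkMimeTypeIndexable, whose exact signature is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the single output connection of one job.
// This is NOT the real ManifoldCF interface.
interface MimeTypeFilter {
    boolean acceptsMimeType(String mimeType);
}

class MimeFilterSketch {
    // Hypothetical document descriptor: a name plus its MIME type.
    static class Doc {
        final String name;
        final String mimeType;

        Doc(String name, String mimeType) {
            this.name = name;
            this.mimeType = mimeType;
        }
    }

    // An output connection configured with an explicit set of accepted MIME types.
    static MimeTypeFilter accepting(Set<String> mimeTypes) {
        return mimeTypes::contains;
    }

    // The repository side asks the job's output before fetching anything,
    // so rejected documents are never downloaded or sent.
    static List<String> selectForFetch(List<Doc> candidates, MimeTypeFilter output) {
        List<String> toFetch = new ArrayList<>();
        for (Doc doc : candidates) {
            if (output.acceptsMimeType(doc.mimeType)) {
                toFetch.add(doc.name);
            }
        }
        return toFetch;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("notes.txt", "text/plain"),
            new Doc("talk.mp4", "video/mp4"));
        MimeTypeFilter textOnly = accepting(new HashSet<>(Arrays.asList("text/plain")));
        System.out.println(selectForFetch(docs, textOnly));
    }
}
```

With four outputs you would have four jobs, each running this check against its own accepted set, so the mp4 never reaches the text/plain output.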

Karl



On Mon, Dec 16, 2013 at 2:33 PM, Alessandro Benedetti <
[email protected]> wrote:

> Hmm, but if I specify a stream type in the output connector, I expect it
> to have some influence before the doc is sent. Consider this use case: I
> have a repository connector that extracts all the MIME types, and 4
> different output connectors, one for each MIME type. In that case I want
> to specify the stream type for each one, and I want only that type sent,
> without wasting time sending useless mp4 files to the text/plain connector.
>
> Do you agree?
> On 16 Dec 2013 18:02, "Karl Wright" <[email protected]> wrote:
>
> > "So I was expecting that if we express the stream.type, we check this
> > type before sending a Request to Solr."
> >
> > Actually, the desired mime types selected by the output connection are
> > queried by the repository connection, so that document filtering can take
> > place before the document is even fetched.  See
> > IOutputConnector.checkMimeTypeIndexable .
> >
> > Karl
> >
> >
> >
> > On Mon, Dec 16, 2013 at 12:11 PM, Alessandro Benedetti <
> > [email protected]> wrote:
> >
> > > Hi guys,
> > > I was investigating the use of the stream.type parameter that we can
> > > pass to a Solr Connector as an argument.
> > >
> > > From the wiki: "Tika will automatically attempt to determine the input
> > > document type (word, pdf, etc.) and extract the content appropriately.
> > > If you want, you can explicitly specify a MIME type for Tika with the
> > > stream.type parameter".
> > >
> > > So I was expecting that if we express the stream.type, we check this
> > > type before sending a Request to Solr, so that we avoid sending
> > > Requests for types other than the wanted one.
> > >
> > > But in org.apache.manifoldcf.agents.output.solr.HttpPoster, when we add
> > > the content to the ContentStreamUpdateRequest, we don't check the type
> > > at all:
> > >
> > > contentStreamUpdateRequest.addContentStream(new
> > > RepositoryDocumentStream(is,length,contentType,contentName));
> > >
> > > So, if we pass the parameter stream.type=text/plain and we have some
> > > content that is video/mp4, we expect not to send it (it may be 1 GB
> > > long and can cause problems).
> > >
> > > What do you think? Should we add a check on the type before sending
> > > the content?
> > > Am I missing something?
> > >
> > >
> > >
> > > --
> > > --------------------------
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
>
