Re: [Solr Connector] Stream Type Check

Alessandro Benedetti Tue, 17 Dec 2013 04:02:37 -0800

OK Karl, tomorrow I will test the proposed scenario.

Cheers
On 16 Dec 2013 20:38, "Karl Wright" <[email protected]> wrote:


> Hi Alessandro,
>
> The scenario you describe would require four different jobs, since there is
> only one output connection per job.  The repository connection asks the
> output connection what mime type it accepts on a per-job basis, so it all
> works just fine right now.  If your output connection does not accept mp4,
> then just don't include it in the configuration for that output connection,
> and all is well.
>
> Karl
>
>
>
> On Mon, Dec 16, 2013 at 2:33 PM, Alessandro Benedetti <
> [email protected]> wrote:
>
> > Mmm, but if I specify a stream type in the output connector, I expect
> > it to have some influence before the doc is sent. Think of this use
> > case: I have a repository connector that extracts all the mime types.
> > Then I have 4 different output connectors, one for each mime type. In
> > that case I want to specify the stream type for each one, and I want
> > only that type, without losing time sending useless mp4 files to the
> > text/plain connector.
> >
> > Do you agree?
> > On 16 Dec 2013 18:02, "Karl Wright" <[email protected]> wrote:
> >
> > > "So I was expecting that if we express the stream.type, we check this
> > > type before sending a Request to Solr."
> > >
> > > Actually, the desired mime types selected by the output connection are
> > > queried by the repository connection, so that document filtering can
> > > take place before the document is even fetched.  See
> > > IOutputConnector.checkMimeTypeIndexable.
> > >
> > > Karl
> > >
> > >
> > >
> > > On Mon, Dec 16, 2013 at 12:11 PM, Alessandro Benedetti <
> > > [email protected]> wrote:
> > >
> > > > Hi guys,
> > > > I was investigating the use of the stream.type parameter that we
> > > > can pass to a Solr Connector as an argument.
> > > >
> > > > From the wiki: "Tika will automatically attempt to determine the
> > > > input document type (word, pdf, etc.) and extract the content
> > > > appropriately. If you want, you can explicitly specify a MIME type
> > > > for Tika with the stream.type parameter".
> > > >
> > > > So I was expecting that if we express the stream.type, we check this
> > > > type before sending a Request to Solr.
> > > > That way we avoid sending requests for types other than the wanted
> > > > one.
> > > >
> > > > But in org.apache.manifoldcf.agents.output.solr.HttpPoster, when we
> > > > add the content to the ContentStreamUpdateRequest, we don't check the
> > > > type at all:
> > > >
> > > > contentStreamUpdateRequest.addContentStream(new
> > > > RepositoryDocumentStream(is,length,contentType,contentName));
> > > >
> > > > So, if we pass the parameter stream.type=text/plain and we have one
> > > > content item that is video/mp4, we expect not to send it (it may be
> > > > 1 GB long and could cause problems).
> > > >
> > > > What do you think? Should we add a check on the type before sending
> > > > the content?
> > > > Am I missing something?
> > > >
> > > >
> > > >
> > > > --
> > > > --------------------------
> > > >
> > > > Benedetti Alessandro
> > > > Visiting card : http://about.me/alessandro_benedetti
> > > >
> > > > "Tyger, tyger burning bright
> > > > In the forests of the night,
> > > > What immortal hand or eye
> > > > Could frame thy fearful symmetry?"
> > > >
> > > > William Blake - Songs of Experience -1794 England
> > > >
> > >
> >
>
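
For reference, the per-job filtering described in the thread above could be
sketched roughly as follows. This is only an illustration under stated
assumptions, not ManifoldCF code: the class MimeTypeRouting, its methods
baseType, accepts and jobsAccepting, and the job names are all hypothetical
stand-ins, and the real IOutputConnector.checkMimeTypeIndexable call is not
used here. Each job is modelled as a name plus the set of mime types its
single output connection is configured to accept, and a document is matched
against that set before any content would be fetched or sent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names below are illustrative, not part of ManifoldCF.
public class MimeTypeRouting {

    // Reduce e.g. "Text/Plain; charset=UTF-8" to "text/plain".
    public static String baseType(String mimeType) {
        return mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
    }

    // Stand-in for the question asked on a per-job basis:
    // "does this job's output connection accept this mime type?"
    public static boolean accepts(Set<String> acceptedTypes, String mimeType) {
        return mimeType != null && acceptedTypes.contains(baseType(mimeType));
    }

    // Names of the jobs whose output connection would take the document.
    public static List<String> jobsAccepting(Map<String, Set<String>> jobs,
                                             String mimeType) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> job : jobs.entrySet()) {
            if (accepts(job.getValue(), mimeType)) {
                result.add(job.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Four jobs, one output connection (one accepted type) each.
        Map<String, Set<String>> jobs = new LinkedHashMap<>();
        jobs.put("text-job", Set.of("text/plain"));
        jobs.put("pdf-job", Set.of("application/pdf"));
        jobs.put("word-job", Set.of("application/msword"));
        jobs.put("video-job", Set.of("video/mp4"));

        // The mp4 is only accepted by the video job, so the text job
        // would skip it before fetching the (possibly large) content.
        System.out.println(jobsAccepting(jobs, "video/mp4"));
        System.out.println(jobsAccepting(jobs, "text/plain; charset=UTF-8"));
        System.out.println(accepts(jobs.get("text-job"), "video/mp4"));
    }
}
```

In this sketch, leaving video/mp4 out of the text job's accepted set is all
it takes for that job to skip the file, which mirrors the "just don't include
it in the configuration" advice quoted above.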
