Feel free to create a ticket and attach a patch if you'd like an additional
feature here.

Karl



On Fri, Dec 13, 2013 at 10:33 AM, Alessandro Benedetti <[email protected]> wrote:

> Ok, thank you, now the process is clear to me.
> So if I have 100 metadata fields, and in each Solr output connector I want
> only 3 fields to be indexed, do I have to add 100 mappings, 3 with a value
> and 97 blank?
> Now I understand how it works, but it seems a little bit counterintuitive
> and long to configure, doesn't it?
>
>
> 2013/12/13 Karl Wright <[email protected]>
>
> > If this is not working as detailed in the documentation, please do open a
> > ticket and we'll look at it.
> > Karl
> >
> >
> >
> > On Fri, Dec 13, 2013 at 10:13 AM, Karl Wright <[email protected]> wrote:
> >
> > > Hi Alessandro,
> > >
> > > The Solr metadata mapping field is described thoroughly in the end user
> > > documentation:
> > >
> > > "
> > >
> > > When you configure a job to use a Solr-type output connection, the Solr
> > > connection type provides a tab called "Field Mapping". The purpose of this
> > > tab is to allow you to map metadata fields as fetched by the job's
> > > connection type to fields that Solr is set up to receive. This is necessary
> > > because the names of the metadata items are often determined by the
> > > repository, with no alignment to fields defined in the Solr schema. You may
> > > also suppress specific metadata items from being sent to the index using
> > > this tab. The tab looks like this:
> > >
> > >
> > >  [image: Solr Specification, Field Mapping tab]
> > >
> > >
> > > Add a new mapping by filling in the "source" with the name of the metadata
> > > item from the repository, and "target" as the name of the output field in
> > > Solr, and click the "Add" button. Leaving the "target" field blank will
> > > result in all metadata items of that name not being sent to Solr."
> > >
> > > Karl
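For readers following the thread, here is a minimal Java sketch of the mapping rules as described here and in the first message: a field with a non-blank target is renamed, a field with a blank target is dropped, and a field with no mapping at all is passed through under its original name. `FieldMappingSketch` and `applyMapping` are hypothetical names for illustration, not the Solr connector's actual code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldMappingSketch {
    /**
     * Hypothetical helper mirroring the behaviour described in the thread:
     * - mapped field with a non-blank target: renamed
     * - mapped field with a blank target: dropped
     * - unmapped field: passed through unchanged
     */
    public static Map<String, List<String>> applyMapping(Map<String, List<String>> metadata,
                                                         Map<String, String> mappings) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            String source = e.getKey();
            if (!mappings.containsKey(source)) {
                out.put(source, e.getValue());   // unmapped: sent as-is
                continue;
            }
            String target = mappings.get(source);
            if (target == null || target.isEmpty()) {
                continue;                        // blank target: suppressed
            }
            out.put(target, e.getValue());       // renamed
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("dc:title", List.of("Report"));
        metadata.put("dc:creator", List.of("Alice"));
        metadata.put("x-internal-id", List.of("42"));

        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("dc:title", "title");  // rename
        mappings.put("x-internal-id", "");  // blank target: suppress

        // dc:creator has no mapping, so it is still sent under its own name
        System.out.println(applyMapping(metadata, mappings));
    }
}
```

Under these rules, keeping only 3 of 100 fields takes 3 renaming entries plus 97 blank-target entries, which is the configuration burden raised in this thread.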
> > >
> > >
> > > On Fri, Dec 13, 2013 at 9:54 AM, Alessandro Benedetti <[email protected]> wrote:
> > >
> > >> But we were talking about the output connector, right?
> > >> Maybe I want the repository connector to extract those metadata fields,
> > >> and those metadata will be used differently by different output
> > >> connectors (for example 2 different jobs, with different Solr mappings).
> > >>
> > >> Sorry if I repeat the question, but:
> > >> What is the meaning of the Solr field mapping in a ManifoldCF job (that
> > >> uses a Solr connector)?
> > >> If the meaning is to index in Solr only those fields, then there is that
> > >> little bug :)
> > >>
> > >>
> > >>
> > >> 2013/12/13 Karl Wright <[email protected]>
> > >>
> > >> > Hi Alessandro,
> > >> >
> > >> > Usually the repository connector also specifies what metadata to include.
> > >> > What connector are you crawling with?
> > >> >
> > >> > Karl
> > >> >
> > >> >
> > >> >
> > >> > On Fri, Dec 13, 2013 at 9:06 AM, Alessandro Benedetti <[email protected]> wrote:
> > >> >
> > >> > > Actually it can be a problem.
> > >> > > For example, your Solr may be running in an application server with a
> > >> > > limit on the HTTP request header size.
> > >> > > The server will then refuse all the requests that exceed that limit.
> > >> > >
> > >> > > We are interested in only 3 metadata fields, but ManifoldCF extracts
> > >> > > n (n >> 3) for each document.
> > >> > > We can configure the mapping to map those 3 metadata fields.
> > >> > > But the POST request is built with all the metadata from the document,
> > >> > > so it exceeds the request header limit and the document is rejected
> > >> > > with no apparent reason.
> > >> > >
> > >> > > So if the purpose of the Solr field mapping in a job with a Solr
> > >> > > connector is to index only those fields, then the current behaviour is
> > >> > > a bug, for the reason I explained above.
> > >> > >
> > >> > > Cheers
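A rough Java sketch of the size argument above: it builds the URL-encoded `literal.<field>=value` string for a set of metadata and measures its length. It assumes, as described in this message, that the parameters travel outside the request body (for example in the query string) and therefore count against the server's request-header limit. The `literal.` prefix, the field names and the 8192-byte limit are illustrative assumptions, not values taken from ManifoldCF or from any particular server. Requires Java 11 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParamSizeSketch {
    // Assumed prefix for literal field parameters (illustrative)
    static final String LITERAL = "literal.";

    /** Length of "literal.k1=v1&literal.k2=v2..." after URL encoding (ASCII, so chars == bytes). */
    public static int encodedQueryLength(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(LITERAL + e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.length();
    }

    public static void main(String[] args) {
        int limit = 8192; // hypothetical server limit, in bytes

        // 100 extracted fields with ~100-character values (made-up data)
        Map<String, String> all = new LinkedHashMap<>();
        for (int i = 0; i < 100; i++) {
            all.put("meta_field_" + i, "v".repeat(100));
        }
        // Only the 3 fields the job actually maps
        Map<String, String> wanted = new LinkedHashMap<>();
        for (int i = 0; i < 3; i++) {
            wanted.put("meta_field_" + i, all.get("meta_field_" + i));
        }

        int allLen = encodedQueryLength(all);
        int wantedLen = encodedQueryLength(wanted);
        System.out.println("all fields: " + allLen + " bytes, over limit: " + (allLen > limit));
        System.out.println("3 fields:   " + wantedLen + " bytes, over limit: " + (wantedLen > limit));
    }
}
```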
> > >> > >
> > >> > >
> > >> > > 2013/12/13 Karl Wright <[email protected]>
> > >> > >
> > >> > > > Hi Alessandro,
> > >> > > >
> > >> > > > Thank you for the clarification.
> > >> > > > If you believe it would be helpful to filter metadata, by all means
> > >> > > > open a ticket and attach a patch.  But I don't quite see where there
> > >> > > > would be an issue, since posted metadata that is not in the Solr
> > >> > > > schema is simply going to be discarded.
> > >> > > >
> > >> > > > Karl
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Fri, Dec 13, 2013 at 7:56 AM, Alessandro Benedetti <[email protected]> wrote:
> > >> > > >
> > >> > > > > Hi Karl,
> > >> > > > > I'm not referring to filtering documents.
> > >> > > > > I'm referring to filtering the metadata associated with a document
> > >> > > > > (which will be mapped to Solr fields by the Solr connector).
> > >> > > > > Currently, in the job's metadata mapping screen, you can select a
> > >> > > > > subset of metadata to be mapped to Solr fields, but then all the
> > >> > > > > metadata associated with the document are sent to Solr (in the way
> > >> > > > > I described in my previous e-mail).
> > >> > > > >
> > >> > > > > Cheers
> > >> > > > >
> > >> > > > >
> > >> > > > > 2013/12/13 Karl Wright <[email protected]>
> > >> > > > >
> > >> > > > > > Hi Alessandro,
> > >> > > > > >
> > >> > > > > > I'm not entirely sure I understand your use case, but so far in
> > >> > > > > > ManifoldCF nobody has requested that an output connector perform
> > >> > > > > > document filtering, other than to reject documents by responding
> > >> > > > > > with "DOCUMENT_REJECTED".
> > >> > > > > > Usually document filtering is part of the repository connector's
> > >> > > > > > functionality, since filtering is most effective when it is
> > >> > > > > > described in terms of the individual repository's constructs.  At
> > >> > > > > > the repository connector level, you can describe an appropriate set
> > >> > > > > > of documents to include, rather than crawling everything and
> > >> > > > > > rejecting the ones you don't want.  This description is called the
> > >> > > > > > "Document Specification".  When you create and edit a job in the
> > >> > > > > > Crawler UI, some of the job's tabs modify that specification, and
> > >> > > > > > the repository connector code understands the specification and
> > >> > > > > > uses it to limit the documents being crawled.
> > >> > > > > >
> > >> > > > > > On the output side, e.g. in the Solr output connector, it's already
> > >> > > > > > too late to restrict which documents are crawled.  The best you can
> > >> > > > > > do is to not send them to the index, or to explicitly reject them.
> > >> > > > > > This limits the usefulness of any document-filtering feature in an
> > >> > > > > > output connector, compared with doing the same thing in the
> > >> > > > > > Document Specification.
> > >> > > > > >
> > >> > > > > > Hope this helps,
> > >> > > > > > Karl
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Fri, Dec 13, 2013 at 7:12 AM, Alessandro Benedetti <[email protected]> wrote:
> > >> > > > > >
> > >> > > > > > > Hi guys,
> > >> > > > > > > I have a question for you.
> > >> > > > > > > Looking at the details of the Solr connector, you can see the
> > >> > > > > > > following:
> > >> > > > > > >
> > >> > > > > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> > >> > > > > > >
> > >> > > > > > >   // Write each metadata field as a literal parameter
> > >> > > > > > >   writeField(out,LITERAL+newFieldName,values);
> > >> > > > > > >   // Write the commitWithin parameter
> > >> > > > > > >   if (commitWithin != null)
> > >> > > > > > >     writeField(out,COMMITWITHIN_METADATA,commitWithin);
> > >> > > > > > >   contentStreamUpdateRequest.setParams(out);
> > >> > > > > > >   contentStreamUpdateRequest.addContentStream(new
> > >> > > > > > >     RepositoryDocumentStream(is,length,contentType,contentName));
> > >> > > > > > >
> > >> > > > > > > In a job using a Solr connector, it's possible to express the
> > >> > > > > > > metadata mapping, mapping specific metadata to Solr field names.
> > >> > > > > > > But if you select only 3 mappings, what happens is that all the
> > >> > > > > > > metadata in the ManifoldCF document are sent as parameters of the
> > >> > > > > > > contentStreamUpdateRequest, and the mapping is only used to rename
> > >> > > > > > > the fields we want to rename.
> > >> > > > > > >
> > >> > > > > > > In my opinion the mapping should be used as a filter as well,
> > >> > > > > > > because if the user selects only 3 metadata fields, he wants to
> > >> > > > > > > see only those metadata.
> > >> > > > > > > There should probably at least be a flag that lets the user choose
> > >> > > > > > > whether or not to filter the metadata sent to Solr.
> > >> > > > > > > It is a small change that could solve a lot of use cases where the
> > >> > > > > > > user is interested only in a subset of metadata and does not need
> > >> > > > > > > to send everything in the header of the HTTP POST.
> > >> > > > > > > I'm pretty new to ManifoldCF, so let me know if this feature is
> > >> > > > > > > already there and I misunderstood something.
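The proposal in this message, using the mapping as an allowlist behind an optional flag, could look roughly like the following Java sketch. The `keepOnlyMappedFields` flag and the `MappingFilterSketch` class are hypothetical and not part of ManifoldCF. With the flag off, the sketch keeps the pass-through behaviour described above; a blank target suppresses a field in both modes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MappingFilterSketch {
    /**
     * Hypothetical field mapping with an allowlist flag.
     * keepOnlyMappedFields == false: unmapped fields pass through (current behaviour)
     * keepOnlyMappedFields == true : only fields with a non-blank target are sent
     */
    public static Map<String, List<String>> mapFields(Map<String, List<String>> metadata,
                                                      Map<String, String> mappings,
                                                      boolean keepOnlyMappedFields) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            String target = mappings.get(e.getKey());
            if (target == null) {
                // No mapping: send under the original name only in pass-through mode
                if (!keepOnlyMappedFields) {
                    out.put(e.getKey(), e.getValue());
                }
            } else if (!target.isEmpty()) {
                // Mapped to a non-blank target: send under the new name
                out.put(target, e.getValue());
            }
            // Mapped to a blank target: suppressed in both modes
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (int i = 0; i < 100; i++) {
            metadata.put("field" + i, List.of("value" + i));
        }
        // Only 3 mappings; no blank-target entries for the other 97 fields
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("field0", "title");
        mappings.put("field1", "author");
        mappings.put("field2", "date");

        System.out.println(mapFields(metadata, mappings, false).size()); // 100: 97 unmapped + 3 renamed
        System.out.println(mapFields(metadata, mappings, true));         // only title, author, date
    }
}
```

With the flag on, the three mappings alone are enough, which avoids the 97 blank entries the current rules would need.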
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Cheers
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > --
> > >> > > > > > > --------------------------
> > >> > > > > > >
> > >> > > > > > > Benedetti Alessandro
> > >> > > > > > > Visiting card : http://about.me/alessandro_benedetti
> > >> > > > > > >
> > >> > > > > > > "Tyger, tyger burning bright
> > >> > > > > > > In the forests of the night,
> > >> > > > > > > What immortal hand or eye
> > >> > > > > > > Could frame thy fearful symmetry?"
> > >> > > > > > >
> > >> > > > > > > William Blake - Songs of Experience -1794 England
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>