If this is not working as described in the documentation, please open a
ticket and we'll look at it.

Karl

On Fri, Dec 13, 2013 at 10:13 AM, Karl Wright <[email protected]> wrote:

> Hi Alessandro,
>
> The Solr metadata mapping field is described thoroughly in the end user
> documentation:
>
> "When you configure a job to use a Solr-type output connection, the Solr
> connection type provides a tab called "Field Mapping". The purpose of this
> tab is to allow you to map metadata fields as fetched by the job's
> connection type to fields that Solr is set up to receive. This is necessary
> because the names of the metadata items are often determined by the
> repository, with no alignment to fields defined in the Solr schema. You may
> also suppress specific metadata items from being sent to the index using
> this tab. The tab looks like this:
>
> [image: Solr Specification, Field Mapping tab]
>
> Add a new mapping by filling in the "source" with the name of the metadata
> item from the repository, and "target" as the name of the output field in
> Solr, and click the "Add" button. Leaving the "target" field blank will
> result in all metadata items of that name not being sent to Solr."
>
> Karl
>
>
> On Fri, Dec 13, 2013 at 9:54 AM, Alessandro Benedetti <
> [email protected]> wrote:
>
>> But we were talking about the output connector, right?
>> Maybe I want the repository connector to extract those metadata fields,
>> and those metadata will be used differently by different output
>> connectors (for example, 2 different jobs with different Solr mappings).
>>
>> Sorry if I repeat the question, but:
>> what is the meaning of the Solr field mapping in a ManifoldCF job (one
>> that uses a Solr connector)?
>> If the meaning is to index in Solr only those fields, then there is that
>> little bug :)
>>
>>
>> 2013/12/13 Karl Wright <[email protected]>
>>
>> > Hi Alessandro,
>> >
>> > Usually the repository connector also specifies what metadata to
>> > include. What connector are you crawling with?
>> >
>> > Karl
>> >
>> >
>> > On Fri, Dec 13, 2013 at 9:06 AM, Alessandro Benedetti <
>> > [email protected]> wrote:
>> >
>> > > Actually it can be a problem.
>> > > For example, your Solr is running in an application server with a
>> > > limit on the HTTP request header, so the server will refuse all
>> > > requests that exceed that limit.
>> > >
>> > > We are interested in only 3 metadata fields, but ManifoldCF extracts
>> > > n (n >> 3) for each document.
>> > > We can configure the mapping to map those 3 metadata fields.
>> > > But the POST request is built with all the metadata from the
>> > > document, so it exceeds the request header limit and the document
>> > > is rejected without reason.
>> > >
>> > > So if the meaning of the Solr field mapping in a job with a Solr
>> > > connector is to index only those fields, then the current behaviour
>> > > is a bug, for the reason I explained above.
>> > >
>> > > Cheers
>> > >
>> > >
>> > > 2013/12/13 Karl Wright <[email protected]>
>> > >
>> > > > Hi Alessandro,
>> > > >
>> > > > Thank you for the clarification.
>> > > > If you believe it would be helpful to filter metadata, by all means
>> > > > open a ticket and attach a patch. But I don't exactly see where
>> > > > there would be an issue, since metadata that is posted that is not
>> > > > in the Solr schema is simply going to be discarded.
>> > > >
>> > > > Karl
>> > > >
>> > > >
>> > > > On Fri, Dec 13, 2013 at 7:56 AM, Alessandro Benedetti <
>> > > > [email protected]> wrote:
>> > > >
>> > > > > Hi Karl,
>> > > > > I'm not referring to filtering documents.
>> > > > > I'm referring to filtering the metadata associated with a document
>> > > > > (which will be mapped to Solr fields by the Solr connector).
>> > > > > Because now, in the job metadata mapping screen, you can select a
>> > > > > subset of metadata to be mapped to Solr fields, but then all the
>> > > > > metadata associated with the document are sent to Solr (in the
>> > > > > way I described in the email).
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > >
>> > > > > 2013/12/13 Karl Wright <[email protected]>
>> > > > >
>> > > > > > Hi Alessandro,
>> > > > > >
>> > > > > > I'm not entirely sure I understand your use case, but so far in
>> > > > > > ManifoldCF nobody has requested that an output connector perform
>> > > > > > document filtering, other than to reject documents by responding
>> > > > > > with "DOCUMENT_REJECTED". Usually document filtering is part of
>> > > > > > the repository connector's functionality, since filtering is
>> > > > > > most effective when it is described in terms of the individual
>> > > > > > repository's constructs. At the repository connector level, you
>> > > > > > can describe an appropriate set of documents to include, rather
>> > > > > > than crawling everything and rejecting the ones you don't want.
>> > > > > > This description is called the "Document Specification". When
>> > > > > > you create and edit a job in the Crawler UI, some of the job's
>> > > > > > tabs modify that specification, and the repository connector
>> > > > > > code understands the specification and limits the documents
>> > > > > > being crawled using it.
>> > > > > >
>> > > > > > On the output side, e.g. in the Solr output connector, it's
>> > > > > > already too late to restrict which documents are crawled. The
>> > > > > > best you can do is just to not send them to the index, or
>> > > > > > explicitly reject them.
>> > > > > > This limits the usefulness of any document-filtering feature in
>> > > > > > an output connector, compared with doing the same thing in the
>> > > > > > Document Specification.
>> > > > > >
>> > > > > > Hope this helps,
>> > > > > > Karl
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Dec 13, 2013 at 7:12 AM, Alessandro Benedetti <
>> > > > > > [email protected]> wrote:
>> > > > > >
>> > > > > > > Hi guys,
>> > > > > > > I have one question for you.
>> > > > > > > Looking at the details of the SolrConnector, it's possible to
>> > > > > > > see the following in
>> > > > > > >
>> > > > > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
>> > > > > > >
>> > > > > > >   writeField(out,LITERAL+newFieldName,values);
>> > > > > > >   // Write the commitWithin parameter
>> > > > > > >   if (commitWithin != null)
>> > > > > > >     writeField(out,COMMITWITHIN_METADATA,commitWithin);
>> > > > > > >   contentStreamUpdateRequest.setParams(out);
>> > > > > > >   contentStreamUpdateRequest.addContentStream(new
>> > > > > > >     RepositoryDocumentStream(is,length,contentType,contentName));
>> > > > > > >
>> > > > > > > In a job using a Solr connector, it's possible to specify the
>> > > > > > > metadata mapping, mapping specific metadata to Solr field names.
>> > > > > > > But if you select only 3 mappings, what happens is that all the
>> > > > > > > metadata in the ManifoldCF document are sent as params of the
>> > > > > > > contentStreamRequest, and the mapping is used only to rename the
>> > > > > > > fields we want to rename.
>> > > > > > >
>> > > > > > > In my opinion the mapping should be used as a filter as well,
>> > > > > > > because if the user selects only 3 metadata fields, he wants to
>> > > > > > > see only those metadata.
>> > > > > > > There should probably be at least a flag that allows the user
>> > > > > > > to choose whether the metadata sent to Solr is filtered.
>> > > > > > > It is a small change that would cover many use cases where the
>> > > > > > > user is interested in only a subset of the metadata and does
>> > > > > > > not need to send everything in the header of the HTTP POST.
>> > > > > > > I'm pretty new to ManifoldCF, so let me know if this feature is
>> > > > > > > already there and I misunderstood something.
>> > > > > > >
>> > > > > > > Cheers
>> > > > > > >
>> > > > > > > --
>> > > > > > > --------------------------
>> > > > > > >
>> > > > > > > Benedetti Alessandro
>> > > > > > > Visiting card : http://about.me/alessandro_benedetti
>> > > > > > >
>> > > > > > > "Tyger, tyger burning bright
>> > > > > > > In the forests of the night,
>> > > > > > > What immortal hand or eye
>> > > > > > > Could frame thy fearful symmetry?"
>> > > > > > >
>> > > > > > > William Blake - Songs of Experience -1794 England
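
For readers following the thread, the two behaviours under discussion can be sketched in a few lines of Java. This is only an illustration, not the ManifoldCF HttpPoster code: the class FieldMappingSketch, the method applyMapping and the keepUnmapped flag are hypothetical names, and the "literal." prefix is an assumption about what the LITERAL constant in the quoted snippet holds. An empty target suppresses a field, as the quoted documentation describes. With keepUnmapped set to true, unmapped metadata is passed through under its original name (the behaviour reported in the original message); with false, the mapping also acts as a filter (the behaviour requested).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldMappingSketch {
    // Assumption: mirrors the LITERAL prefix used in the quoted snippet.
    static final String LITERAL = "literal.";

    /**
     * Builds request parameters from a document's metadata.
     * mapping: source name -> target name; an empty target suppresses the field.
     * keepUnmapped: true passes unmapped fields through unchanged,
     *               false drops them (mapping used as a filter).
     */
    public static Map<String, List<String>> applyMapping(
            Map<String, List<String>> metadata,
            Map<String, String> mapping,
            boolean keepUnmapped) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            String source = entry.getKey();
            String target = mapping.get(source);
            if (target == null) {
                // No mapping defined for this metadata item.
                if (!keepUnmapped) {
                    continue;
                }
                target = source;
            } else if (target.isEmpty()) {
                // Blank target: the item is explicitly suppressed.
                continue;
            }
            params.put(LITERAL + target, entry.getValue());
        }
        return params;
    }

    public static void main(String[] args) {
        // Hypothetical metadata names, for illustration only.
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("dc:title", List.of("Report"));
        metadata.put("dc:creator", List.of("Alice"));
        metadata.put("internal:acl", List.of("x"));
        metadata.put("internal:hash", List.of("abc"));

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("dc:title", "title"); // rename
        mapping.put("internal:acl", "");  // suppress

        // Pass-through: [literal.title, literal.dc:creator, literal.internal:hash]
        System.out.println(applyMapping(metadata, mapping, true).keySet());
        // Filter: [literal.title]
        System.out.println(applyMapping(metadata, mapping, false).keySet());
    }
}
```

In the filtering case only the mapped field reaches the request, which is what would keep the POST parameters small in the application-server scenario described above.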
