Hi Karl,

I'm not referring to filtering documents. I'm referring to filtering the metadata associated with a document (the metadata that the Solr connector maps to Solr fields). Right now, in the job's metadata mapping screen, you can select a subset of metadata to be mapped to Solr fields, but all the metadata associated with the document is still sent to Solr (in the way I described in my previous e-mail).
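
To make the difference concrete, here is a minimal sketch of the two behaviours. It is not the actual HttpPoster code: the class `MetadataMappingFilter`, the method `apply`, and the `keepUnmapped` flag are hypothetical names used only for illustration, and the metadata is assumed to be a simple name-to-values map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in ManifoldCF.
public class MetadataMappingFilter {

    /**
     * @param metadata     all metadata attached to the document (name -> values)
     * @param mapping      metadata name -> Solr field name, as configured in the job
     * @param keepUnmapped assumed flag: true sends unmapped metadata under its
     *                     original name, false drops it
     */
    public static Map<String, List<String>> apply(
            Map<String, List<String>> metadata,
            Map<String, String> mapping,
            boolean keepUnmapped) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            String solrField = mapping.get(entry.getKey());
            if (solrField == null) {
                if (!keepUnmapped) {
                    continue; // filter mode: drop metadata that has no mapping
                }
                solrField = entry.getKey(); // pass-through mode: keep the original name
            }
            // Merge values when several source names end up on the same Solr field.
            result.computeIfAbsent(solrField, k -> new ArrayList<>())
                  .addAll(entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", Arrays.asList("Annual report"));
        metadata.put("author", Arrays.asList("A. Writer"));
        metadata.put("lastModified", Arrays.asList("2013-12-13"));

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("title", "title_s");
        mapping.put("author", "author_s");

        // Mapping used only to rename: all three metadata are sent.
        System.out.println(apply(metadata, mapping, true));
        // Mapping used as a filter too: only the two mapped metadata are sent.
        System.out.println(apply(metadata, mapping, false));
    }
}
```

With `keepUnmapped` set to true the sketch mirrors what I see today (the mapping only renames); with false it shows the filtering I am asking about.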

Cheers

2013/12/13 Karl Wright <[email protected]>

> Hi Alessandro,
>
> I'm not entirely sure I understand your use case, but so far in ManifoldCF
> nobody has requested that an output connector perform document filtering,
> other than to reject documents by responding with "DOCUMENT_REJECTED".
> Usually document filtering is part of the repository connector's
> functionality, since filtering is most effective when it is described in
> terms of the individual repository's constructs. At the repository
> connector level, you can describe an appropriate set of documents to
> include, rather than crawling everything and rejecting the ones you don't
> want. This description is called the "Document Specification". When you
> create and edit a job in the Crawler UI, some of the job's tabs modify that
> specification, and the repository connector code understands the
> specification and limits the documents being crawled using it.
>
> On the output side, e.g. in the Solr output connector, it's already too
> late to restrict which documents are crawled. The best you can do is just
> to not send them to the index, or explicitly reject them. This makes any
> feature to filter documents in an output connector of limited utility,
> compared with doing the same thing in the Document Specification.
>
> Hope this helps,
> Karl
>
>
> On Fri, Dec 13, 2013 at 7:12 AM, Alessandro Benedetti <
> [email protected]> wrote:
>
> > Hi guys,
> > I have one question for you.
> > Looking at the details of the SolrConnector, it's possible to see this:
> >
> > org.apache.manifoldcf.agents.output.solr.HttpPoster
> >
> >   writeField(out,LITERAL+newFieldName,values);
> >   // Write the commitWithin parameter
> >   if (commitWithin != null)
> >     writeField(out,COMMITWITHIN_METADATA,commitWithin);
> >   contentStreamUpdateRequest.setParams(out);
> >   contentStreamUpdateRequest.addContentStream(new
> >     RepositoryDocumentStream(is,length,contentType,contentName));
> >
> > In a job using a Solr connector, it's possible to define the metadata
> > mapping, which maps specific metadata to Solr field names.
> > But if you select only 3 mappings, what happens is that all the
> > metadata in the ManifoldCF document is sent as params of the
> > contentStreamRequest, and the mapping is used only to rename the fields
> > we want to rename.
> >
> > In my opinion the mapping should be used as a filter as well,
> > because if the user selects only 3 metadata, he wants to see only those
> > metadata.
> > There should probably be at least a flag that allows the user to choose
> > whether or not to filter the metadata sent to Solr.
> > It is a small change that can solve a lot of use cases where the user is
> > interested only in a subset of metadata and does not need to send
> > everything in the header of the HTTP POST.
> > I'm pretty new to ManifoldCF, so let me know if this feature is already
> > there and I misunderstood something.
> >
> >
> > Cheers
> >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
