I tried out your suggestions here on a freshly-installed Solr 3.1 instance. Some observations:
(1) The /extract/tika handler does not exist out of the box; the /update/extract handler still exists, though.
(2) For the /update/extract handler, fmap.content=attr_content did not seem to be needed as an argument.

So it looks like, for a simple setup, the Solr output connector's default values work just fine. (I had a lot of trouble with Derby queries running a long time, but that's a different issue.)

Karl

On Thu, Apr 21, 2011 at 10:30 AM, Kadri Atalay <atalay.ka...@gmail.com> wrote:
> Sure Karl, no problem.
>
> My initial assumption was that when Solr is set up to use Tika (Solr Cell),
> content would be automatically extracted and indexed in Solr.
> But it looks like a field mapping needs to be defined in the ManifoldCF job.
>
> The goal of the project I'm working on is to:
>
> 1-use Solr with Tika (to extract and index MULTIPLE formats of documents),
> 2-use ManifoldCF (to use Active Directory security to pull user information
> from a domain controller, and store the ACL for each indexed document),
> 3-perform secure searches on all the indexed documents based on the
> logged-in user's credentials.
>
> One caveat here is that the file system I'm using is not a plain vanilla
> FS. It's StorHouse / RFS from FileTek.
>
> So, as I move along, I'll post my findings and ask for suggestions.
>
> I already got your book, and can't wait to read the connector creation
> chapters!
>
> Thanks,
>
> Kadri
>
>
> On Thu, Apr 21, 2011 at 5:58 AM, Karl Wright <daddy...@gmail.com> wrote:
>>
>> Thanks for doing this.
>>
>> If you have suggestions as to how to modify the default behavior of
>> the Solr output connector given the recent release of Solr 3.1, please
>> consider creating a ticket in Apache JIRA that describes what you
>> think needs to happen. The output connector was designed to work with
>> the example configuration of Solr by default; I believe it would be
>> good to retain that ability.
>>
>> Karl
>>
>> On Wed, Apr 20, 2011 at 6:49 PM, Kadri Atalay <atalay.ka...@gmail.com>
>> wrote:
>> > I added the following field mapping into Manifold Job and now it's
>> > indexing the document content also!
>> >
>> > (fmap.content attr_content)
>> >
>> > Thanks!
>> >
>> >
>> > On Wed, Apr 20, 2011 at 6:36 PM, Karl Wright <daddy...@gmail.com> wrote:
>> >>
>> >> The content is posted to the update request handler. It might be
>> >> helpful if you turn on some logging in Solr to see exactly what is
>> >> happening there.
>> >>
>> >> Karl
>> >>
>> >> On Wed, Apr 20, 2011 at 6:18 PM, Kadri Atalay <atalay.ka...@gmail.com>
>> >> wrote:
>> >> > I'm able to use Manifold and SharedDrive connector to index files
>> >> > into Solr.
>> >> > But, only information I see in the Solr is Author, Content_type, Name,
>> >> > & last_modified.
>> >> >
>> >> > Can anyone tell me, how to index also the content into Solr?
>> >> >
>> >> > Thanks in Advance!
>> >> >
>> >> > Kadri
>> >> >
>> >> > PS. I'm using SolrCell (Tika) and manual update/extract is working
>> >> > fine.
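
For anyone reproducing this by hand: the field mapping discussed in this thread is passed to the /update/extract handler as an fmap.* request parameter. Below is a rough Python sketch that only builds such a request URL. The host, port, base path, and document id are assumptions for illustration, not values taken from this thread, and the helper name build_extract_url is made up. The mapping is left optional, since it did not seem to be needed on a stock Solr 3.1 setup.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_id, field_map=None, commit=True):
    """Build a request URL for Solr's /update/extract handler (Solr Cell).

    solr_base and doc_id are placeholders supplied by the caller.
    field_map maps an extracted field name to a Solr field name and is
    sent as fmap.<source>=<target>, e.g. {"content": "attr_content"}.
    """
    params = {"literal.id": doc_id}  # unique key for the posted document
    for source, target in (field_map or {}).items():
        params["fmap." + source] = target
    if commit:
        params["commit"] = "true"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local instance and document id.
    print(build_extract_url("http://localhost:8983/solr", "doc1",
                            {"content": "attr_content"}))
```

The file itself would then be sent to that URL as the body of an HTTP POST. Whether the fmap.content mapping is needed depends on the schema and handler defaults in your solrconfig.xml.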