Installing the patch requires checking out the latest Solr source via Subversion and applying the patch to it. Eric has updated his patch against various Subversion revisions of Solr. To make sure it will compile, I suggest checking out the revision he lists.
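Before the parameter-by-parameter walkthrough below, here is a minimal Python sketch of how such a request URL might be assembled with the extra field values URL-encoded. It is only an illustration, not part of Eric's patch: the function name build_rich_update_url, the host and port, and the sample file path and field values are all assumptions made up for the example.

```python
from urllib.parse import urlencode


def build_rich_update_url(base_url, file_path, file_type, doc_id,
                          storage_field, extra_fields):
    """Build a /update/rich URL (hypothetical helper, not part of the patch).

    extra_fields maps each additional index field name to its value.
    """
    params = [
        ("stream.file", file_path),           # absolute path to the file
        ("stream.type", file_type),           # pdf, xls, doc or ppt
        ("id", doc_id),                       # unique id from the schema
        ("stream.fieldname", storage_field),  # field that stores the text
        ("fieldnames", ",".join(extra_fields)),
    ]
    # Each name listed in fieldnames is also sent as its own parameter.
    params.extend(extra_fields.items())
    # urlencode percent-encodes the values; "/" and "," are left readable.
    return base_url + "/update/rich?" + urlencode(params, safe="/,")


if __name__ == "__main__":
    # Assumed host, port, path and values, purely for illustration.
    print(build_rich_update_url(
        "http://localhost:8983/solr",
        "/docs/report.pdf", "pdf", 103, "data",
        {"cat": "reports", "name": "Q2 summary"},
    ))
```

The sketch only builds the URL; how you actually issue the request (browser, curl, an HTTP library) is up to your application.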
As for using the features of this patch, this is the URL that would be called:

/solr/update/rich?stream.file=filename&stream.type=filetype&id=id&stream.fieldname=storagefield&fieldnames=cat,desc,type,name&type=filetype&cat=category&name=name&desc=description

Breaking this down:

stream.file is the absolute path to the file you want to index.

stream.type specifies the type of file. Currently pdf, xls, doc, and ppt are supported.

id is where you specify the unique value for the id field in your schema. For example, we had a document reference in a database with the id 103, so we would pass the value 103 to identify which document it was in the index.

stream.fieldname is the name of the field in your index that will actually store the text from the document. We had the field 'data', so it would be stream.fieldname=data in the URL.

fieldnames lists any additional fields in your index that need to be filled. We were passing a category, a description of the document, a name, and the type, so you just need to specify the names of those fields. Solr will then look for corresponding parameters with those names, which you can see at the end of my URL. The values passed for the additional parameters need to be sent URL-encoded.

I'm not a Java programmer, so if you have questions about the internals of the code, definitely direct those to Eric, as I cannot help. I have only implemented it in web applications. If you have any other questions about the use of the patch, I can answer those.

Enjoy!

- Pete

On 8/21/07, Vish D. <[EMAIL PROTECTED]> wrote:
> There seems to be some code out for Tika now (not packaged/announced yet,
> but...). Could someone please take a look at it and see if that could fit
> in? I am eagerly waiting for a reply back from tika-dev, but no luck yet.
>
> http://svn.apache.org/repos/asf/incubator/tika/trunk/src/main/java/org/apache/tika/
>
> I see that Eric's patch uses POI (for most of it)...so that's great! I have
> seen too many duplicated efforts, even in Apache projects alone, and this is
> one step close to fixing it (other than Tika, which isnt' 'complete' yet).
> Are there any plans on releasing this patch with Solr dist? Or, any
> instructions on using/installing the patch itself?
>
> Thanks
> Vish
>
>
> On 8/21/07, Peter Manis <[EMAIL PROTECTED]> wrote:
> >
> > Christian,
> >
> > Eric Pugh created implemented this functionality for a project we were
> > doing and has released to code on JIRA. We have had very good results
> > with it. If I can be of any help using it beyond the Java code itself
> > let me know. The last revision I used with it was 552853, so if the
> > build happens to fail you can roll back to that and it will work.
> >
> > https://issues.apache.org/jira/browse/SOLR-284
> >
> > - Pete
> >
> > On 8/21/07, Christian Klinger <[EMAIL PROTECTED]> wrote:
> > > Hi Solr Users,
> > >
> > > i have set up a Solr-Server with a custom Schema.
> > > Now i have updated the index with some content form
> > > xml-files.
> > >
> > > Now i try to update the contents of a folder.
> > > The folder consits of various document-types
> > > (pdf,doc,xls,...).
> > >
> > > Is there anywhere an howto how can i parse the
> > > documents, make an xml of the paresed content
> > > and post it to the solr server?
> > >
> > > Thanks in advance.
> > >
> > > Christian