Hi Martin,

I quickly put together a patch for the PDF text filter. It defers the PDF text extraction until the returned reader is actually read. The patch is completely untested because I'm a bit short of time at the moment.

Any feedback on whether it works is appreciated.

regards
 marcel

Martin Perez wrote:
If you want to add a PDF document to a repository using a PdfTextFilter, and
you do the following steps:

session.save();
node.checkin();

The method PdfTextFilter.doFilter() gets called 4 times!!!

The session's save method calls doFilter once. This is normal.

But the checkin method calls doFilter three times. Is this normal? I do not see
the sense in it.

Thanks.

Martin

Index: java/org/apache/jackrabbit/core/query/LazyReader.java
===================================================================
--- java/org/apache/jackrabbit/core/query/LazyReader.java       (revision 0)
+++ java/org/apache/jackrabbit/core/query/LazyReader.java       (revision 0)
@@ -0,0 +1,66 @@
+/*
+ * Copyright 2004-2005 The Apache Software Foundation or its licensors,
+ *                     as applicable.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.jackrabbit.core.query;
+
+import java.io.Reader;
+import java.io.IOException;
+
+/**
+ * <code>LazyReader</code> implements a utility that allows an implementing
+ * class to lazily initialize an actual reader.
+ */
+public abstract class LazyReader extends Reader {
+
+    /**
+     * The actual reader, set by concrete sub class.
+     */
+    protected Reader delegate;
+
+    /**
+     * Implementation must set the actual reader {@link #delegate} when
+     * this method is called.
+     *
+     * @throws IOException if an error occurs.
+     */
+    protected abstract void initializeReader() throws IOException;
+
+    /**
+     * Closes the underlying reader.
+     *
+     * @throws IOException if an exception occurs while closing the underlying
+     *                     reader.
+     */
+    public void close() throws IOException {
+        if (delegate != null) {
+            delegate.close();
+        }
+    }
+
+    /**
+     * {@inheritDoc}
+     */
+    public int read(char cbuf[], int off, int len) throws IOException {
+        if (delegate == null) {
+            initializeReader();
+        }
+        // be suspicious
+        if (delegate == null) {
+            throw new IOException("reader not initialized");
+        }
+        return delegate.read(cbuf, off, len);
+    }
+}

Property changes on: java/org/apache/jackrabbit/core/query/LazyReader.java
___________________________________________________________________
Name: svn:eol-style
   + native

Index: java/org/apache/jackrabbit/core/query/PdfTextFilter.java
===================================================================
--- java/org/apache/jackrabbit/core/query/PdfTextFilter.java    (revision 329171)
+++ java/org/apache/jackrabbit/core/query/PdfTextFilter.java    (working copy)
@@ -57,31 +57,37 @@
     public Map doFilter(PropertyState data, String encoding) throws RepositoryException {
         InternalValue[] values = data.getValues();
         if (values.length > 0) {
-            BLOBFileValue blob = (BLOBFileValue) values[0].internalValue();
-                
-            try {
-                PDFParser parser = new PDFParser(blob.getStream());
-                parser.parse();
-    
-                PDDocument document = parser.getPDDocument();
-    
-                CharArrayWriter writer = new CharArrayWriter();
-    
-                PDFTextStripper stripper = new PDFTextStripper();
-                stripper.setLineSeparator("\n");
-                stripper.writeText(document, writer);
-    
-                document.close();
-                writer.close();
-                
-                Map result = new HashMap();
-                result.put(FieldNames.FULLTEXT, new CharArrayReader(writer.toCharArray()));
-                return result;
-            } 
-            catch (IOException ex) {
-                throw new RepositoryException(ex);
-            }
-        } 
+            final BLOBFileValue blob = (BLOBFileValue) values[0].internalValue();
+            LazyReader reader = new LazyReader() {
+                protected void initializeReader() throws IOException {
+                    PDFParser parser;
+                    try {
+                        parser = new PDFParser(blob.getStream());
+                    } catch (RepositoryException e) {
+                        throw new IOException(e.getMessage());
+                    }
+                    parser.parse();
+
+                    PDDocument document = parser.getPDDocument();
+
+                    CharArrayWriter writer = new CharArrayWriter();
+
+                    PDFTextStripper stripper = new PDFTextStripper();
+                    stripper.setLineSeparator("\n");
+                    stripper.writeText(document, writer);
+
+                    document.close();
+                    writer.close();
+
+                    delegate = new CharArrayReader(writer.toCharArray());
+                }
+            };
+
+
+            Map result = new HashMap();
+            result.put(FieldNames.FULLTEXT, reader);
+            return result;
+        }
         else {
             // multi value not supported
             throw new RepositoryException("Multi-valued binary properties not supported.");
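
For anyone who wants to check the lazy initialization idea without a repository at hand, here is a minimal standalone sketch. It is not part of the patch: LazyReaderSketch mirrors the LazyReader class above, while CountingLazyReader, its initCount field and the drain() helper are made-up names for illustration, and a plain string stands in for the PDF extraction. It should show that creating the reader costs nothing and that the expensive step runs only once, on the first read.

```java
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.Reader;

/** Stand-in for the LazyReader in the patch: the delegate is created on first read. */
abstract class LazyReaderSketch extends Reader {

    /** The actual reader, set by the concrete subclass. */
    protected Reader delegate;

    /** Subclasses must assign {@link #delegate} here. */
    protected abstract void initializeReader() throws IOException;

    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();
        }
    }

    public int read(char[] cbuf, int off, int len) throws IOException {
        if (delegate == null) {
            initializeReader();
        }
        if (delegate == null) {
            throw new IOException("reader not initialized");
        }
        return delegate.read(cbuf, off, len);
    }
}

/** Hypothetical subclass that counts how often the expensive step runs. */
class CountingLazyReader extends LazyReaderSketch {

    int initCount = 0;
    private final String text;

    CountingLazyReader(String text) {
        this.text = text;
    }

    protected void initializeReader() {
        initCount++; // assumption: this is where the PDF parsing would happen
        delegate = new CharArrayReader(text.toCharArray());
    }

    /** Reads the whole reader into a String, using a small buffer to force several reads. */
    static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}

class LazyReaderDemo {
    public static void main(String[] args) throws IOException {
        CountingLazyReader reader = new CountingLazyReader("extracted text");
        System.out.println(reader.initCount);                 // 0, nothing extracted yet
        System.out.println(CountingLazyReader.drain(reader)); // extracted text
        System.out.println(reader.initCount);                 // 1, initialized once
        reader.close();
    }
}
```

If the indexer never reads from the reader that doFilter() returns, the extraction is skipped entirely, which is the point of the patch. Whether that removes the extra work during checkin still depends on how the callers use the reader, which I have not verified.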
