Greets,

Dag Lem proposes to add delete_by_doc_id() to Indexer.  Since this means
a change to the public API, I'm forwarding the message to the Lucy dev list.

Deleting by doc id is something of an expert feature.  I'm not clear on what
the specific use case is and there may be workarounds, but I can certainly
imagine how it would come up from time to time.  I doubt the underlying
implementation will need changing any time soon, so I don't think this
addition limits our flexibility much.

Indexer is a high-profile public class and we would like to keep its API
small, so another approach would be to expose the DeletionsWriter
subcomponent.  However, IMO document deletion is central enough to Indexer's
purpose to justify top-level convenience methods.
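
To make the two options concrete, here is a minimal, self-contained sketch in
C.  It is not Lucy's actual code: MockIndexer, MockDelWriter, the fixed-size
array of flags, and the assumption that valid doc ids start at 1 are all
hypothetical stand-ins chosen for illustration.  It only shows the shape of
"expose the subcomponent" versus "add a thin convenience method that
delegates".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_MAX_DOCS 64  /* hypothetical fixed capacity for this sketch */

/* Hypothetical stand-in for the deletions subcomponent. */
typedef struct {
    bool deleted[MOCK_MAX_DOCS + 1];  /* indexed by doc id; slot 0 unused */
} MockDelWriter;

/* Hypothetical stand-in for the indexer that owns the subcomponent. */
typedef struct {
    MockDelWriter del_writer;
} MockIndexer;

/* Start with no documents marked as deleted. */
void
MockIndexer_init(MockIndexer *self) {
    memset(self, 0, sizeof(*self));
}

/* Mark one document as deleted; out-of-range ids are ignored
 * (an assumption of this sketch, not a statement about Lucy). */
void
MockDelWriter_delete_by_doc_id(MockDelWriter *self, int32_t doc_id) {
    if (doc_id >= 1 && doc_id <= MOCK_MAX_DOCS) {
        self->deleted[doc_id] = true;
    }
}

bool
MockDelWriter_is_deleted(const MockDelWriter *self, int32_t doc_id) {
    return doc_id >= 1 && doc_id <= MOCK_MAX_DOCS && self->deleted[doc_id];
}

/* Option A: expose the subcomponent and let callers use it directly. */
MockDelWriter*
MockIndexer_get_del_writer(MockIndexer *self) {
    assert(self != NULL);
    return &self->del_writer;
}

/* Option B: keep the subcomponent private and add one thin wrapper. */
void
MockIndexer_delete_by_doc_id(MockIndexer *self, int32_t doc_id) {
    assert(self != NULL);
    MockDelWriter_delete_by_doc_id(&self->del_writer, doc_id);
}
```

With option A a caller would write
MockDelWriter_delete_by_doc_id(MockIndexer_get_del_writer(&ix), 3); with
option B it is MockIndexer_delete_by_doc_id(&ix, 3).  Both end up in the same
place; the difference is only how much of the internals becomes public API.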

+1 from me for adding delete_by_doc_id() to Indexer.

Marvin Humphrey

---------- Forwarded message ----------
From: Dag Lem <[email protected]>
Date: Wed, Jan 16, 2013 at 7:14 AM
Subject: [lucy-user] Add delete_by_doc_id to Lucy::Index::Indexer
To: [email protected]


Hi,

While attempting to modify an index I found that Lucy::Index::Indexer
lacks a function to delete a document by its ID (only delete_by_term
and delete_by_query are available).

Please find attached a patch - is this OK for inclusion?

--
Best regards,

Dag Lem
diff -ru Lucy-0.3.2/core/Lucy/Index/Indexer.c Lucy-0.3.2-delete_by_doc_id/core/Lucy/Index/Indexer.c
--- Lucy-0.3.2/core/Lucy/Index/Indexer.c	2012-07-10 16:06:40.000000000 +0200
+++ Lucy-0.3.2-delete_by_doc_id/core/Lucy/Index/Indexer.c	2013-01-14 15:46:48.586827000 +0100
@@ -303,6 +303,11 @@
 }
 
 void
+Indexer_delete_by_doc_id(Indexer *self, int32_t doc_id) {
+    DelWriter_Delete_By_Doc_ID(self->del_writer, doc_id);
+}
+
+void
 Indexer_add_index(Indexer *self, Obj *index) {
     Folder *other_folder = NULL;
     IndexReader *reader  = NULL;
diff -ru Lucy-0.3.2/core/Lucy/Index/Indexer.cfh Lucy-0.3.2-delete_by_doc_id/core/Lucy/Index/Indexer.cfh
--- Lucy-0.3.2/core/Lucy/Index/Indexer.cfh	2012-07-10 16:06:40.000000000 +0200
+++ Lucy-0.3.2-delete_by_doc_id/core/Lucy/Index/Indexer.cfh	2013-01-16 09:16:02.654919038 +0100
@@ -112,6 +112,13 @@
     public void
     Delete_By_Query(Indexer *self, Query *query);
 
+    /** Mark the document identified by the supplied document ID as deleted.
+     *
+     * @param doc_id A L<Document|Lucy::Document::Doc> ID.
+     */
+    public void
+    Delete_By_Doc_ID(Indexer *self, int32_t doc_id);
+
     /** Optimize the index for search-time performance.  This may take a
      * while, as it can involve rewriting large amounts of data.
      */
diff -ru Lucy-0.3.2/lib/Lucy/Index/Indexer.pm Lucy-0.3.2-delete_by_doc_id/lib/Lucy/Index/Indexer.pm
--- Lucy-0.3.2/lib/Lucy/Index/Indexer.pm	2012-07-10 16:06:40.000000000 +0200
+++ Lucy-0.3.2-delete_by_doc_id/lib/Lucy/Index/Indexer.pm	2013-01-15 16:29:56.270415005 +0100
@@ -184,6 +184,7 @@
         qw(
             Delete_By_Term
             Delete_By_Query
+            Delete_By_Doc_ID
             Add_Index
             Commit
             Prepare_Commit
@@ -201,6 +202,7 @@
                 prepare_commit
                 delete_by_term
                 delete_by_query
+                delete_by_doc_id
                 )
         ],
         synopsis     => $synopsis,
diff -ru Lucy-0.3.2/lib/Lucy/Index/Indexer.pod Lucy-0.3.2-delete_by_doc_id/lib/Lucy/Index/Indexer.pod
--- Lucy-0.3.2/lib/Lucy/Index/Indexer.pod	2012-07-10 16:06:40.000000000 +0200
+++ Lucy-0.3.2-delete_by_doc_id/lib/Lucy/Index/Indexer.pod	2013-01-16 09:26:34.497046861 +0100
@@ -192,6 +192,18 @@
 
 =back
 
+=head2 delete_by_doc_id(doc_id)
+
+Mark the document identified by the supplied document ID as deleted.
+
+=over
+
+=item *
+
+B<doc_id> - A L<Document|Lucy::Document::Doc> ID.
+
+=back
+
 
 
 =head1 INHERITANCE
