Greets,
Dag Lem proposes to add delete_by_doc_id() to Indexer. Since this means
a change to the public API, I'm forwarding the message to the Lucy dev list.
Deleting by doc id is something of an expert feature. I'm not clear on what
the specific use case is and there may be workarounds, but I can certainly
imagine how it would come up from time to time. I doubt that the underlying
implementation will need changing any time soon, so I don't think this
addition limits our flexibility much.
Indexer is a high-profile public class and we would like to keep its API
small, so another approach would be to expose the DeletionsWriter
subcomponent. However, IMO document deletion is central enough to Indexer's
purpose to justify top-level convenience methods.
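To make the two API shapes concrete, here is a rough, self-contained sketch in
C. Everything in it is a hypothetical stand-in (the Sketch* types, the
fixed-size deletion table, the 1-based doc ids); it is not Lucy's actual
implementation, only an illustration of "Indexer forwards to its deletions
writer" versus "Indexer hands out its deletions writer".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; NOT Lucy's real structs. */
#define SKETCH_MAX_DOCS 64

typedef struct {
    bool    deleted[SKETCH_MAX_DOCS + 1]; /* assumption: doc ids start at 1 */
    int32_t del_count;
} SketchDelWriter;

typedef struct {
    SketchDelWriter del_writer;
} SketchIndexer;

/* Mark one document as deleted. Out-of-range ids and repeated deletions
 * of the same id are ignored in this sketch. */
void
SketchDelWriter_delete_by_doc_id(SketchDelWriter *self, int32_t doc_id) {
    assert(self != NULL);
    if (doc_id < 1 || doc_id > SKETCH_MAX_DOCS) { return; }
    if (!self->deleted[doc_id]) {
        self->deleted[doc_id] = true;
        self->del_count++;
    }
}

/* Shape A: top-level convenience method that simply forwards the call. */
void
SketchIndexer_delete_by_doc_id(SketchIndexer *self, int32_t doc_id) {
    assert(self != NULL);
    SketchDelWriter_delete_by_doc_id(&self->del_writer, doc_id);
}

/* Shape B: keep the top-level API small and expose the subcomponent. */
SketchDelWriter*
SketchIndexer_get_del_writer(SketchIndexer *self) {
    assert(self != NULL);
    return &self->del_writer;
}
```

With shape A the caller writes SketchIndexer_delete_by_doc_id(&indexer, 3);
with shape B the caller first fetches the writer and then calls
SketchDelWriter_delete_by_doc_id() on it. Both end up in the same place; the
difference is only how much surface the top-level class carries.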
+1 from me for adding delete_by_doc_id() to Indexer.
Marvin Humphrey
---------- Forwarded message ----------
From: Dag Lem <[email protected]>
Date: Wed, Jan 16, 2013 at 7:14 AM
Subject: [lucy-user] Add delete_by_doc_id to Lucy::Index::Indexer
To: [email protected]
Hi,
While attempting to modify an index, I found that I was missing a function
to delete a document by its ID through Lucy::Index::Indexer (only
delete_by_term and delete_by_query are available).
Please find attached a patch - is this OK for inclusion?
--
Best regards,
Dag Lem
diff -ru Lucy-0.3.2/core/Lucy/Index/Indexer.c Lucy-0.3.2-delete_by_doc_id/core/Lucy/Index/Indexer.c
--- Lucy-0.3.2/core/Lucy/Index/Indexer.c 2012-07-10 16:06:40.000000000 +0200
+++ Lucy-0.3.2-delete_by_doc_id/core/Lucy/Index/Indexer.c 2013-01-14 15:46:48.586827000 +0100
@@ -303,6 +303,11 @@
}
void
+Indexer_delete_by_doc_id(Indexer *self, int32_t doc_id) {
+    DelWriter_Delete_By_Doc_ID(self->del_writer, doc_id);
+}
+
+void
Indexer_add_index(Indexer *self, Obj *index) {
Folder *other_folder = NULL;
IndexReader *reader = NULL;
diff -ru Lucy-0.3.2/core/Lucy/Index/Indexer.cfh Lucy-0.3.2-delete_by_doc_id/core/Lucy/Index/Indexer.cfh
--- Lucy-0.3.2/core/Lucy/Index/Indexer.cfh 2012-07-10 16:06:40.000000000 +0200
+++ Lucy-0.3.2-delete_by_doc_id/core/Lucy/Index/Indexer.cfh 2013-01-16 09:16:02.654919038 +0100
@@ -112,6 +112,13 @@
public void
Delete_By_Query(Indexer *self, Query *query);
+ /** Mark the document identified by the supplied document ID as deleted.
+ *
+ * @param doc_id A L<Document|Lucy::Document::Doc> ID.
+ */
+ public void
+ Delete_By_Doc_ID(Indexer *self, int32_t doc_id);
+
/** Optimize the index for search-time performance. This may take a
* while, as it can involve rewriting large amounts of data.
*/
diff -ru Lucy-0.3.2/lib/Lucy/Index/Indexer.pm Lucy-0.3.2-delete_by_doc_id/lib/Lucy/Index/Indexer.pm
--- Lucy-0.3.2/lib/Lucy/Index/Indexer.pm 2012-07-10 16:06:40.000000000 +0200
+++ Lucy-0.3.2-delete_by_doc_id/lib/Lucy/Index/Indexer.pm 2013-01-15 16:29:56.270415005 +0100
@@ -184,6 +184,7 @@
qw(
Delete_By_Term
Delete_By_Query
+ Delete_By_Doc_ID
Add_Index
Commit
Prepare_Commit
@@ -201,6 +202,7 @@
prepare_commit
delete_by_term
delete_by_query
+ delete_by_doc_id
)
],
synopsis => $synopsis,
diff -ru Lucy-0.3.2/lib/Lucy/Index/Indexer.pod Lucy-0.3.2-delete_by_doc_id/lib/Lucy/Index/Indexer.pod
--- Lucy-0.3.2/lib/Lucy/Index/Indexer.pod 2012-07-10 16:06:40.000000000 +0200
+++ Lucy-0.3.2-delete_by_doc_id/lib/Lucy/Index/Indexer.pod 2013-01-16 09:26:34.497046861 +0100
@@ -192,6 +192,18 @@
=back
+=head2 delete_by_doc_id(doc_id)
+
+Mark the document identified by the supplied document ID as deleted.
+
+=over
+
+=item *
+
+B<doc_id> - A L<Document|Lucy::Document::Doc> ID.
+
+=back
+
=head1 INHERITANCE