woj-tek commented on PR #2342: URL: https://github.com/apache/james-project/pull/2342#issuecomment-2283478008
Hi Uwe, and thank you so much for the very insightful comments. Now it makes much more sense :)

> My recommendation would be to remove the current version of Lucene support completely and rewrite it, ideally maybe with a configurable Index schema or better - usage of Apache Solr, Elasticsearch or Opensearch (to have the indexing clearly separated and scaleable for huge installations). E.g., Dovecot IMAP server has support for Apache Solr or Elasticsearch, so indexing e-Mail is straight forwards and by adapting the schema files shipped with the repo, it is possible to customize the text analysis without changing the code.

James already supports OpenSearch, and I think it's the preferred way. Still, the Lucene implementation is handy for small deployments, where running a single service (or at least a limited number of services) is convenient. As I said before, I had zero knowledge of Lucene just a couple of days ago, and I was simply trying to make it work by looking at various documentation, Stack Overflow answers and so forth.

> The correct way to update document in Lucene is the following: Rebuild the document using the same code which was used during indexing - don't use any information from index. E.g., read your document from the database/mail folder/EML file/.... and create a completely new document with applying the indexing schema that was configured by the user (languages). Important is also to index IDs as StringField, because any other field type is not supported for IDs. When you do this, the document is reachable easily using its IDs (case sensitive) and can be updated.

@james team - should we do this, or go further with the `DocValues` you mentioned?

@uschindler - It would be helpful if the above information could somehow be included in the Lucene javadocs, so it would be clearer what to use and what to avoid.
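A minimal sketch of the update approach quoted above, assuming Lucene 8.x/9.x on the classpath (`ByteBuffersDirectory` exists since 8.0). The class name `RebuildAndUpdate`, the field names `uid`/`body` and the helpers `buildDocument`/`upsert`/`runDemo` are hypothetical and are not the schema or code James actually uses. The points it illustrates: the document is always rebuilt from the source message by the same method, the ID is a `StringField` (a single untokenized, case-sensitive term), and `IndexWriter.updateDocument(Term, ...)` deletes whatever matches that term and adds the new document in one step.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class RebuildAndUpdate {

    // Hypothetical field names, not the schema James actually uses.
    static final String ID_FIELD = "uid";
    static final String BODY_FIELD = "body";

    // Builds the document from the source message. The same method is used
    // for the first add and for every update; nothing is read back from the index.
    static Document buildDocument(String uid, String body) {
        Document doc = new Document();
        // StringField: indexed as one untokenized, case-sensitive term,
        // so updateDocument(Term, ...) can match it exactly.
        doc.add(new StringField(ID_FIELD, uid, Field.Store.YES));
        // TextField: analyzed full text; a real setup would pick the analyzer
        // from the user's language configuration.
        doc.add(new TextField(BODY_FIELD, body, Field.Store.NO));
        return doc;
    }

    // Deletes every document whose ID term equals uid, then adds the rebuilt one.
    static void upsert(IndexWriter writer, String uid, String body) throws IOException {
        writer.updateDocument(new Term(ID_FIELD, uid), buildDocument(uid, body));
    }

    // Counts live (non-deleted) documents containing the exact term.
    static int count(Directory dir, String field, String value) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return new IndexSearcher(reader).count(new TermQuery(new Term(field, value)));
        }
    }

    // Indexes message "42", re-indexes it with new content, and returns
    // {hits for uid 42, hits for "edited", hits for "original"}.
    static int[] runDemo() {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            upsert(writer, "42", "original text");
            upsert(writer, "42", "edited text");
            writer.commit();
            return new int[] {
                count(dir, ID_FIELD, "42"),
                count(dir, BODY_FIELD, "edited"),
                count(dir, BODY_FIELD, "original")
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] hits = runDemo();
        System.out.printf("uid=42: %d, edited: %d, original: %d%n", hits[0], hits[1], hits[2]);
    }
}
```

After the second `upsert`, there should be exactly one live document for ID `42`, it should match "edited", and nothing should match "original", because the first version was deleted by the same call that added the rebuilt one. Had the ID been indexed as a tokenized `TextField` or a numeric point field, the exact `Term` lookup would not reliably find the old document, which is why Uwe insists on `StringField` for IDs.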
