l1x opened a new issue #487:
URL: https://github.com/apache/lucenenet/issues/487
I have a very simple setup: I would like to index database rows. Each row
has a unique unsigned 32-bit integer id and some text fields.
```md
| id | brand | name |
|----|-------|------|
| 1 | Nikon | Nikon AF-S DX Nikkor 35 mm |
| 2 | Canon | Canon EF-S 55-250 mm |
```
When I index these rows, Lucene produces duplicates: the whole dataset is
added again each time.
I read somewhere that IndexWriter.UpdateDocument() has to be used to avoid
this behaviour, but I am not sure how to use the Term properly.
```Fsharp
// id is the id from the database (uint32)
let id = doc.GetField("id").GetStringValue()
let term = Term("id", id)
writer.UpdateDocument(term, doc)
```
Document code:
```Fsharp
let getDocument (inputDocument:Lens) =
  let id = StoredField("id", inputDocument.Id)
  let name = TextField("name", inputDocument.Name, Field.Store.YES)
  let doc = Document()
  doc.Add(id)
  doc.Add(name)
  // return
  doc
```
IndexWriter.update:
```Fsharp
let addDocumentToIndex (writer:IndexWriter) (doc:Document) =
  try
    let id = doc.GetField("id").GetStringValue()
    let term = Term("id", id)
    writer.UpdateDocument(term, doc)
    writer.Flush(triggerMerge = false, applyAllDeletes = false)
    Ok "Ok"
  with ex ->
    logger <| sprintf "Exception : %s" ex.Message
    logger <| sprintf "Exception : %A" ex.StackTrace
    Error ex.Message
```
I am not sure what exactly the problem is, but it reliably produces
duplicates. I have created a complete workflow to reproduce it here:
https://gist.github.com/l1x/91c36b867acc70e8486a6bce7899332a
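One possible cause, offered as a sketch and not as a confirmed diagnosis: UpdateDocument() deletes by looking up the Term in the index, and a StoredField is stored but not indexed, so Term("id", id) may never match an existing document. The variant below indexes the id with a StringField instead. The `Lens` record, the `upsert` and `demo` names, and the in-memory RAMDirectory are assumptions made for illustration, and the code targets Lucene.NET 4.8.

```Fsharp
open Lucene.Net.Analysis.Standard
open Lucene.Net.Documents
open Lucene.Net.Index
open Lucene.Net.Store
open Lucene.Net.Util

// Hypothetical stand-in for the Lens type used in the snippets above.
type Lens = { Id: uint32; Name: string }

let getDocument (lens: Lens) =
  let doc = Document()
  // StringField indexes the id as a single untokenized term and also stores it,
  // so a Term on "id" can find it. StoredField would only store the value.
  doc.Add(StringField("id", string lens.Id, Field.Store.YES))
  doc.Add(TextField("name", lens.Name, Field.Store.YES))
  doc

let upsert (writer: IndexWriter) (lens: Lens) =
  // Deletes every document whose indexed "id" term equals this id,
  // then adds the new document.
  writer.UpdateDocument(Term("id", string lens.Id), getDocument lens)

let demo () =
  let version = LuceneVersion.LUCENE_48
  use dir = new RAMDirectory()
  use analyzer = new StandardAnalyzer(version)
  use writer = new IndexWriter(dir, IndexWriterConfig(version, analyzer))
  let rows =
    [ { Id = 1u; Name = "Nikon AF-S DX Nikkor 35 mm" }
      { Id = 2u; Name = "Canon EF-S 55-250 mm" } ]
  // Index the same rows twice to imitate a repeated import.
  for _ in 1 .. 2 do
    rows |> List.iter (upsert writer)
  writer.Commit()
  // Count live documents through a reader opened on the committed index.
  use reader = DirectoryReader.Open(dir)
  printfn "Live documents: %d" reader.NumDocs

demo ()
```

If the count printed after the second pass equals the number of rows, the Term is matching. If it is double, the id field is still not being indexed.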