[GitHub] [jena] flange-ipb commented on issue #1985: Lucene index is updated incorrectly during some dataset changes

via GitHub Mon, 07 Aug 2023 04:33:04 -0700


flange-ipb commented on issue #1985:
URL: https://github.com/apache/jena/issues/1985#issuecomment-1667687322


   Hello @afs,
   
   The problem seems to affect only operations that clear the graph. If 
selected triples are deleted, then the index is updated correctly.
   
   A more careful analysis:
   (Note: Between each of the tests I stopped Fuseki and deleted the TDB2 
dataset, the Lucene index and the *run* directories.)
   
   **Graph Store HTTP Protocol**
   * `POST http://127.0.0.1:3030/dataset/data`: Looks good.
     * POST data to an empty dataset
     * POST the same data to an existing dataset
     * POST new triples to an existing dataset
     * POST new and existing triples to an existing dataset
     * POST empty turtle file to an existing dataset
   * `PUT http://127.0.0.1:3030/dataset/data`:
     * PUT data on an empty dataset: Looks good.
     * PUT the same data to an existing dataset: **Not good.** Adds the "new" 
triples to the text index and doesn't remove the "old" ones.
     * PUT new triples to an existing dataset: Looks good. Adds the new triples 
to the text index.
     * PUT new and existing triples to an existing dataset: **Not good.** Adds 
the new and the overwritten triples to the text index.
     * PUT empty turtle file to an existing dataset: **Not good.** Doesn't 
remove the triples from the text index.
   * `DELETE http://127.0.0.1:3030/dataset/data?default`: **Not good.** Doesn't 
remove the triples from the text index.
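
A minimal sketch of how the GSP calls above could be scripted instead of issued by hand, using only the Python standard library. It builds the requests without sending them. The endpoint URLs are the ones from this report; the helper name `gsp_request` and the one-triple sample payload are illustrative assumptions, not part of the original test setup.

```python
from urllib.request import Request

# GSP endpoint as used in this report; adjust to the local Fuseki setup.
GSP_URL = "http://127.0.0.1:3030/dataset/data"

# Assumed sample payload: a single label triple in Turtle.
SAMPLE_TTL = (
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix dfgfo: <https://github.com/tibonto/dfgfo/> .\n"
    'dfgfo:101-03 rdfs:label "Ancient History"@en .\n'
)


def gsp_request(method, turtle=None, url=GSP_URL):
    """Build (but do not send) a Graph Store HTTP Protocol request."""
    if method not in ("POST", "PUT", "DELETE"):
        raise ValueError(f"unsupported method: {method}")
    if method == "DELETE":
        # The report deletes the default graph via the ?default parameter.
        return Request(url + "?default", method="DELETE")
    # An empty body models the "empty turtle file" cases.
    body = (turtle or "").encode("utf-8")
    return Request(
        url, data=body, method=method, headers={"Content-Type": "text/turtle"}
    )
```

Against a running server, a request built this way can be sent with `urllib.request.urlopen(gsp_request("PUT", SAMPLE_TTL))`.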
   
   **SPARQL Update** (`POST http://127.0.0.1:3030/dataset/update`)
   * INSERT: Looks good.
     * INSERT data into an empty dataset (`PREFIX rdfs: 
<http://www.w3.org/2000/01/rdf-schema#> PREFIX dfgfo: 
<https://github.com/tibonto/dfgfo/> INSERT DATA { dfgfo:101-03 rdfs:label 
"Ancient History"@en . }`)
     * INSERT the same data into an existing dataset (same SPARQL Update query 
as before)
     * INSERT new triples into an existing dataset (`PREFIX rdfs: 
<http://www.w3.org/2000/01/rdf-schema#> PREFIX dfgfo: 
<https://github.com/tibonto/dfgfo/> INSERT DATA { dfgfo:102-01 rdfs:label 
"Medieval History"@en . }`)
     * INSERT new and existing triples into an existing dataset (`PREFIX rdfs: 
<http://www.w3.org/2000/01/rdf-schema#> PREFIX dfgfo: 
<https://github.com/tibonto/dfgfo/> INSERT DATA { dfgfo:101-03 rdfs:label 
"Ancient History"@en . dfgfo:102-01 rdfs:label "Medieval History"@en . }`)
   * DELETE:
     * I prepared the dataset by loading *dfgfo.ttl*, then executed `PREFIX 
rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dfgfo: 
<https://github.com/tibonto/dfgfo/> DELETE DATA { dfgfo:101-03 rdfs:label 
"Ancient History"@en . }`: Looks good. The triple is removed from the text index.
   * LOAD (`LOAD <file:///path/to/file.ttl>`): This repeats what I did with GSP 
POST and PUT. Looks good.
     * LOAD data into an empty dataset
     * LOAD the same data into an existing dataset
     * LOAD new triples into an existing dataset
     * LOAD new and existing triples into an existing dataset
     * LOAD empty turtle file into an existing dataset
   * CLEAR:
     * `CLEAR default`: **Not good.** Same behaviour as GSP DELETE.
   * DROP:
     * `DROP default`: **Not good.** Same behaviour as GSP DELETE.
   * COPY: I added `<#entMap> text:graphField "graph" .` to 
*config-text-tdb2.ttl*, loaded some data into a named graph and then copied it 
into the default graph (`COPY <http://example.org/dfgfo> TO DEFAULT`). This 
emulates what I did with GSP POST and PUT.
     * COPY into empty default graph: Looks good.
     * COPY the same data into non-empty default graph: **Not good.** Adds the 
"new" triples to the text index and doesn't remove the "old" ones (like GSP 
PUT).
     * COPY new triples into non-empty default graph: **Not good.** Adds the 
"new" triples to the text index and doesn't remove the "old" ones.
     * COPY new and existing triples into non-empty default graph: **Not 
good.** Adds the new and the overwritten triples to the text index and doesn't 
remove the "old" ones.
     * COPY data from empty named graph into non-empty default graph: 
Impossible, because Jena doesn't record empty named graphs.
     * COPY data from empty default graph into non-empty named graph (`COPY 
DEFAULT TO <http://example.org/dfgfo>`), then run SPARQL SELECT to check the 
text index on named graph: Looks good. Empty named graph does not exist and 
Lucene index is empty (I checked the Lucene directory.)
   * MOVE: ...
   * ADD: ...
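
The SPARQL Update cases can be scripted in a similar way. The sketch below builds a direct-POST update request and a text-index probe query, again with the Python standard library only and without sending anything. The update URL is the one from this report. The helper names and the exact probe query are assumptions: the report only says that a SPARQL SELECT was used to check the index, and the probe here uses the `text:query` property function of jena-text as one plausible way to do that.

```python
from urllib.request import Request

# Update endpoint as used in this report.
UPDATE_URL = "http://127.0.0.1:3030/dataset/update"

# Prefixes taken from the update requests in this report.
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX dfgfo: <https://github.com/tibonto/dfgfo/>\n"
)


def update_request(operation, url=UPDATE_URL):
    """Build (but do not send) a SPARQL Update request, e.g. 'CLEAR DEFAULT'."""
    body = (PREFIXES + operation).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/sparql-update"},
    )


def text_index_probe(term):
    """Return a SELECT query asking the text index for subjects matching `term`.

    Assumption: rdfs:label is the indexed predicate, as in the examples here.
    """
    return (
        "PREFIX text: <http://jena.apache.org/text#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f'SELECT ?s WHERE {{ ?s text:query (rdfs:label "{term}") }}'
    )
```

For example, `update_request("CLEAR DEFAULT")` followed by a query built with `text_index_probe("Ancient")` mirrors the manual CLEAR check described above.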
   
   Alright, I'm about to lose my mind with manual testing. If necessary, I can 
contribute some unit tests. I just need some help to get started - let's say a 
TDB2 dataset (can you run GSP operations on it programmatically?) with a Lucene 
index or an embedded Fuseki with the right dataset configuration.
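
As a starting point for such tests, the expected index content can be stated as a small oracle: after each operation, the text index should hold exactly the labels present in the target graph. The sketch below is a hypothetical helper, not Jena code. It models only the standard semantics of the operations listed in this report (merge for POST/INSERT/LOAD, replace for PUT/COPY, empty for GSP DELETE/CLEAR/DROP) and ignores the special case of copying a graph onto itself.

```python
def expected_labels(current, operation, payload=frozenset()):
    """Return the set of labels the text index should hold after `operation`.

    `current` holds the labels already in the target graph and `payload`
    the labels carried by the request (or by the source graph for COPY).
    """
    current, payload = set(current), set(payload)
    if operation in ("POST", "INSERT", "LOAD"):
        return current | payload  # merge into the existing graph
    if operation in ("PUT", "COPY"):
        return payload            # replace the graph contents
    if operation in ("GSP_DELETE", "CLEAR", "DROP"):
        return set()              # the graph is emptied
    raise ValueError(f"unknown operation: {operation}")
```

A test could compare this set with what the index actually returns. For the failing "PUT the same data" case, the oracle yields each label once, whereas the report observes the old and the new entries side by side.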
   


