GitHub user ehedgehog opened a pull request:

    https://github.com/apache/jena/pull/42

    jena-text updates for constructing documents suitable for conjunctive queries

    This change to jena-text allows TextDocumentProducers
    to access the dataset they are monitoring, and to
    create indexes where there is a single (Lucene) document
    for a given subject and all its defined properties
    rather than a separate document per triple that the
    subject appears in.
    
    This allows indexes to be used for conjunctive queries,
    e.g. a request such as
    
        city: Plymouth AND street: Station
    
    See https://github.com/epimorphics/ppd-text-index for
    an example project that uses conjunctive queries and 
    provides a bulk index creation utility.
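
    The difference between the two document layouts can be illustrated
    with a minimal sketch. It uses a toy in-memory index, not Lucene or
    jena-text; the class ConjunctiveSketch, its methods, the subject
    ex:address1 and the value "Station Road" are all invented for
    illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a text index; none of these names come from jena-text or Lucene.
class ConjunctiveSketch {

    // Build a document (field name -> text) from alternating name/value pairs.
    static Map<String, String> doc(String... pairs) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) d.put(pairs[i], pairs[i + 1]);
        return d;
    }

    // One document per triple: each carries a single property of the subject.
    static List<Map<String, String>> perTriple() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("uri", "ex:address1", "city", "Plymouth"));
        docs.add(doc("uri", "ex:address1", "street", "Station Road"));
        return docs;
    }

    // One document per subject: all properties live in the same document.
    static List<Map<String, String>> perSubject() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("uri", "ex:address1", "city", "Plymouth", "street", "Station Road"));
        return docs;
    }

    // The query "city: Plymouth AND street: Station" as field -> required word.
    static Map<String, String> query() {
        return doc("city", "Plymouth", "street", "Station");
    }

    // Count documents in which EVERY clause is satisfied by that one document.
    static int countMatches(List<Map<String, String>> docs, Map<String, String> clauses) {
        int hits = 0;
        for (Map<String, String> d : docs) {
            boolean all = true;
            for (Map.Entry<String, String> c : clauses.entrySet()) {
                String value = d.get(c.getKey());
                if (value == null || !value.contains(c.getValue())) all = false;
            }
            if (all) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("per-triple hits:  " + countMatches(perTriple(), query()));
        System.out.println("per-subject hits: " + countMatches(perSubject(), query()));
    }
}
```

    With per-triple documents no single document holds both fields, so
    the conjunction finds nothing; with one document per subject it
    finds the subject.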
    
    The changes to jena-text are spread over six files as follows:
    
    # jena-text/src/main/java/org/apache/jena/query/text/DatasetGraphText.java 
    
    The two-phase commit protocol is modified to ensure that
    finish() is called on the DatasetChanges monitor before the
    commit protocol starts. A DatasetChanges may have buffered-up
    changes which have not yet been applied; applying these
    changes mid-commit can cause errors.
    
    [This problem was detected in https://github.com/epimorphics/ppd-text-index
    where TextDocProducerBatch does have buffered state and closing
    the DatasetGraphText threw an exception when the state was flushed.]
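
    The ordering issue can be sketched as follows. This is not the
    DatasetGraphText code: ToyIndex, BufferingMonitor and both commit
    methods are hypothetical, and the failure mode (an index that
    rejects writes once commit preparation has begun) is an assumption
    chosen only to make the ordering visible.

```java
import java.util.ArrayList;
import java.util.List;

class FinishBeforeCommitSketch {

    // Hypothetical index that refuses writes once commit preparation has begun.
    static class ToyIndex {
        final List<String> entries = new ArrayList<>();
        private boolean preparing = false;

        void write(String entry) {
            if (preparing) throw new IllegalStateException("write during commit: " + entry);
            entries.add(entry);
        }

        void prepareCommit() { preparing = true; }

        void commit() { preparing = false; }
    }

    // Hypothetical monitor that buffers changes and only applies them in finish().
    static class BufferingMonitor {
        private final List<String> buffered = new ArrayList<>();
        private final ToyIndex index;

        BufferingMonitor(ToyIndex index) { this.index = index; }

        void change(String entry) { buffered.add(entry); }

        void finish() {
            for (String entry : buffered) index.write(entry);
            buffered.clear();
        }
    }

    // Ordering described above: flush the monitor, then run the two phases.
    static void commitFinishingFirst(BufferingMonitor monitor, ToyIndex index) {
        monitor.finish();
        index.prepareCommit();
        index.commit();
    }

    // Problematic ordering: the flush lands between the two phases.
    static void commitFinishingLate(BufferingMonitor monitor, ToyIndex index) {
        index.prepareCommit();
        monitor.finish();   // throws if anything was still buffered
        index.commit();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        BufferingMonitor monitor = new BufferingMonitor(index);
        monitor.change("ex:s ex:p \"o\"");
        commitFinishingFirst(monitor, index);
        System.out.println("entries after commit: " + index.entries.size());
    }
}
```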
    
    A test for this behaviour is not currently available.
    
    # jena-text/src/main/java/org/apache/jena/query/text/TextIndex.java 
    
    Added the abstract method updateEntity, contrasted with addEntity().
    Updating an index with updateEntity is intended to discard any
    existing Document with the entity key and create a new one with
    all and only the fields specified by the given Entity, as opposed
    to addEntity which creates a new Document from the Entity even if
    one with that key already exists.
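
    The contrast can be sketched with a toy in-memory index. The class
    UpdateEntitySketch and its String/Map signatures are hypothetical
    simplifications and are not the TextIndex API; only the replace
    versus append behaviour described above is modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy index: a list of (key, fields) documents. Not the jena-text TextIndex.
class UpdateEntitySketch {

    static class Doc {
        final String key;
        final Map<String, String> fields;

        Doc(String key, Map<String, String> fields) {
            this.key = key;
            this.fields = new LinkedHashMap<>(fields);
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    // Helper: build a field map from alternating name/value pairs.
    static Map<String, String> fields(String... pairs) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) m.put(pairs[i], pairs[i + 1]);
        return m;
    }

    // addEntity: always appends a new document, even if the key is already present.
    void addEntity(String key, Map<String, String> entityFields) {
        docs.add(new Doc(key, entityFields));
    }

    // updateEntity: discard every document with this key, then add one document
    // holding all and only the given fields.
    void updateEntity(String key, Map<String, String> entityFields) {
        docs.removeIf(d -> d.key.equals(key));
        docs.add(new Doc(key, entityFields));
    }

    // Number of stored documents for a key.
    int countFor(String key) {
        int n = 0;
        for (Doc d : docs) if (d.key.equals(key)) n++;
        return n;
    }

    public static void main(String[] args) {
        UpdateEntitySketch index = new UpdateEntitySketch();
        index.addEntity("ex:a", fields("city", "Plymouth"));
        index.addEntity("ex:a", fields("street", "Station Road"));
        System.out.println("after two adds: " + index.countFor("ex:a"));
        index.updateEntity("ex:a", fields("city", "Plymouth", "street", "Station Road"));
        System.out.println("after update:   " + index.countFor("ex:a"));
    }
}
```

    Lucene's IndexWriter.updateDocument offers a comparable
    delete-then-add primitive; whether the patch uses it is not stated
    in this description.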
    
    This allows Documents suitable for conjunctive query to
    be created. The project 
    
    # jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java 
    
    Implement updateEntity for a TextIndexLucene.
    
    # jena-text/src/main/java/org/apache/jena/query/text/TextIndexSolr.java 
    
    Placeholder for an implementation of updateEntity in TextIndexSolr;
    at the moment it is not supported.
    
    # jena-text/src/main/java/org/apache/jena/query/text/assembler/TextDatasetAssembler.java
    
    The DatasetAssembler may construct a non-default TextDocProducer to
    feed to the TextDatasetFactory, passing in to the constructor the
    TextIndex that the TextDocProducer uses. This change additionally
    allows for a two-argument constructor taking the DatasetGraph as
    well as the TextIndex as arguments.
    
    The TextDocProducer can use the DatasetGraph to query the dataset
    for triples (e.g. other triples with the same subject as one it has
    received, so as to build all the properties into a single Document).
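
    One way such a constructor choice could work is sketched below
    with plain Java reflection. This is not the TextDatasetAssembler
    code: the nested interfaces are empty stand-ins for the jena-text
    types of the same names, the two producer classes are invented, and
    the argument order (DatasetGraph, TextIndex) is an assumption.

```java
import java.lang.reflect.Constructor;

class ProducerConstructorSketch {

    // Empty stand-ins so the sketch compiles without Jena on the classpath.
    interface DatasetGraph {}
    interface TextIndex {}
    interface TextDocProducer {}

    static final DatasetGraph DSG = new DatasetGraph() {};
    static final TextIndex INDEX = new TextIndex() {};

    // Hypothetical producer that only needs the index.
    static class IndexOnlyProducer implements TextDocProducer {
        public IndexOnlyProducer(TextIndex index) {}
    }

    // Hypothetical producer that also wants the dataset it is monitoring.
    static class DatasetAwareProducer implements TextDocProducer {
        final DatasetGraph dsg;

        public DatasetAwareProducer(DatasetGraph dsg, TextIndex index) {
            this.dsg = dsg;
        }
    }

    // Prefer the two-argument constructor; fall back to the one-argument form.
    static TextDocProducer create(Class<? extends TextDocProducer> cls,
                                  DatasetGraph dsg, TextIndex index) {
        try {
            try {
                Constructor<? extends TextDocProducer> two =
                        cls.getConstructor(DatasetGraph.class, TextIndex.class);
                return two.newInstance(dsg, index);
            } catch (NoSuchMethodException noTwoArgConstructor) {
                return cls.getConstructor(TextIndex.class).newInstance(index);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot construct " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(DatasetAwareProducer.class, DSG, INDEX).getClass().getSimpleName());
        System.out.println(create(IndexOnlyProducer.class, DSG, INDEX).getClass().getSimpleName());
    }
}
```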
    
    # jena-text/src/test/java/org/apache/jena/query/text/assembler/TestTextDatasetAssembler.java
    
    Adds a test that the two-argument constructor is called
    when appropriate.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/epimorphics/jena-config-doc-producer revised-A

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/42.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #42
    
----
commit 3ff763ac184dd49bdc3b6ceff9acb778aac29eae
Author: Chris Dollin <[email protected]>
Date:   2015-03-11T15:13:43Z

    Compacted and simplified changes to jena-text
    to support ppd-text-index.

commit a8002e8ce3452aad7c51f20600c9389ef71a6dd5
Author: Chris Dollin <[email protected]>
Date:   2015-03-12T15:25:27Z

    Added test to check that the
    assembler gives access to the two-argument constructor of a docProducer.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---