aiworkerjohns opened a new issue, #3769:
URL: https://github.com/apache/jena/issues/3769

   ## Status: Done
   
   ## Problem
   
   The upstream `text:query` property function supports plain Lucene query 
strings but has no way to apply structured field-level filters. Applications 
that need faceted navigation (e.g. "search for climate, filtered to 
publisher=CSIRO") must either hand-build complex Lucene query syntax or 
post-filter in SPARQL, which is inefficient because every text match is 
retrieved before the filter is applied.
   
   ## Use Case
   
   ```mermaid
   flowchart LR
       User(["User query: 'climate change'"])
       Filter["+ filter: publisher = CSIRO"]
       Query["luc:query"]
       Results["Matching entities ranked by relevance"]
   
       User --> Query
       Filter -.-> Query
       Query --> Results
   ```
   
   - Search box in a data catalogue
   - Keyword search with facet refinement in a document repository
   - Filtered API endpoint for a knowledge graph
   
   ## Technical Work (completed)
   
   - `ShaclTextQueryPF` — new property function registered under 
`urn:jena:lucene:index#query`
   - JSON filter parsing — `{"category": ["Technology"]}` parsed from SPARQL 
string literal argument
   - Filter semantics: OR within a field, AND across fields
   - `TextIndexLucene.queryWithShaclFilters()` — builds `BooleanQuery` 
combining text query with `TermInSetQuery` filters
   - Registered in `TextQuery.init()` alongside upstream `text:query` (which is 
unchanged)
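
   The filter semantics listed above (OR within a field, AND across fields) 
can be modelled in plain Java. This is only an illustrative sketch: the class 
and method names are hypothetical, and the real filtering is done by Lucene 
inside `TextIndexLucene.queryWithShaclFilters()`, not over Java maps.

   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.Set;

   // Hypothetical model of the filter semantics; not the actual Lucene-backed code.
   class FilterSemanticsSketch {

       /**
        * Returns true when the document satisfies every filtered field (AND across
        * fields) by carrying at least one of the values listed for that field
        * (OR within a field).
        */
       static boolean matches(Map<String, List<String>> filter,
                              Map<String, Set<String>> docFields) {
           for (Map.Entry<String, List<String>> entry : filter.entrySet()) {
               Set<String> docValues = docFields.getOrDefault(entry.getKey(), Set.of());
               boolean anyValueMatches =
                       entry.getValue().stream().anyMatch(docValues::contains);
               if (!anyValueMatches) {
                   return false; // one unsatisfied field rejects the document
               }
           }
           return true; // an empty filter accepts every document
       }

       public static void main(String[] args) {
           Map<String, List<String>> filter = Map.of(
                   "category", List.of("Technology", "Science"),
                   "publisher", List.of("CSIRO"));
           Map<String, Set<String>> doc = Map.of(
                   "category", Set.of("Science"),
                   "publisher", Set.of("CSIRO"));
           System.out.println(matches(filter, doc));
       }
   }
   ```

   A document tagged only `category=Science` would be rejected by this filter, 
because the `publisher` field is left unsatisfied.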
   
   **SPARQL interface:**
   
   ```sparql
   (?s ?score ?literal ?graph ?prop) luc:query (property* queryString filter? limit?)
   ```
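
   As a rough illustration of the interface above, the following Java sketch 
assembles a query string that calls `luc:query` with a JSON filter. The `luc:` 
prefix is derived from the registered URI `urn:jena:lucene:index#query`. The 
class name, the helper methods, and the choice of `rdfs:label` as the indexed 
property are assumptions made for the example.

   ```java
   // Hypothetical helper: builds a SPARQL query string that calls luc:query.
   class LucQueryStringSketch {

       // rdfs:label stands in for whatever property is actually indexed.
       static final String PREFIXES =
               "PREFIX luc:  <urn:jena:lucene:index#>\n"
             + "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n";

       /** Escapes backslashes and single quotes for use inside a '...' SPARQL literal. */
       static String escapeSingleQuoted(String text) {
           return text.replace("\\", "\\\\").replace("'", "\\'");
       }

       /** Argument order follows the signature: property, queryString, filter, limit. */
       static String build(String queryString, String jsonFilter, int limit) {
           return PREFIXES
                + "SELECT ?s ?score WHERE {\n"
                + "  (?s ?score ?literal ?graph ?prop) luc:query (rdfs:label '"
                + escapeSingleQuoted(queryString) + "' '"
                + escapeSingleQuoted(jsonFilter) + "' " + limit + ")\n"
                + "}";
       }

       public static void main(String[] args) {
           System.out.println(build("climate change", "{\"publisher\": [\"CSIRO\"]}", 10));
       }
   }
   ```

   Wrapping the JSON in single quotes means its double quotes need no escaping 
in the SPARQL literal.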
   
   ## Effort
   
   Completed. `ShaclTextQueryPF` is 340 lines.
   
   ## Decisions Made
   
   - **JSON for filters** over RDF lists or custom DSL — widely understood, 
easy to construct from UI state, reliable detection via `{` prefix
   - **Separate PF** (`luc:query`) rather than extending `text:query` — keeps 
upstream code unmodified, clean namespace separation
   - **`TermInSetQuery`** for filtered facets, which avoids the 
`BooleanQuery.TooManyClauses` limit when a filter lists many values
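
   The `{`-prefix detection mentioned above could look roughly like the sketch 
below, which sorts the arguments that follow the property list into query 
string, optional filter, and optional limit. It is not the code in 
`ShaclTextQueryPF`; the class, the `Parsed` holder, and the use of plain Java 
objects instead of SPARQL nodes are all assumptions.

   ```java
   import java.util.List;

   // Hypothetical argument classifier for: queryString filter? limit?
   class LucQueryArgsSketch {

       /** Simple holder for the classified arguments; absent optionals are null. */
       static final class Parsed {
           final String queryString;
           final String filterJson;
           final Integer limit;

           Parsed(String queryString, String filterJson, Integer limit) {
               this.queryString = queryString;
               this.filterJson = filterJson;
               this.limit = limit;
           }
       }

       /** A string argument is treated as a JSON filter when it starts with '{'. */
       static boolean looksLikeFilter(String literal) {
           return literal.stripLeading().startsWith("{");
       }

       static Parsed parse(List<Object> args) {
           if (args.isEmpty() || !(args.get(0) instanceof String)) {
               throw new IllegalArgumentException("query string expected first");
           }
           String queryString = (String) args.get(0);
           String filterJson = null;
           Integer limit = null;
           for (Object arg : args.subList(1, args.size())) {
               if (arg instanceof String && looksLikeFilter((String) arg)) {
                   filterJson = (String) arg;
               } else if (arg instanceof Integer) {
                   limit = (Integer) arg;
               } else {
                   throw new IllegalArgumentException("unexpected argument: " + arg);
               }
           }
           return new Parsed(queryString, filterJson, limit);
       }
   }
   ```

   In this sketch the first argument is always taken as the query string, so a 
search text that itself begins with `{` is not mistaken for a filter.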
   
   ## Pitfalls / Gotchas
   
   - Filter values must match KEYWORD field values exactly (case-sensitive)
   - JSON must be a valid string literal in SPARQL — single quotes around the 
JSON, double quotes inside
   - Filtering on a TEXT field won't work as expected (TEXT fields are 
tokenized, while filters use exact term matching)
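
   The first and last pitfalls can be illustrated with a small simulation. 
The sketch below is a simplified stand-in for Lucene analysis: it assumes a 
KEYWORD field stores the whole value as one term and that a TEXT field is 
lower-cased and split on whitespace. Real analyzers do more, and all names 
here are hypothetical.

   ```java
   import java.util.Arrays;
   import java.util.Locale;
   import java.util.Set;
   import java.util.stream.Collectors;

   // Simplified simulation of how field values become index terms.
   class FieldTermsSketch {

       /** KEYWORD-style indexing: the whole value is kept as a single term, unchanged. */
       static Set<String> keywordTerms(String value) {
           return Set.of(value);
       }

       /** Simplified TEXT-style indexing: lower-case, then split on whitespace. */
       static Set<String> textTerms(String value) {
           return Arrays.stream(value.toLowerCase(Locale.ROOT).trim().split("\\s+"))
                   .collect(Collectors.toSet());
       }

       /** Exact term matching, as a filter applies it. */
       static boolean filterMatches(Set<String> indexedTerms, String filterValue) {
           return indexedTerms.contains(filterValue);
       }

       public static void main(String[] args) {
           String value = "CSIRO Publishing";
           System.out.println(filterMatches(keywordTerms(value), "CSIRO Publishing"));
           System.out.println(filterMatches(keywordTerms(value), "csiro publishing"));
           System.out.println(filterMatches(textTerms(value), "CSIRO Publishing"));
       }
   }
   ```

   Only the first check succeeds: the keyword term matches when case and 
spelling are identical, while the tokenized field never contains the full 
original value as a single term.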


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

