GitHub user osma commented on the pull request:

    https://github.com/apache/jena/pull/53#issuecomment-99062926
  
    > The multilingual index dynamically manages one index per language. Hence,
    > when two identical literals have different languages, they will not be
    > stored in the same index.
    
    Ah, I see. But this still doesn't help for cases where there are small
    differences between literals within the same language, for example
    singular/plural forms that get stemmed by the analyzer, or variations in
    capitalization.
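
To illustrate the collision described above, here is a minimal sketch. The `ToyAnalyzer` class and its `normalize` method are hypothetical stand-ins, not a Lucene analyzer: it only lowercases and strips a trailing "s". Real analyzers behave differently in detail, but the effect on matching is similar.

```java
import java.util.Locale;

public class ToyAnalyzer {
    // Hypothetical toy normalization: lowercase, then drop one trailing "s".
    // This is an illustration only and not how Lucene stems text.
    static String normalize(String literal) {
        String t = literal.toLowerCase(Locale.ROOT);
        if (t.length() > 1 && t.endsWith("s")) {
            t = t.substring(0, t.length() - 1);
        }
        return t;
    }

    public static void main(String[] args) {
        // Two distinct literals in the same language end up as the same term,
        // so a delete that matches on analyzed text cannot tell them apart.
        System.out.println(normalize("Cats").equals(normalize("cat")));
    }
}
```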
    
    > As for the "hash solution", it works fine with a SHA-1. So we have one
    > more field per doc, but I don't think that's a problem for the final
    > index size. Should I commit it?
    
    For me this looks like a sensible solution. But I would love to hear
    comments from others, in particular on the next issue:
    
    > I also don't know whether it can disturb the conjunctive stuff.
    > However, addEntity interacts with updateEntity, and entries
    > already correspond to triples/quads, don't they?
    
    With the current default configuration, yes, jena-text entries correspond
    to triples/quads. But with the conjunctive query support, that is no longer
    the case. There is a wider issue here - is jena-text primarily an
    alternative triple/quad index, or is it actually an entity index that just
    happens to work on triples in the default configuration? The latter case
    makes deletion much more difficult, as there is no longer a 1:1 mapping
    between quads and Lucene documents.
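
For readers following the "hash solution" discussed above, this is a rough sketch of the idea, not the code from the pull request. It assumes the hash is a SHA-1 over the quad's components plus the language tag, stored as an extra field so that a deletion can target exactly one document. The class name `QuadHash`, the method `sha1Hex`, and the choice of components and separator are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class QuadHash {
    // Hypothetical helper: derive a stable key from the exact, unanalyzed quad.
    // A NUL separator keeps ("ab","c") and ("a","bc") from producing the same key.
    static String sha1Hex(String graph, String subject, String predicate,
                          String literal, String lang) {
        String key = String.join("\u0000", graph, subject, predicate, literal, lang);
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = sha1Hex("g", "s", "p", "Cats", "en");
        String b = sha1Hex("g", "s", "p", "cat", "en");
        // Literals that an analyzer might conflate still get distinct keys.
        System.out.println(a.equals(b));
    }
}
```

Such a key only identifies a document while each document corresponds to one triple/quad. In an entity-style configuration, where one document holds several properties, a single per-document hash would no longer map to a single quad.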


