[
https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902589#comment-15902589
]
Osma Suominen commented on JENA-1305:
-------------------------------------
Hi [~anujkumar]!
It looks like I misunderstood ESTestCase then. In any case, I think it would be
very good to have unit/integration tests that can be run automatically from
{{mvn test}} without setting up any other infrastructure like ES in advance. Do
you think this would be possible somehow?
Regarding the "single ES document" solution, I don't disagree with that
approach, if you can make it work. However, "When there are multiple values for
the same literal then the current ES implementation overrides the values"
worries me. I think that any reasonable implementation of jena-text should cope
with the following scenario (I assume here that jena-text is set up to index
the rdfs:label property).
1. A multilingual RDF document gets added to the store with these triples:
{noformat}
:de a ex:Country ;
    rdfs:label "Germany"@en, "Deutschland"@de ;
    ex:hasCapital :berlin .
:berlin a ex:City ;
    rdfs:label "Berlin"@en, "Berlin"@de .
{noformat}
Now the jena-text index should find the resource {{:de}} using either keyword
"Germany" or "Deutschland" and the resource {{:berlin}} with the keyword
"Berlin".
2. Then further triples with French labels get added to the store:
{noformat}
:de rdfs:label "Allemagne"@fr .
:berlin rdfs:label "Berlin"@fr .
{noformat}
Now the jena-text index should find {{:de}} also with the keyword "Allemagne"
in addition to "Germany" and "Deutschland".
For {{:berlin}} the situation is unchanged.
3. Now the triples with English labels are removed from the store:
{noformat}
:de rdfs:label "Germany"@en .
:berlin rdfs:label "Berlin"@en .
{noformat}
Now the jena-text index should find {{:de}} using either keyword "Deutschland"
or "Allemagne", but not "Germany".
{{:berlin}} should still be found using the keyword "Berlin", since it still
has that label in French and German even if the English one was removed.
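To make the expected behaviour concrete, here is a minimal in-memory sketch in Java of the semantics I have in mind. It is not jena-text or Elasticsearch code: the class {{LabelIndexSketch}} and its methods are hypothetical names invented for illustration, and matching is a plain case-insensitive comparison rather than analyzed full-text search. The point is only that each resource keeps a set of (label, language) values, so adding a value never overwrites another one and removing a value only drops that exact entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of jena-text or Elasticsearch.
public class LabelIndexSketch {
    // One "document" per resource: resource URI -> set of "label@lang" values.
    private final Map<String, Set<String>> docs = new HashMap<>();

    private static String key(String label, String lang) {
        return label + "@" + lang;
    }

    // Adding a label appends to the resource's value set instead of overwriting it.
    public void add(String resource, String label, String lang) {
        docs.computeIfAbsent(resource, r -> new HashSet<>()).add(key(label, lang));
    }

    // Removing a label deletes only that exact (label, lang) value.
    public void remove(String resource, String label, String lang) {
        Set<String> values = docs.get(resource);
        if (values == null) {
            return;
        }
        values.remove(key(label, lang));
        if (values.isEmpty()) {
            docs.remove(resource);
        }
    }

    // Returns the resources that still have at least one label equal to the keyword.
    public Set<String> search(String keyword) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Set<String>> doc : docs.entrySet()) {
            for (String value : doc.getValue()) {
                String label = value.substring(0, value.lastIndexOf('@'));
                if (label.equalsIgnoreCase(keyword)) {
                    hits.add(doc.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        LabelIndexSketch index = new LabelIndexSketch();

        // Step 1: English and German labels.
        index.add(":de", "Germany", "en");
        index.add(":de", "Deutschland", "de");
        index.add(":berlin", "Berlin", "en");
        index.add(":berlin", "Berlin", "de");

        // Step 2: French labels are added.
        index.add(":de", "Allemagne", "fr");
        index.add(":berlin", "Berlin", "fr");

        // Step 3: English labels are removed.
        index.remove(":de", "Germany", "en");
        index.remove(":berlin", "Berlin", "en");

        System.out.println(index.search("Germany"));     // prints []
        System.out.println(index.search("Deutschland")); // prints [:de]
        System.out.println(index.search("Allemagne"));   // prints [:de]
        System.out.println(index.search("Berlin"));      // prints [:berlin]
    }
}
```

Keeping the language tag as part of each stored value is what lets {{:berlin}} survive step 3 in this sketch: the three "Berlin" labels are distinct entries, so deleting the English one leaves the German and French ones in place. How that maps onto a single ES document is a separate question.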
Regarding Guava, excellent!
> Elastic Search Support for Apache Jena Text
> --------------------------------------------
>
> Key: JENA-1305
> URL: https://issues.apache.org/jira/browse/JENA-1305
> Project: Apache Jena
> Issue Type: New Feature
> Components: Text
> Affects Versions: Jena 3.2.0
> Reporter: Anuj Kumar
> Assignee: Osma Suominen
> Labels: elasticsearch
> Original Estimate: 240h
> Remaining Estimate: 240h
>
> This Jira tracks the development of Jena Text ElasticSearch Implementation.
> The goal is to extend Jena Text capability to index, at scale, in
> ElasticSearch. This implementation would be similar to the Lucene and Solr
> implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionalities would be supported:
> * Indexing Literal values
> * Updating indexed values
> * Deleting Indexed values
> * Custom Analyzer Support
> * Configuration using Assembler as well as Java techniques.