[jira] [Commented] (JENA-1305) Elastic Search Support for Apache Jena Text

Osma Suominen (JIRA) Wed, 08 Mar 2017 23:08:54 -0800

    [ 
https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902589#comment-15902589
 ]


Osma Suominen commented on JENA-1305:
-------------------------------------

Hi [~anujkumar]!

It looks like I misunderstood ESTestCase then. In any case, I think it would be
very good to have unit/integration tests that run automatically from
{{mvn test}} without first setting up external infrastructure such as an ES
instance. Do you think this would be possible somehow?

Regarding the "single ES document" solution, I have no objection to that
approach, if you can make it work. However, "When there are multiple values for
the same literal then the current ES implementation overrides the values"
worries me. I think that any reasonable implementation of jena-text should cope
with the following scenario (I assume here that jena-text is set up to index
the rdfs:label property).

1. A multilingual RDF document gets added to the store with these triples:

{noformat}
:de a ex:Country ;
  rdfs:label "Germany"@en, "Deutschland"@de ;
  ex:hasCapital :berlin .

:berlin a ex:City ;
  rdfs:label "Berlin"@en, "Berlin"@de .
{noformat}

Now the jena-text index should find the resource {{:de}} using either of the
keywords "Germany" or "Deutschland", and the resource {{:berlin}} using the
keyword "Berlin".

2. Then further triples with French labels get added to the store:

{noformat}
:de rdfs:label "Allemagne"@fr .

:berlin rdfs:label "Berlin"@fr .
{noformat}

Now the jena-text index should also find {{:de}} with the keyword "Allemagne",
in addition to "Germany" and "Deutschland".
For {{:berlin}} the situation is unchanged.

3. Now the triples with English labels are removed from the store:

{noformat}
:de rdfs:label "Germany"@en .

:berlin rdfs:label "Berlin"@en .
{noformat}

Now the jena-text index should find {{:de}} using either keyword "Deutschland" 
or "Allemagne", but not "Germany".
{{:berlin}} should still be found using the keyword "Berlin", since it still 
has that label in French and German even if the English one was removed.
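
The three steps above can be written down as a small executable check. What
follows is only a minimal sketch in plain Java: {{LabelIndexSketch}} and its
{{add}}, {{remove}} and {{search}} methods are hypothetical names invented for
this illustration, not the jena-text or Elasticsearch API, and matching is
simplified to a case-insensitive comparison of the whole label. The point it
illustrates is that each (resource, label, language) combination is kept as a
separate entry, so adding a value never overrides an existing one and removing
one value leaves the others searchable.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory stand-in for a text index (hypothetical, not the jena-text API). */
final class LabelIndexSketch {
    // resource -> set of "label@lang" entries, one entry per indexed triple
    private final Map<String, Set<String>> entries = new HashMap<>();

    private static String key(String label, String lang) {
        return label + "@" + lang;
    }

    /** Index one label; existing labels of the same resource are kept. */
    void add(String resource, String label, String lang) {
        entries.computeIfAbsent(resource, r -> new HashSet<>()).add(key(label, lang));
    }

    /** Remove exactly one label; other languages of the same resource stay. */
    void remove(String resource, String label, String lang) {
        Set<String> values = entries.get(resource);
        if (values == null) {
            return;
        }
        values.remove(key(label, lang));
        if (values.isEmpty()) {
            entries.remove(resource);
        }
    }

    /** Return all resources that have a label equal to the keyword, in any language. */
    Set<String> search(String keyword) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : entries.entrySet()) {
            for (String value : e.getValue()) {
                String label = value.substring(0, value.lastIndexOf('@'));
                if (label.equalsIgnoreCase(keyword)) {
                    hits.add(e.getKey());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        LabelIndexSketch index = new LabelIndexSketch();

        // Step 1: initial English and German labels
        index.add(":de", "Germany", "en");
        index.add(":de", "Deutschland", "de");
        index.add(":berlin", "Berlin", "en");
        index.add(":berlin", "Berlin", "de");

        // Step 2: French labels are added later
        index.add(":de", "Allemagne", "fr");
        index.add(":berlin", "Berlin", "fr");

        // Step 3: English labels are removed
        index.remove(":de", "Germany", "en");
        index.remove(":berlin", "Berlin", "en");

        System.out.println(index.search("Germany"));     // prints []
        System.out.println(index.search("Deutschland")); // prints [:de]
        System.out.println(index.search("Allemagne"));   // prints [:de]
        System.out.println(index.search("Berlin"));      // prints [:berlin]
    }
}
```

If the ES implementation stores all labels of a resource in a single document,
the equivalent requirement would presumably be that the label field is
multi-valued and that a delete removes only the matching value rather than the
whole field or document.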

Regarding Guava, excellent!

> Elastic Search Support for Apache Jena Text 
> --------------------------------------------
>
>                 Key: JENA-1305
>                 URL: https://issues.apache.org/jira/browse/JENA-1305
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: Text
>    Affects Versions: Jena 3.2.0
>            Reporter: Anuj Kumar
>            Assignee: Osma Suominen
>              Labels: elasticsearch
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> This Jira tracks the development of Jena Text ElasticSearch Implementation.
> The goal is to extend Jena Text capability to index, at scale, in 
> ElasticSearch. This implementation would be similar to the Lucene and Solr 
> implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionalities would be supported:
> * Indexing Literal values
> * Updating indexed values
> * Deleting Indexed values
> * Custom Analyzer Support
> * Configuration using Assembler as well as Java techniques.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
