[jira] [Commented] (JENA-1305) Elastic Search Support for Apache Jena Text

Anuj Kumar (JIRA) Wed, 08 Mar 2017 12:21:02 -0800

    [ 
https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901917#comment-15901917
 ]


Anuj Kumar commented on JENA-1305:
----------------------------------

Hi [~osma]

* I have used the ElasticSearch testing framework that you mention in your 
link. The ESTestCase class is for situations where we *DO NOT* need an 
embedded ES engine. 
[ESIntegTestCase|https://www.elastic.co/guide/en/elasticsearch/reference/current/integration-tests.html]
 is for situations where we need an embedded ES instance. That is the one I 
have used, since it tests the implementation much more thoroughly.
* *_It appears that you've chosen a model where there is a single ES document 
for each subject URI_* -> Yes, that is correct. The simple reason is that 
this is how we actually plan to use it :).
 When an entity has many different properties, all of those properties are 
indexed in the same ES document. This results in a lot more compact 
documents. 
When there are multiple values for the same property of a subject, the 
current ES implementation overwrites the earlier values. At least in our 
scenario, that is not an issue. But if there are use cases where we need to 
store multiple values for the same property and subject ID, then the solution 
would probably be to index each triple separately. My biggest concern with 
such a solution is that our index would grow very quickly. For example, our 
triple count is currently approaching the billion mark. Even if we assume 
that only 50% of those triples will be indexed, we will already have 500 
million records in ES. And this number will simply keep increasing over time.
So practically, it does not work for us. Maybe the solution could be to 
provide a setting to switch between the two modes. I am not sure about this, 
though.
* Regarding the import statements, that was an accidental check-in. The 
classes were not compiling in my IDE, so I changed them to use the standard 
Guava classes. I will revert that change.
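
To make the two document models discussed above concrete, here is a minimal 
sketch in plain Python. It does not use Elasticsearch or Jena at all: 
dictionaries stand in for ES documents, and the sample triples, the `uri` 
field name and both function names are made up for illustration. It only 
shows why one document per subject overwrites repeated values, while one 
document per triple keeps them at the cost of more documents.

```python
from collections import defaultdict

# Hypothetical sample triples: (subject URI, property, literal value).
TRIPLES = [
    ("http://example.org/s1", "label", "first label"),
    ("http://example.org/s1", "label", "second label"),
    ("http://example.org/s1", "comment", "a comment"),
    ("http://example.org/s2", "label", "other label"),
]


def index_per_subject(triples):
    """One document per subject URI; a later value overwrites an earlier one."""
    docs = defaultdict(dict)
    for subject, prop, value in triples:
        docs[subject][prop] = value
    return dict(docs)


def index_per_triple(triples):
    """One document per triple; every value is kept, but there are more documents."""
    return [{"uri": subject, prop: value} for subject, prop, value in triples]


per_subject = index_per_subject(TRIPLES)
per_triple = index_per_triple(TRIPLES)

# Number of documents produced by each mode for the same input.
print(len(per_subject), len(per_triple))
```

With these four sample triples, the per-subject mode produces two documents 
and loses "first label", while the per-triple mode produces four documents 
and keeps every value. A switchable setting, as suggested above, would 
amount to choosing between these two functions.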

> Elastic Search Support for Apache Jena Text 
> --------------------------------------------
>
>                 Key: JENA-1305
>                 URL: https://issues.apache.org/jira/browse/JENA-1305
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: Text
>    Affects Versions: Jena 3.2.0
>            Reporter: Anuj Kumar
>            Assignee: Osma Suominen
>              Labels: elasticsearch
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> This Jira tracks the development of Jena Text ElasticSearch Implementation.
> The goal is to extend Jena Text capability to index, at scale, in 
> ElasticSearch. This implementation would be similar to the Lucene and Solr 
> implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionalities would be supported:
> * Indexing Literal values
> * Updating indexed values
> * Deleting Indexed values
> * Custom Analyzer Support
> * Configuration using Assembler as well as Java techniques.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
