[
https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901917#comment-15901917
]
Anuj Kumar commented on JENA-1305:
----------------------------------
Hi [~osma]
* I have used the ElasticSearch testing framework that you mentioned in your
link. The ESTestCase class is for situations where we *DO NOT* need an embedded ES
engine.
[ESIntegTestCase|https://www.elastic.co/guide/en/elasticsearch/reference/current/integration-tests.html]
is for situations where we need an embedded ES instance. That is the one I have
used, since it gives us much more thorough testing of the implementation.
* *_It appears that you've chosen a model where there is a single ES document
for each subject URI_* -> Yes, that is correct. The simple reason is that
this is how we actually plan to use it :).
When an entity has many different properties, all of them are indexed in the
same ES document. This results in much more compact documents.
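To make the document-per-subject model concrete, here is a minimal in-memory sketch in plain Java; it does not use the Elasticsearch client or the actual Jena Text ES code. The class `SubjectDocumentModel` and its methods are hypothetical names for illustration only: each subject URI maps to one document, and each indexed property becomes a field of that document.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the "one ES document per subject URI" model.
// Not the Jena Text ES implementation; it only mimics how fields accumulate per subject.
public class SubjectDocumentModel {

    // subject URI -> (field name -> literal value)
    private final Map<String, Map<String, String>> documents = new LinkedHashMap<>();

    // Adds one (subject, field, literal) triple to the subject's single document.
    // Map.put replaces any earlier value for the same field.
    public void index(String subjectUri, String field, String literal) {
        documents.computeIfAbsent(subjectUri, k -> new LinkedHashMap<>())
                 .put(field, literal);
    }

    // Returns the document for a subject, or an empty map if it was never indexed.
    public Map<String, String> document(String subjectUri) {
        return documents.getOrDefault(subjectUri, Collections.emptyMap());
    }

    // Number of documents, i.e. number of distinct subjects.
    public int documentCount() {
        return documents.size();
    }

    public static void main(String[] args) {
        SubjectDocumentModel model = new SubjectDocumentModel();
        model.index("http://example.org/book1", "label", "Jena in Action");
        model.index("http://example.org/book1", "comment", "A book about RDF");
        model.index("http://example.org/book2", "label", "Search at Scale");

        // Three triples, but only two documents: one per subject.
        System.out.println(model.documentCount());                      // 2
        System.out.println(model.document("http://example.org/book1")); // {label=Jena in Action, comment=A book about RDF}
    }
}
```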
When there are multiple values for the same field, the current ES
implementation overwrites the earlier value. At least in our scenario, that is
not an issue. But if there are use cases where we need to store multiple values
for the same field and subject ID, then the solution would probably be to
index each triple separately. My biggest concern with such a solution is that
the index will grow very quickly. For example, our triple count is
currently approaching the billion mark. Even if we assume that only 50% of
those triples are indexed, we will already have 500 million records in ES,
and this number will simply keep increasing over time.
So in practice, it does not work for us. Maybe the solution could be to
provide a setting to switch between the two modes, though I am not sure
about this.
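The trade-off between the two modes, and the idea of a setting to switch between them, can be sketched as follows. `IndexMode`, `Triple` and `DocumentCounter` are hypothetical names; no such setting exists in the patch being discussed. The sketch only builds the documents each mode would produce for the same triples, showing that the per-subject mode keeps just the last value of a repeated field while the per-triple mode keeps every value at the cost of one document per triple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a switchable indexing mode; not an existing Jena Text option.
public class DocumentCounter {

    // The two modes discussed: one ES document per subject, or one per triple.
    enum IndexMode { DOCUMENT_PER_SUBJECT, DOCUMENT_PER_TRIPLE }

    // A minimal (subject, field, literal) triple.
    static final class Triple {
        final String subject;
        final String field;
        final String literal;

        Triple(String subject, String field, String literal) {
            this.subject = subject;
            this.field = field;
            this.literal = literal;
        }
    }

    // Builds the documents a given mode would send to ES, each as a field -> value map.
    static List<Map<String, String>> buildDocuments(List<Triple> triples, IndexMode mode) {
        List<Map<String, String>> docs = new ArrayList<>();
        if (mode == IndexMode.DOCUMENT_PER_TRIPLE) {
            // One document per triple: every value survives, but the
            // document count grows one-for-one with the indexed triples.
            for (Triple t : triples) {
                Map<String, String> doc = new HashMap<>();
                doc.put("subject", t.subject);
                doc.put(t.field, t.literal);
                docs.add(doc);
            }
        } else {
            // One document per subject: compact, but a repeated field
            // keeps only the last value written.
            Map<String, Map<String, String>> bySubject = new LinkedHashMap<>();
            for (Triple t : triples) {
                Map<String, String> doc = bySubject.computeIfAbsent(t.subject, s -> {
                    Map<String, String> d = new HashMap<>();
                    d.put("subject", s);
                    return d;
                });
                doc.put(t.field, t.literal);
            }
            docs.addAll(bySubject.values());
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Triple> triples = Arrays.asList(
                new Triple("ex:a", "label", "first"),
                new Triple("ex:a", "label", "second"),
                new Triple("ex:a", "comment", "a note"),
                new Triple("ex:b", "label", "other"));

        List<Map<String, String>> perSubject = buildDocuments(triples, IndexMode.DOCUMENT_PER_SUBJECT);
        List<Map<String, String>> perTriple = buildDocuments(triples, IndexMode.DOCUMENT_PER_TRIPLE);

        System.out.println(perSubject.size());              // 2
        System.out.println(perSubject.get(0).get("label")); // second
        System.out.println(perTriple.size());               // 4
    }
}
```

Scaled up, the same counting explains the concern about index growth: in the per-triple mode the number of ES documents equals the number of indexed triples, while in the per-subject mode it is bounded by the number of distinct subjects.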
* Regarding the import statements, that was an accidental check-in. The classes
were not compiling in my IDE, so I switched them to use the
standard Guava classes. I will revert the class.
> Elastic Search Support for Apache Jena Text
> --------------------------------------------
>
> Key: JENA-1305
> URL: https://issues.apache.org/jira/browse/JENA-1305
> Project: Apache Jena
> Issue Type: New Feature
> Components: Text
> Affects Versions: Jena 3.2.0
> Reporter: Anuj Kumar
> Assignee: Osma Suominen
> Labels: elasticsearch
> Original Estimate: 240h
> Remaining Estimate: 240h
>
> This Jira tracks the development of Jena Text ElasticSearch Implementation.
> The goal is to extend Jena Text capability to index, at scale, in
> ElasticSearch. This implementation would be similar to the Lucene and Solr
> implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionalities would be supported:
> * Indexing literal values
> * Updating indexed values
> * Deleting indexed values
> * Custom Analyzer Support
> * Configuration using Assembler as well as Java techniques.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)