[https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945264#comment-15945264]
ASF GitHub Bot commented on JENA-1305:
--------------------------------------
Github user osma commented on the issue:
https://github.com/apache/jena/pull/227
I tested the ES backend with some non-toy SKOS data, namely
[YSO](http://finto.fi/en/yso/). I configured the entity definition to index the
predicates `skos:prefLabel`, `skos:altLabel` and `skos:hiddenLabel`. The
dataset has 520k triples and 29k entities. There are in total 150k triples with
these label properties.
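The following sketch illustrates why only part of the dataset reaches the text index. It is a simplified, hypothetical model of the entity definition described above: a map from predicate to index field, used to filter triples. It is not the jena-text API. The `Triple` record, the prefixed-name strings and the field name `label` are assumptions made for the sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified model of an entity definition: only triples whose
// predicate is mapped to an index field are selected for indexing.
// This is NOT the jena-text API; names here are illustrative assumptions.
final class EntityMapSketch {
    record Triple(String subject, String predicate, String object) {}

    // Predicate -> index field for the three SKOS label properties used in
    // the test. The field name "label" is an assumption.
    static final Map<String, String> PREDICATE_TO_FIELD = Map.of(
        "skos:prefLabel", "label",
        "skos:altLabel", "label",
        "skos:hiddenLabel", "label");

    // Keeps only the triples whose predicate has a field mapping.
    static List<Triple> indexable(List<Triple> triples) {
        return triples.stream()
            .filter(t -> PREDICATE_TO_FIELD.containsKey(t.predicate()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
            new Triple("yso:p1", "skos:prefLabel", "\"cats\"@en"),
            new Triple("yso:p1", "skos:altLabel", "\"felines\"@en"),
            new Triple("yso:p1", "skos:broader", "yso:p2"),
            new Triple("yso:p2", "rdf:type", "skos:Concept"));
        // Only the two label triples are selected, so this prints 2.
        System.out.println(indexable(data).size());
    }
}
```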
I used a rather old laptop (i3-2330M with an SSD) for the test, running Ubuntu
16.04 and ES 5.2.1.
Using the ES backend, indexing this dataset took about 25 minutes:
```
16:42:45 INFO [1] PUT http://localhost:3030/ds/data?default
17:08:06 INFO [1] 204 No Content (1 521,465 s)
```
Judging by the process stats, most of the time was spent in ES, which used
about 38 minutes of CPU time.
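A little arithmetic puts the CPU figure in perspective. The sketch below is a hypothetical helper, not part of Jena or ES tooling. It takes the wall-clock time from the two log timestamps and divides the approximate ES CPU time by it. About 2,280 CPU seconds over 1,521 wall-clock seconds suggests that ES kept roughly 1.5 cores busy on average while indexing.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper: rough parallelism estimate for the ES indexing run,
// using the log timestamps and the approximate CPU time quoted above.
final class CpuUtilization {
    // Wall-clock seconds between the PUT request and the 204 response.
    static long wallSeconds(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).getSeconds();
    }

    // Average number of cores kept busy: CPU time divided by wall-clock time.
    static double averageBusyCores(double cpuSeconds, double wallSeconds) {
        return cpuSeconds / wallSeconds;
    }

    public static void main(String[] args) {
        long wall = wallSeconds("16:42:45", "17:08:06"); // 1521 s
        double cpu = 38 * 60; // "about 38 minutes" of ES CPU time, so approximate
        System.out.printf("wall=%d s, average busy cores=%.1f%n",
            wall, averageBusyCores(cpu, wall));
    }
}
```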
I also indexed the same dataset using the Lucene backend. It took less than
30 seconds:
```
17:11:26 INFO [1] PUT http://localhost:3030/ds/data?default
17:11:55 INFO [1] 204 No Content (28,237 s)
```
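The two runs can also be compared per label triple. The sketch below uses hypothetical helper names, and the 150k figure is the approximate count given above. It parses the durations as the log prints them, with a space as the thousands separator and a comma as the decimal separator (so `1 521,465 s` is about 1,521 seconds). It then works out the throughput of each backend and the ratio between them, which comes to roughly 54.

```java
// Hypothetical helper: compares the two indexing runs using the durations
// printed in the Fuseki logs above.
final class IndexingThroughput {
    // The logs use a space (possibly a no-break space) as thousands separator
    // and a comma as decimal separator, e.g. "1 521,465", so normalise first.
    static double parseLogSeconds(String s) {
        String normalised = s.replaceAll("[\\s\\u00A0\\u202F]", "").replace(',', '.');
        return Double.parseDouble(normalised);
    }

    public static void main(String[] args) {
        double esSeconds = parseLogSeconds("1 521,465");
        double luceneSeconds = parseLogSeconds("28,237");
        int labelTriples = 150_000; // approximate count of label triples

        System.out.printf("ES:     %.0f label triples/s%n", labelTriples / esSeconds);
        System.out.printf("Lucene: %.0f label triples/s%n", labelTriples / luceneSeconds);
        System.out.printf("Lucene was about %.0fx faster%n", esSeconds / luceneSeconds);
    }
}
```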
Query performance seems to be about the same. In fact, the ES backend seems
slightly faster than the Lucene backend, but there was a lot of variance, so I
can't say for sure.
I have my doubts about whether the indexing performance is acceptable for
real-world use cases such as the ones @anujgandharv is targeting, but I don't
think this should stop us from merging this contribution. Since there have been
no objections, I will proceed with the merge.
> Elastic Search Support for Apache Jena Text
> --------------------------------------------
>
> Key: JENA-1305
> URL: https://issues.apache.org/jira/browse/JENA-1305
> Project: Apache Jena
> Issue Type: New Feature
> Components: Text
> Affects Versions: Jena 3.2.0
> Reporter: Anuj Kumar
> Assignee: Osma Suominen
> Labels: elasticsearch
> Original Estimate: 240h
> Remaining Estimate: 240h
>
> This Jira tracks the development of the Jena Text ElasticSearch implementation.
> The goal is to extend the Jena Text capability to index data at scale in
> ElasticSearch. This implementation would be similar to the Lucene and Solr
> implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionalities would be supported:
> * Indexing Literal values
> * Updating indexed values
> * Deleting indexed values
> * Custom Analyzer Support
> * Configuration using Assembler as well as Java techniques.