[ 
https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945264#comment-15945264
 ] 

ASF GitHub Bot commented on JENA-1305:
--------------------------------------

Github user osma commented on the issue:

    https://github.com/apache/jena/pull/227
  
    I tested the ES backend with some non-toy SKOS data, namely 
[YSO](http://finto.fi/en/yso/). I configured the entity definition to index the 
predicates `skos:prefLabel`, `skos:altLabel` and `skos:hiddenLabel`. The 
dataset has 520k triples and 29k entities. There are in total 150k triples with 
these label properties.
    
    I'm using a rather old laptop (i3-2330M with SSD) for the test. Ubuntu 
16.04, ES 5.2.1.
    
    Using the ES backend, indexing this dataset took about 25 minutes:
    ```
    16:42:45 INFO  [1] PUT http://localhost:3030/ds/data?default
    17:08:06 INFO  [1] 204 No Content (1 521,465 s) 
    ```
    
    Looking at process stats, most of the time was spent in the ES process, 
which used about 38 minutes of CPU time.
    
    I also indexed the same dataset using the Lucene backend. It took less than 
30 seconds:
    ```
    17:11:26 INFO  [1] PUT http://localhost:3030/ds/data?default
    17:11:55 INFO  [1] 204 No Content (28,237 s) 
    ```
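    
    For reference, the gap between the two runs can be worked out from the 
log lines above. The following is only a rough sketch: the helper names are 
made up for illustration, the duration format (space as thousands separator, 
comma as decimal mark) is assumed from how the logs appear, and the 150k 
label-triple count is the approximate figure quoted earlier.

```python
from datetime import datetime


def parse_duration(text: str) -> float:
    """Parse a duration as printed in the logs above, e.g. '1 521,465 s'.

    Assumption: a space (or non-breaking space) is the thousands separator
    and a comma is the decimal mark.
    """
    number = text.strip().rstrip("s").strip()
    number = number.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    return float(number)


def wall_clock_seconds(start: str, end: str) -> float:
    """Elapsed seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Durations reported in the two "204 No Content" log lines.
es_seconds = parse_duration("1 521,465 s")
lucene_seconds = parse_duration("28,237 s")

# How many times longer the ES run took than the Lucene run.
slowdown = es_seconds / lucene_seconds

# Approximate number of indexed label triples (rounded figure from the comment).
LABEL_TRIPLES = 150_000
es_rate = LABEL_TRIPLES / es_seconds
lucene_rate = LABEL_TRIPLES / lucene_seconds

print(f"ES: {es_seconds:.1f} s, Lucene: {lucene_seconds:.1f} s")
print(f"ES took {slowdown:.1f}x as long as Lucene")
print(f"Label triples per second: ES {es_rate:.0f}, Lucene {lucene_rate:.0f}")
```

    With the figures above, the ES run comes out at roughly 54 times the 
duration of the Lucene run. The log timestamps (16:42:45 to 17:08:06) agree 
with the reported ES duration to within a second.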
    
    Query performance seems to be pretty much the same. In fact, the ES backend 
seems slightly faster than the Lucene backend, but there was a lot of variance, 
so I can't tell for sure.
    
    I have my doubts about whether the indexing performance is acceptable for 
real-world use cases like the ones @anujgandharv is targeting, but I don't think 
this should stop us from merging this contribution. Since there have been no 
objections, I will proceed with the merge.


> Elastic Search Support for Apache Jena Text 
> --------------------------------------------
>
>                 Key: JENA-1305
>                 URL: https://issues.apache.org/jira/browse/JENA-1305
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: Text
>    Affects Versions: Jena 3.2.0
>            Reporter: Anuj Kumar
>            Assignee: Osma Suominen
>              Labels: elasticsearch
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> This issue tracks the development of the Jena Text Elasticsearch implementation.
> The goal is to extend Jena Text so that it can index, at scale, in 
> Elasticsearch. This implementation would be similar to the Lucene and Solr 
> implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionality would be supported:
> * Indexing literal values
> * Updating indexed values
> * Deleting indexed values
> * Custom analyzer support
> * Configuration using an assembler as well as Java code.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
