[ 
https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895788#comment-15895788
 ] 

Osma Suominen edited comment on JENA-1305 at 3/4/17 5:45 PM:
-------------------------------------------------------------

This sounds great! From my perspective, even a smaller feature set would be 
acceptable, as long as basic text indexing functionality works.

One important thing is to have unit tests from the start. Luckily ES seems to 
provide good support for that in the form of a [testing 
framework|https://www.elastic.co/guide/en/elasticsearch/reference/current/testing-framework.html].
 I hope you can make use of that (or something similar).

I hope you can make use of the existing jena-text Lucene code (and possibly the 
Solr code as well if it helps). In fact, I strongly suggest that you avoid 
duplicating code if at all possible, and instead try to implement the ES side 
so that it shares as much code as possible with the Lucene support. This may 
require some refactoring of existing code; I'm willing to help with that.

Also I hope that you can make use of the existing Lucene unit tests. In my 
mind, the unit tests that test a specific feature (say, deleting indexed 
values) should be the same regardless of which backend (Lucene/ES) is being 
used. This may require some reengineering of the test classes so that their 
functionality and naming can become backend-independent. The inheritance 
hierarchy is already quite convoluted though, and I'm partially responsible for 
that. I can help with the tests as well.
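
To illustrate what backend-independent tests could look like, here is a minimal sketch in plain Java. All names in it (SimpleTextIndex, InMemoryIndex, DeleteIndexedValueCheck, InMemoryDeleteCheck) are hypothetical and are not part of jena-text. The in-memory index only stands in for a real Lucene or ES backend, and a real version would presumably be written as JUnit tests against the actual index classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BackendIndependentTestSketch {

    /** Hypothetical minimal index interface; NOT the real jena-text API. */
    interface SimpleTextIndex {
        void add(String entityUri, String text);
        void delete(String entityUri);
        List<String> search(String word);
    }

    /** In-memory stand-in for a real backend (assumption, for illustration only). */
    static class InMemoryIndex implements SimpleTextIndex {
        private final Map<String, String> docs = new HashMap<>();

        public void add(String entityUri, String text) {
            docs.put(entityUri, text);
        }

        public void delete(String entityUri) {
            docs.remove(entityUri);
        }

        public List<String> search(String word) {
            // Naive case-insensitive substring match; a real backend would use an analyzer.
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, String> e : docs.entrySet()) {
                if (e.getValue().toLowerCase().contains(word.toLowerCase())) {
                    hits.add(e.getKey());
                }
            }
            return hits;
        }
    }

    /** The feature check is written once; only index creation is backend-specific. */
    static abstract class DeleteIndexedValueCheck {
        protected abstract SimpleTextIndex createIndex();

        boolean run() {
            SimpleTextIndex index = createIndex();
            index.add("http://example.org/a", "hello world");
            boolean foundBefore = index.search("hello").contains("http://example.org/a");
            index.delete("http://example.org/a");
            boolean goneAfter = index.search("hello").isEmpty();
            return foundBefore && goneAfter;
        }
    }

    /** One small subclass per backend; a Lucene or ES variant would override createIndex(). */
    static class InMemoryDeleteCheck extends DeleteIndexedValueCheck {
        protected SimpleTextIndex createIndex() {
            return new InMemoryIndex();
        }
    }

    public static void main(String[] args) {
        System.out.println("delete check passed: " + new InMemoryDeleteCheck().run());
    }
}
```

With this shape, adding a backend means adding one subclass that supplies the index, while the feature checks themselves stay shared.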

You can base your implementation on this branch:
https://github.com/osma/jena/tree/jena-1301-drop-solr
i.e. my branch which contains the Lucene 6 upgrade (JENA-1250/PR #219) as well 
as dropping of Solr support (JENA-1301/PR #220). I expect to merge these to 
Jena master soon; I just want to give people a chance to comment and perhaps do 
some additional testing before merging.

Just a reminder: When the code is done, the [jena-text 
documentation|https://jena.apache.org/documentation/query/text-query.html] 
needs to be updated as well. Also there should be example configuration files 
for jena-text with ES alongside the jena-text/Lucene examples.


> Elastic Search Support for Apache Jena Text 
> --------------------------------------------
>
>                 Key: JENA-1305
>                 URL: https://issues.apache.org/jira/browse/JENA-1305
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: Text
>    Affects Versions: Jena 3.2.0
>            Reporter: Anuj Kumar
>            Assignee: Osma Suominen
>              Labels: elasticsearch
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> This Jira tracks the development of Jena Text ElasticSearch Implementation.
> The goal is to extend Jena Text capability to index, at scale, in 
> ElasticSearch. This implementation would be similar to the Lucene and Solr 
> implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionalities would be supported:
> * Indexing Literal values
> * Updating indexed values
> * Deleting Indexed values
> * Custom Analyzer Support
> * Configuration using Assembler as well as Java techniques.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
