GitHub user osma commented on the pull request:

    https://github.com/apache/jena/pull/52#issuecomment-97329735
  
    > 1. I'm not familiar with assembler configuration. But if you want to give 
some help ;-)
    
    I'll try. I've done two jena-text patches in the past, and in both cases I 
added support for assembler configuration.
    
    For your patch I think it would be useful to be able to enable/disable 
multilingual indexing in a particular jena-text index (default should be to 
disable it, for backwards compatibility). Adjusting the particular 
language-specific indexers, as I originally suggested, is not very important at 
this point.
    
    Thinking about assembler configuration, I think it would be easiest to plug 
this in as an alternative to the current Analyzer variants (StandardAnalyzer, 
SimpleAnalyzer, KeywordAnalyzer, LowerCaseKeywordAnalyzer). You can look at my 
patch in [JENA-776](https://issues.apache.org/jira/browse/JENA-776) that added 
the LowerCaseKeywordAnalyzer variant. Basically you need to create a new class 
such as MultilingualAnalyzerAssembler (similar to the other *AnalyzerAssembler 
classes) and plug support for it into TextAssembler. It shouldn't be very 
difficult...
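    
    To illustrate, here is a rough sketch of what this could look like from the 
    user's side, modeled on the existing `text:analyzer` syntax that 
    `text:LowerCaseKeywordAnalyzer` already uses. The name 
    `text:MultilingualAnalyzer` is an assumption (nothing by that name exists in 
    jena-text yet), and the field and predicate are only placeholders:
    
    ```turtle
    @prefix text: <http://jena.apache.org/text#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    <#entMap> a text:EntityMap ;
        text:entityField  "uri" ;
        text:defaultField "text" ;
        text:map (
            [ text:field "text" ;
              text:predicate rdfs:label ;
              # Hypothetical analyzer type; it would need a matching
              # MultilingualAnalyzerAssembler registered in TextAssembler.
              text:analyzer [ a text:MultilingualAnalyzer ] ]
        ) .
    ```
    
    Leaving out the `text:analyzer` entry would keep today's behaviour, so 
    multilingual indexing stays off unless it is explicitly asked for.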
    
    > 2. Ok, I will refactor it to leave previous signatures and calls.
    > 3. Sure, it's more clean to extend Entity... ok, todo list. 
    
    Excellent!
    
    > For the tests and doc, I 'm pretty busy at the moment.
    
    I can't speak for Jena officially as I'm just an occasional contributor 
with an interest in jena-text, but Jena has very good unit test coverage and I 
think unit tests are expected from new contributions as well. If you won't 
write unit tests for this, I bet nobody else will... Again, it's not very hard; 
you can look at my LowerCaseKeywordAnalyzer patch for an example.
    
    Regarding documentation, I think that what's needed is to update the main 
jena-text document, particularly the [Configuring an 
Analyzer](https://jena.apache.org/documentation/query/text-query.html#configuring-an-analyzer)
 section. I'm not 100% sure how it is technically maintained these days, but it 
used to be maintained via the CMS that [you can 
use](http://www.apache.org/dev/cmsref#non-committer) to provide a documentation 
patch. But I think it should also be fine to just provide an update as a 
comment here on GitHub. Again, see JENA-776 for an example: there I just wrote 
up the small change to the documentation as a comment and @afs picked it up 
from there.



