[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

Steve Rowe (JIRA) Fri, 06 Apr 2018 06:50:14 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428348#comment-16428348
 ]


Steve Rowe commented on LUCENE-2899:
------------------------------------

Note the workaround on SOLR-4793 for ZK resources larger than 1M; from the ZK 
admin manual:

{quote}
This option can only be set as a Java system property. There is no zookeeper 
prefix on it. It specifies the maximum size of the data that can be stored in a 
znode. The default is 0xfffff, or just under 1M. If this option is changed, the 
system property must be set on all servers and clients otherwise problems will 
arise. This is really a sanity check. ZooKeeper is designed to store data on 
the order of kilobytes in size.
{quote}

 This is spelled out a little more here: 
https://www.shi-gmbh.com/tutorials/increase-file-size-zookeeper/

> Add OpenNLP Analysis capabilities as a module
> ---------------------------------------------
>
>                 Key: LUCENE-2899
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2899
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Grant Ingersoll
>            Assignee: Steve Rowe
>            Priority: Minor
>             Fix For: 7.3, master (8.0)
>
>         Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

Reply via email to