[ 
https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622304#comment-14622304
 ] 

Michael McCandless commented on CONNECTORS-1219:
------------------------------------------------

We could possibly patch Lucene to allow stored=true for Reader as well ... this 
is probably quite tricky, e.g. the codec APIs (StoredFieldsFormat) would need 
to accept Reader too.

Even if we did that, though, a very large document can still be problematic.  
You should test using Reader just for indexing: it may be that even this still 
puts too much heap pressure, because IndexWriter must hold all tokens for that 
one document in heap before it can write a new segment.
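
A minimal sketch of the "Reader just for indexing" test suggested above, 
assuming Lucene 5.x or later on the classpath. The class name, field names, 
and temporary index directory are hypothetical and are not taken from the 
attached patches. It uses the TextField(String, Reader) constructor, which 
tokenizes and indexes the body but never stores it, so only a short id is 
stored alongside it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical class name, for illustration only.
public class ReaderIndexingSketch {

    // Indexes one document whose body is supplied as a Reader and returns
    // the number of documents visible in the index afterwards.
    static int indexFromReader(String id, Reader body) throws IOException {
        // Assumption: a throwaway index directory is fine for this test.
        Path indexPath = Files.createTempDirectory("lucene-reader-sketch");
        try (Directory directory = FSDirectory.open(indexPath)) {
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
            try (IndexWriter writer = new IndexWriter(directory, config)) {
                Document doc = new Document();
                // A short identifier can be stored as usual.
                doc.add(new StringField("id", id, Field.Store.YES));
                // Reader-valued TextField: indexed and tokenized, never stored.
                // Storing the body would require a String (or byte[]) value.
                doc.add(new TextField("body", body));
                writer.addDocument(doc);
            } // closing the writer commits the pending document
            try (DirectoryReader reader = DirectoryReader.open(directory)) {
                return reader.numDocs();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int docs = indexFromReader("doc-1", new StringReader("a small test body"));
        System.out.println("documents in index: " + docs);
    }
}
```

To probe the heap-pressure concern, the StringReader could be replaced with a 
Reader over a very large file while running with a constrained -Xmx setting. 
This sketch does not measure memory itself.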

> Lucene Output Connector
> -----------------------
>
>                 Key: CONNECTORS-1219
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1219-v0.1patch.patch, 
> CONNECTORS-1219-v0.2.patch
>
>
> An output connector that writes directly to a local Lucene index, not via a 
> remote search engine. It would be nice if we could use Lucene's various APIs 
> on the index directly, even though we could do the same thing with a Solr or 
> Elasticsearch index. I assume we could do things like classification, 
> categorization, and tagging, using e.g. the lucene-classification package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
