[
https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622304#comment-14622304
]
Michael McCandless commented on CONNECTORS-1219:
------------------------------------------------
We could possibly patch Lucene to allow stored=true for Reader-valued fields as
well, but this is probably quite tricky: e.g. the codec APIs (StoredFieldsFormat)
would need to accept a Reader too.
Even if we did that, though, a very large document could still be problematic.
You should test using a Reader just for indexing: it's possible that even this
still puts too much pressure on the heap, because IndexWriter must hold all the
tokens for that one document in heap before it can write a new segment.
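A minimal sketch of the "Reader just for indexing" experiment, assuming lucene-core is on the classpath and using the Lucene 8/9-style API (on 5.x/6.x, StandardAnalyzer lives in lucene-analyzers-common under the same package). The class and method names are hypothetical. It shows that Lucene refuses a stored field with a Reader value, then streams a body from a Reader into an indexed-only TextField. The RAM buffer value of 64 MB is an arbitrary illustration, not a recommendation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper class for trying Reader-based indexing.
public class ReaderIndexingSketch {

  /** Returns true if Lucene rejects a Reader-valued field whose type is stored. */
  static boolean storedReaderRejected() {
    try {
      // TYPE_STORED has stored=true; the Field constructor throws for Reader values.
      new Field("body", new StringReader("some text"), TextField.TYPE_STORED);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  /**
   * Indexes one document whose body is streamed from a Reader (indexed, not stored),
   * commits, and returns the document frequency of probeTerm in the "body" field.
   */
  static int indexFromReader(Path indexDir, String id, Reader body, String probeTerm)
      throws IOException {
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    // Bounds buffering across documents only: IndexWriter flushes between documents,
    // so all postings for a single huge document must still fit in heap at once.
    config.setRAMBufferSizeMB(64.0);
    try (Directory dir = FSDirectory.open(indexDir);
         IndexWriter writer = new IndexWriter(dir, config)) {
      Document doc = new Document();
      doc.add(new StringField("id", id, Field.Store.YES)); // small stored key
      doc.add(new TextField("body", body));               // Reader value: indexed only
      writer.addDocument(doc);
      writer.commit();
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return reader.docFreq(new Term("body", probeTerm));
      }
    }
  }
}
```

To reproduce the heap concern, pass a Reader over a real large file and watch heap usage during addDocument; the body text itself is never stored, so any stored copy would have to be kept elsewhere.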
> Lucene Output Connector
> -----------------------
>
> Key: CONNECTORS-1219
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
> Project: ManifoldCF
> Issue Type: New Feature
> Reporter: Shinichiro Abe
> Assignee: Shinichiro Abe
> Attachments: CONNECTORS-1219-v0.1patch.patch,
> CONNECTORS-1219-v0.2.patch
>
>
> An output connector that writes directly to a local Lucene index, not via a
> remote search engine. It would be nice if we could use Lucene's various APIs on
> the index directly, even though we could do the same thing with a Solr or
> Elasticsearch index. I assume we could do something for classification,
> categorization, and tagging, using e.g. the lucene-classification package.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)