[ https://issues.apache.org/jira/browse/JENA-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686781#comment-16686781 ]
Code Ferret commented on JENA-1630:
-----------------------------------
[~rvesse], that's correct. Some of our configurations have 4 or 5
language-tagged fields for different encodings, e.g., sa-x-iast, sa-alalc97,
sa-x-aux-ndia, and so forth. Our dataset of ~33M triples shows a saving of ~15%
in the Lucene index.
> store literals only once in lucene docs for jena-text w/ multilingual configs
> -----------------------------------------------------------------------------
>
> Key: JENA-1630
> URL: https://issues.apache.org/jira/browse/JENA-1630
> Project: Apache Jena
> Issue Type: Improvement
> Components: Text
> Affects Versions: Jena 3.9.0
> Reporter: Code Ferret
> Assignee: Code Ferret
> Priority: Major
> Labels: easyfix, performance, pull-request-available
> Fix For: Jena 3.10.0
>
>
> We can save some space in the Lucene db for jena-text when using multilingual
> configurations by only storing the incoming literal once rather than for each
> field's language tag variant.
> A PR is ready.
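
To illustrate the idea in the description above, here is a minimal toy sketch of why storing the literal once saves space. It does not use the jena-text or Lucene APIs; the class `StoredOnceSketch`, its `Field` type, the `base_tag` field naming, and the choice of the first variant as the storing field are all hypothetical and only model the difference between "indexed and stored" and "indexed only" fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical, not jena-text or Lucene classes.
final class StoredOnceSketch {

    /** One field of an index document: always indexed, optionally stored. */
    static final class Field {
        final String name;
        final String value;
        final boolean stored;

        Field(String name, String value, boolean stored) {
            this.name = name;
            this.value = value;
            this.stored = stored;
        }
    }

    /** Before: every language-tag variant field also stores the literal. */
    static List<Field> storeEveryVariant(String base, String literal, List<String> langTags) {
        List<Field> doc = new ArrayList<>();
        for (String tag : langTags) {
            doc.add(new Field(base + "_" + tag, literal, true));
        }
        return doc;
    }

    /** After: only the first variant stores the literal; the rest are index-only. */
    static List<Field> storeOnce(String base, String literal, List<String> langTags) {
        List<Field> doc = new ArrayList<>();
        for (int i = 0; i < langTags.size(); i++) {
            doc.add(new Field(base + "_" + langTags.get(i), literal, i == 0));
        }
        return doc;
    }

    /** Total UTF-8 bytes of literal text kept in stored fields. */
    static int storedBytes(List<Field> doc) {
        int total = 0;
        for (Field f : doc) {
            if (f.stored) {
                total += f.value.getBytes(StandardCharsets.UTF_8).length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("sa-x-iast", "sa-alalc97", "sa-x-aux-ndia");
        String literal = "dharma";
        System.out.println("stored per variant: "
                + storedBytes(storeEveryVariant("label", literal, tags)) + " bytes");
        System.out.println("stored once:        "
                + storedBytes(storeOnce("label", literal, tags)) + " bytes");
    }
}
```

In this model every variant field remains searchable in both cases; only the number of stored copies of the literal changes, so the stored payload no longer grows with the number of language-tag variants.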
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)