[ https://issues.apache.org/jira/browse/LUCENE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-5638:
--------------------------------
Attachment: LUCENE-5638.patch
Updated patch: I added it to other tokenizers. I also made MockTokenizer use
it by default, and TestRandomChains now randomizes between the DEFAULT and
TOKEN attribute factories.
There were some test failures in TestPayloads, because the tests asserted that
PayloadAttribute was not present. I just removed the assertions.
I will look for other failures and also inspect FreqProxTermsWriter to ensure
it is optimized for the case where payloadAtt != null but returns null every
time, and benchmark indexing somehow with the patch.
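
For illustration only, the case described above (a payload attribute that is
present but returns null for every token) could be handled along these lines.
This is a minimal sketch with hypothetical names (PayloadSource,
PostingsSketch); it is not FreqProxTermsWriter code, and it assumes the only
per-token cost that should remain is a null check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a payload attribute; getPayload() may return null.
interface PayloadSource {
  byte[] getPayload();
}

// Hypothetical postings collector, not the Lucene class.
final class PostingsSketch {
  private final PayloadSource payloadAtt; // null if the stream has no payload attribute
  private final List<byte[]> payloads = new ArrayList<>();
  private boolean sawPayloads;
  private int positions;

  PostingsSketch(PayloadSource payloadAtt) {
    this.payloadAtt = payloadAtt;
  }

  void addPosition() {
    positions++;
    byte[] payload = (payloadAtt == null) ? null : payloadAtt.getPayload();
    // Only switch to the payload-carrying path once a real payload shows up.
    if (payload != null && payload.length > 0) {
      sawPayloads = true;
      payloads.add(payload.clone());
    }
  }

  boolean sawPayloads() { return sawPayloads; }
  int positions() { return positions; }
  int storedPayloads() { return payloads.size(); }
}
```

With this shape, an attribute that is always present but always empty never
flips sawPayloads, so the field can still be written as if it had no payloads.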
> Default Attributes are expensive
> --------------------------------
>
> Key: LUCENE-5638
> URL: https://issues.apache.org/jira/browse/LUCENE-5638
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/analysis
> Reporter: Robert Muir
> Attachments: LUCENE-5638.patch, LUCENE-5638.patch
>
>
> Changes like LUCENE-5634 make it clear that the default AttributeFactory
> stuff has a very high cost: weakmaps/reflection/etc.
> Additionally I think clearAttributes() is more expensive than it should be:
> it has to traverse a linked-list, calling clear() per token.
> Operations like cloning (save/restoreState) have a high cost too.
> Maybe we can have a better Default? In other words, rename
> DEFAULT_ATTRIBUTE_FACTORY to REFLECTION_ATTRIBUTE_FACTORY, and instead have a
> faster default factory that just has one AttributeImpl with the "core ones"
> that 95% of users are dealing with (TOKEN_ATTRIBUTE_FACTORY?): anything
> outside of that falls back to reflection.
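
The proposal above can be sketched roughly as follows. This is a toy model
under stated assumptions, not the patch and not the Lucene API: every type
here (Attribute, PackedToken, PackedTokenFactory, ReflectionFactory, the
simplified AttributeSource) is a hypothetical stand-in. One object implements
the "core" attribute interfaces, and anything else falls back to a
reflection-based lookup by naming convention (interface name plus "Impl").

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for attribute interfaces; not the Lucene types.
interface Attribute {}
interface TermAttribute extends Attribute { void setTerm(String term); String term(); }
interface OffsetAttribute extends Attribute { void setOffset(int start, int end); int startOffset(); }
interface FlagsAttribute extends Attribute { void setFlags(int flags); int flags(); }

abstract class AttributeImpl { abstract void clear(); }

// One object backs all "core" attributes, so clearing is a single call.
final class PackedToken extends AttributeImpl implements TermAttribute, OffsetAttribute {
  private String term = "";
  private int start, end;
  public void setTerm(String term) { this.term = term; }
  public String term() { return term; }
  public void setOffset(int start, int end) { this.start = start; this.end = end; }
  public int startOffset() { return start; }
  @Override void clear() { term = ""; start = 0; end = 0; }
}

// A non-core attribute with its own implementation class.
final class FlagsAttributeImpl extends AttributeImpl implements FlagsAttribute {
  private int flags;
  public void setFlags(int flags) { this.flags = flags; }
  public int flags() { return flags; }
  @Override void clear() { flags = 0; }
}

interface AttributeFactory { AttributeImpl create(Class<? extends Attribute> att); }

// Slow path: find "<Interface>Impl" by name and instantiate it reflectively.
final class ReflectionFactory implements AttributeFactory {
  public AttributeImpl create(Class<? extends Attribute> att) {
    try {
      Class<?> impl = Class.forName(att.getName() + "Impl", true, att.getClassLoader());
      return (AttributeImpl) impl.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("no implementation for " + att.getName(), e);
    }
  }
}

// Fast path: core interfaces get a PackedToken, everything else is delegated.
final class PackedTokenFactory implements AttributeFactory {
  private final AttributeFactory fallback;
  PackedTokenFactory(AttributeFactory fallback) { this.fallback = fallback; }
  public AttributeImpl create(Class<? extends Attribute> att) {
    return att.isAssignableFrom(PackedToken.class) ? new PackedToken() : fallback.create(att);
  }
}

// Simplified holder: one impl instance is registered for every interface it implements.
final class AttributeSource {
  private final AttributeFactory factory;
  private final Map<Class<? extends Attribute>, AttributeImpl> byInterface = new HashMap<>();
  private final Set<AttributeImpl> impls = new LinkedHashSet<>();
  AttributeSource(AttributeFactory factory) { this.factory = factory; }

  <T extends Attribute> T addAttribute(Class<T> att) {
    AttributeImpl impl = byInterface.get(att);
    if (impl == null) {
      impl = factory.create(att);
      impls.add(impl);
      for (Class<?> iface : impl.getClass().getInterfaces()) {
        if (iface != Attribute.class && Attribute.class.isAssignableFrom(iface)) {
          byInterface.put(iface.asSubclass(Attribute.class), impl);
        }
      }
    }
    return att.cast(impl);
  }

  // Cost is proportional to the number of distinct impl objects, not attributes.
  void clearAttributes() { for (AttributeImpl impl : impls) impl.clear(); }
  int implCount() { return impls.size(); }
}
```

In this model, adding TermAttribute and OffsetAttribute yields the same
PackedToken instance, so clearAttributes() touches one object for the core
attributes; only the extra FlagsAttribute pays for the reflective lookup and
adds a second object to clear.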