[ https://issues.apache.org/jira/browse/JCR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcel Reutegger resolved JCR-2365.
-----------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0, 1.6.1
This issue does not occur in trunk because trunk no longer uses the
text-extractors module. Text extraction there is now handled by Apache Tika.
Fixed in the 1.6 branch in revision 830478.
> HTML Text Extractor does not extract or index numerics
> ------------------------------------------------------
>
> Key: JCR-2365
> URL: https://issues.apache.org/jira/browse/JCR-2365
> Project: Jackrabbit Content Repository
> Issue Type: Bug
> Components: indexing, jackrabbit-text-extractors
> Affects Versions: 1.6.0
> Environment: Win XP-Pro; Win 2003 Enterprise 32bit
> Reporter: Jeremy Anderson
> Fix For: 1.6.1, 2.0.0
>
>
> Numerics such as addresses, dates, and financial figures are not extracted or
> indexed by the current HTML extractor. These values are handled properly and
> are searchable when extracted via the PlainTextExtractor.
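
The behavior described in the report can be illustrated with a small regression-style check. This is only a sketch in Python using the standard library's `html.parser`, not Jackrabbit's HTML extractor or Tika; the `TextCollector` class, the `extract_text` helper, and the sample markup are all hypothetical names and data chosen for illustration. It shows the property the report expects: numeric tokens in HTML text nodes should survive extraction.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping script/style bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep every non-empty text node, including purely numeric ones.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Sample markup (invented) containing a date, a financial figure, and an address.
SAMPLE = (
    "<html><body>"
    "<p>Invoice 2009-10-28</p>"
    "<p>Total: 1,250.75 USD</p>"
    "<p>12 Main Street</p>"
    "</body></html>"
)

text = extract_text(SAMPLE)

# The extracted text should still contain each numeric token.
for token in ("2009-10-28", "1,250.75", "12 Main Street"):
    assert token in text, f"numeric token lost during extraction: {token}"
```

A check of this shape, run against whichever extractor feeds the index, would distinguish an extraction problem from an indexing or query problem.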