tika-eval misaligned the attachments on that file :(

https://corpora.tika.apache.org/base/reports/tika-2.8.1-pre-rc1-v4.tgz

This also happened to: govdocs1/912/912801.ppt

https://issues.apache.org/jira/browse/TIKA-4103
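Here is one way a misalignment could produce phantom TOP_10_MORE_IN_B entries. This is a minimal, hypothetical Python sketch, not tika-eval's code: tika-eval is Java and tokenizes differently. The tokenizer, the function names, the sample attachment texts, and the two pairing strategies (by position vs. by resource name) are all illustrative assumptions. If A and B emit the same two attachments in a different order and they are paired by position, B appears to have extra tokens such as "sprinkler" for an attachment whose text never contained them in either run.

```python
from collections import Counter
import re


def tokenize(text):
    # Simplistic lowercase word tokenizer (assumption; not tika-eval's analyzer).
    return re.findall(r"[a-z]+", text.lower())


def top_n_more_in_b(text_a, text_b, n=10):
    """Return up to n (token, extra_count) pairs that occur more often in B than in A."""
    a = Counter(tokenize(text_a))
    b = Counter(tokenize(text_b))
    diff = b - a  # Counter subtraction keeps only positive counts
    # Sort by descending extra count, then alphabetically for a stable order.
    return sorted(diff.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def compare_by_position(attachments_a, attachments_b):
    # Hypothetical: pair the i-th attachment of A with the i-th attachment of B.
    return [top_n_more_in_b(a[1], b[1]) for a, b in zip(attachments_a, attachments_b)]


def compare_by_name(attachments_a, attachments_b):
    # Hypothetical: pair attachments by a stable identifier (here, the resource name).
    b_by_name = dict(attachments_b)
    return [top_n_more_in_b(text, b_by_name.get(name, "")) for name, text in attachments_a]


# Hypothetical extracts: the same two embedded docs, emitted in a different order in B.
run_a = [("doc1.doc", "irrigation index tool"), ("doc2.doc", "sprinkler water reuse")]
run_b = [("doc2.doc", "sprinkler water reuse"), ("doc1.doc", "irrigation index tool")]

print(compare_by_position(run_a, run_b))
# → [[('reuse', 1), ('sprinkler', 1), ('water', 1)], [('index', 1), ('irrigation', 1), ('tool', 1)]]
print(compare_by_name(run_a, run_b))
# → [[], []]
```

Pairing by a stable identifier makes the spurious differences disappear, which is the behavior you would want from the comparison.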

On Thu, Jul 20, 2023 at 11:39 PM Tilman Hausherr wrote:

> I wanted to have a closer look at one of the files reported to have
> less output:
>
> govdocs1/477/477727.ppt         MBD0104A5C8.doc
>
> So I ran tika-app 2.8.0 and the 2.8.1 snapshot and got the same output.
> But according to the Excel table, B should have these additional tokens
> (TOP_10_MORE_IN_B column):
>
> and: 3 | land: 3 | micro: 3 | reuse: 3 | sprinkler: 3 | water: 3 |
> capture: 2 | collect: 2 | considered: 2 | leveled: 2
>
> I can't find "sprinkler" in the text below or in the Word file.
>
> <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="cp:revision" content="3"/>
> <meta name="meta:word-count" content="322"/>
> <meta name="extended-properties:Application" content="Microsoft Word 10.0"/>
> <meta name="meta:last-author" content="Carolyn.Jones"/>
> <meta name="dc:creator" content="NWCC"/>
> <meta name="extended-properties:Company" content="USDA"/>
> <meta name="xmpTPg:NPages" content="1"/>
> <meta name="resourceName" content="MBD0104A5C8.doc"/>
> <meta name="dcterms:created" content="2005-01-06T20:39:00Z"/>
> <meta name="dcterms:modified" content="2005-03-29T22:02:00Z"/>
> <meta name="meta:character-count" content="1836"/>
> <meta name="extended-properties:Template" content="Normal.dot"/>
> <meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
> <meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.microsoft.OfficeParser"/>
> <meta name="extended-properties:TotalTime" content="3600000000"/>
> <meta name="Content-Length" content="22528"/>
> <meta name="meta:page-count" content="1"/>
> <meta name="Content-Type" content="application/msword"/>
> <title/>
> </head>
> <body><p><b>Conservation Security Program (CSP)
> </b></p>
> <p><b>Irrigation Enhancement Index Tool</b></p>
> <p>This tool is designed to help landowners conduct a self assessment of
> their eligibility for payment for enhanced irrigation systems in the
> Conservation Security Program.  It may also serve as a means of
> documenting irrigation system components that can be utilized during
> individual interviews.
> </p>
> <p>This procedure is to be utilized on irrigated lands eligible for CSP
> and will result in assigning an Irrigation Enhancement Index value to
> the irrigation system being evaluated.
> </p>
> <p>This procedure starts with a base value that is assigned to the
> specific type of irrigation system in use.  Systems that commonly have
> higher irrigation efficiencies and/or are easier to manage are assigned
> higher values.  Modifiers are applied based on the level of management
> and the efficiency of the on-farm water delivery system.  A bonus is
> given if runoff from the irrigated field is captured for re-use.</p>
> <p>The final calculation will require a value of the Soil Condition
> Index (SCI) multiplier.  The exact value of this multiplier will be
> provided to you when NRCS staff computes your final SCI during your
> interview.  The multiplier will be a value from 0.9 to 1.0 depending on
> your SCI.</p>
> <p>This self assessment is simple and should take less than 5 minutes to
> complete.  A basic hand calculator is recommended. In addition, basic
> knowledge of the irrigation system and management practices in use is
> necessary.  Definitions of the various terms are included in this tool.</p>
> <p>When the self assessment is complete, the landowner will have
> calculated an Irrigation Enhancement Index value for the irrigation
> system.  The Irrigation Enhancement Index is not an efficiency number,
> but rather an indicator of how well the system may perform.  If the
> Irrigation Enhancement Index value is 50 or more, the landowner may be
> eligible for CSP payments.  If the Irrigation Enhancement Index value is
> less than 50, the applicant should consider utilizing other USDA
> programs to improve the irrigation system.  If the Irrigation
> Enhancement Index is 60 or greater, the applicant may be eligible for
> increased payments.
> </p>
> </body></html>
>
