tika-eval misaligned the attachments on that file :( https://corpora.tika.apache.org/base/reports/tika-2.8.1-pre-rc1-v4.tgz
This also happened to: govdocs1/912/912801.ppt

https://issues.apache.org/jira/browse/TIKA-4103

On Thu, Jul 20, 2023 at 11:39 PM Tilman Hausherr <[email protected]> wrote:

> I wanted to have a closer look at one of the files that claims to have
> less output:
>
> govdocs1/477/477727.ppt MBD0104A5C8.doc
>
> so I ran tika-app 2.8.0 and 2.8.1 snapshot and got the same output. But
> according to the excel table I should have these more in B
> (TOP_10_MORE_IN_B column):
>
> and: 3 | land: 3 | micro: 3 | reuse: 3 | sprinkler: 3 | water: 3 |
> capture: 2 | collect: 2 | considered: 2 | leveled: 2
>
> I can't find "sprinkler" in the text below nor in the WORD file.
>
> <?xml version="1.0" encoding="UTF-8"?><html
> xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="cp:revision" content="3"/>
> <meta name="meta:word-count" content="322"/>
> <meta name="extended-properties:Application" content="Microsoft Word
> 10.0"/>
> <meta name="meta:last-author" content="Carolyn.Jones"/>
> <meta name="dc:creator" content="NWCC"/>
> <meta name="extended-properties:Company" content="USDA"/>
> <meta name="xmpTPg:NPages" content="1"/>
> <meta name="resourceName" content="MBD0104A5C8.doc"/>
> <meta name="dcterms:created" content="2005-01-06T20:39:00Z"/>
> <meta name="dcterms:modified" content="2005-03-29T22:02:00Z"/>
> <meta name="meta:character-count" content="1836"/>
> <meta name="extended-properties:Template" content="Normal.dot"/>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.DefaultParser"/>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.microsoft.OfficeParser"/>
> <meta name="extended-properties:TotalTime" content="3600000000"/>
> <meta name="Content-Length" content="22528"/>
> <meta name="meta:page-count" content="1"/>
> <meta name="Content-Type" content="application/msword"/>
> <title/>
> </head>
> <body><p><b>Conservation Security Program (CSP)
> </b></p>
> <p><b>Irrigation Enhancement Index Tool</b></p>
> <p>This tool is designed to help landowners conduct a self assessment of
> their eligibility for payment for enhanced irrigation systems in the
> Conservation Security Program. It may also serve as a means of
> documenting irrigation system components that can be utilized during
> individual interviews.
> </p>
> <p>This procedure is to be utilized on irrigated lands eligible for CSP
> and will result in assigning an Irrigation Enhancement Index value to
> the irrigation system being evaluated.
> </p>
> <p>This procedure starts with a base value that is assigned to the
> specific type of irrigation system in use. Systems that commonly have
> higher irrigation efficiencies and/or are easier to manage are assigned
> higher values. Modifiers are applied based on the level of management
> and the efficiency of the on-farm water delivery system. A bonus is
> given if runoff from the irrigated field is captured for re-use.</p>
> <p>The final calculation will require a value of the Soil Condition
> Index (SCI) multiplier. The exact value of this multiplier will be
> provided to you when NRCS staff computes your final SCI during your
> interview. The multiplier will be a value from 0.9 to 1.0 depending on
> your SCI.</p>
> <p>This self assessment is simple and should take less than 5 minutes to
> complete. A basic hand calculator is recommended. In addition, basic
> knowledge of the irrigation system and management practices in use is
> necessary. Definitions of the various terms are included in this tool.</p>
> <p>When the self assessment is complete, the landowner will have
> calculated an Irrigation Enhancement Index value for the irrigation
> system. The Irrigation Enhancement Index is not an efficiency number,
> but rather an indicator of how well the system may perform. If the
> Irrigation Enhancement Index value is 50 or more, the landowner may be
> eligible for CSP payments. If the Irrigation Enhancement Index value is
> less than 50, the applicant should consider utilizing other USDA
> programs to improve the irrigation system. If the Irrigation
> Enhancement Index is 60 or greater, the applicant may be eligible for
> increased payments.
> </p>
> </body></html>
>
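
For anyone who wants to sanity-check a TOP_10_MORE_IN_B cell by hand, here is a rough sketch of the kind of comparison involved. It counts tokens in two extracts and lists the ones that occur more often in B. It is not tika-eval's actual code: the regex tokenizer and the names `tokenize`, `top_more_in_b` and `format_column` are made up for illustration, and the real analyzer may split or filter tokens differently. It does show how a token like "sprinkler" ends up in the column when extract B belongs to a different attachment than extract A.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of ASCII letters only.
    # tika-eval's real analyzer is likely different.
    return re.findall(r"[a-z]+", text.lower())


def top_more_in_b(text_a, text_b, n=10):
    """Return up to n (token, extra_count) pairs for tokens more frequent in B."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Counter subtraction drops zero and negative results, so only
    # tokens that are more frequent in B survive.
    diff = counts_b - counts_a
    # Highest surplus first; ties broken alphabetically for stable output.
    ranked = sorted(diff.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


def format_column(pairs):
    # Mimic the "token: count | token: count" layout seen in the report.
    return " | ".join(f"{tok}: {cnt}" for tok, cnt in pairs)


if __name__ == "__main__":
    extract_a = "irrigation system water"
    # Pretend B was paired with the wrong attachment.
    extract_b = "sprinkler sprinkler sprinkler water water"
    print(format_column(top_more_in_b(extract_a, extract_b)))
```

Running the same comparison on the actual 2.8.0 and 2.8.1 extracts of MBD0104A5C8.doc should give an empty list if the two outputs really are identical, which would point at the pairing of attachments rather than at the parser.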
