I wanted to take a closer look at one of the files that is reported to
have less output:
govdocs1/477/477727.ppt MBD0104A5C8.doc
so I ran tika-app 2.8.0 and the 2.8.1 snapshot on it and got the same
output from both. But according to the Excel table, B should contain
more occurrences of these tokens (TOP_10_MORE_IN_B column):
and: 3 | land: 3 | micro: 3 | reuse: 3 | sprinkler: 3 | water: 3 |
capture: 2 | collect: 2 | considered: 2 | leveled: 2
I can't find "sprinkler" in the text below or in the Word file.
<?xml version="1.0" encoding="UTF-8"?><html
xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="cp:revision" content="3"/>
<meta name="meta:word-count" content="322"/>
<meta name="extended-properties:Application" content="Microsoft Word 10.0"/>
<meta name="meta:last-author" content="Carolyn.Jones"/>
<meta name="dc:creator" content="NWCC"/>
<meta name="extended-properties:Company" content="USDA"/>
<meta name="xmpTPg:NPages" content="1"/>
<meta name="resourceName" content="MBD0104A5C8.doc"/>
<meta name="dcterms:created" content="2005-01-06T20:39:00Z"/>
<meta name="dcterms:modified" content="2005-03-29T22:02:00Z"/>
<meta name="meta:character-count" content="1836"/>
<meta name="extended-properties:Template" content="Normal.dot"/>
<meta name="X-TIKA:Parsed-By"
content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-TIKA:Parsed-By"
content="org.apache.tika.parser.microsoft.OfficeParser"/>
<meta name="extended-properties:TotalTime" content="3600000000"/>
<meta name="Content-Length" content="22528"/>
<meta name="meta:page-count" content="1"/>
<meta name="Content-Type" content="application/msword"/>
<title/>
</head>
<body><p><b>Conservation Security Program (CSP)
</b></p>
<p><b>Irrigation Enhancement Index Tool</b></p>
<p>This tool is designed to help landowners conduct a self assessment of
their eligibility for payment for enhanced irrigation systems in the
Conservation Security Program. It may also serve as a means of
documenting irrigation system components that can be utilized during
individual interviews.
</p>
<p>This procedure is to be utilized on irrigated lands eligible for CSP
and will result in assigning an Irrigation Enhancement Index value to
the irrigation system being evaluated.
</p>
<p>This procedure starts with a base value that is assigned to the
specific type of irrigation system in use. Systems that commonly have
higher irrigation efficiencies and/or are easier to manage are assigned
higher values. Modifiers are applied based on the level of management
and the efficiency of the on-farm water delivery system. A bonus is
given if runoff from the irrigated field is captured for re-use.</p>
<p>The final calculation will require a value of the Soil Condition
Index (SCI) multiplier. The exact value of this multiplier will be
provided to you when NRCS staff computes your final SCI during your
interview. The multiplier will be a value from 0.9 to 1.0 depending on
your SCI.</p>
<p>This self assessment is simple and should take less than 5 minutes to
complete. A basic hand calculator is recommended. In addition, basic
knowledge of the irrigation system and management practices in use is
necessary. Definitions of the various terms are included in this tool.</p>
<p>When the self assessment is complete, the landowner will have
calculated an Irrigation Enhancement Index value for the irrigation
system. The Irrigation Enhancement Index is not an efficiency number,
but rather an indicator of how well the system may perform. If the
Irrigation Enhancement Index value is 50 or more, the landowner may be
eligible for CSP payments. If the Irrigation Enhancement Index value is
less than 50, the applicant should consider utilizing other USDA
programs to improve the irrigation system. If the Irrigation
Enhancement Index is 60 or greater, the applicant may be eligible for
increased payments.
</p>
</body></html>
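
For anyone who wants to repeat the check, here is a minimal sketch of how
the reported tokens can be looked up in the XHTML output above. It is
only an approximation: the lowercase, letters-only tokenizer is my own
stand-in and is not the tokenizer that produced the Excel table, and the
names BodyText, token_counts and check_reported are made up for this
example.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Collect the character data that appears inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.chunks.append(data)


def token_counts(xhtml):
    """Count lowercase alphabetic tokens in the body text.

    Assumption: a simple regex split, not the real tokenizer used for
    the comparison table.
    """
    parser = BodyText()
    parser.feed(xhtml)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter(re.findall(r"[a-z]+", text))


def check_reported(xhtml, reported):
    """Return how often each reported token occurs in the body text."""
    counts = token_counts(xhtml)
    return {word: counts.get(word, 0) for word in reported}
```

Feeding it the saved XHTML output and the words from the
TOP_10_MORE_IN_B column shows at a glance which of them are missing
(count 0) from the extracted text. Because the tokenizer differs, the
nonzero counts need not match the table exactly.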