[
https://issues.apache.org/jira/browse/LUCENE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856520#action_12856520
]
Luke Forehand commented on LUCENE-2208:
---------------------------------------
I just opened a bug for what appears to be the same issue in the Solr project:
https://issues.apache.org/jira/browse/SOLR-1883
Attached to that issue is the document I am trying to query; the query fails
when highlighting is enabled. I also pasted my schema.xml and the exception
stack trace there.
> Token div exceeds length of provided text sized 4114
> ----------------------------------------------------
>
> Key: LUCENE-2208
> URL: https://issues.apache.org/jira/browse/LUCENE-2208
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/highlighter
> Affects Versions: 3.0
> Environment: diagnostics = {os.version=5.1, os=Windows XP,
> lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=flush, os.arch=x86,
> java.version=1.6.0_12, java.vendor=Sun Microsystems Inc.}
>
> Reporter: Ramazan VARLIKLI
>
> I have a document that contains HTML markup. I want to strip the HTML tags to
> get clean text and then apply the highlighter to that clean text. However, the
> highlighter throws an exception if I strip out the HTML tags; if I don't
> strip them out, it works fine. This confuses me at the moment.
> I have copied and pasted three things here from the console, as the text may
> contain special characters that might cause the problem.
> 1-) Here is the HTML text
> <h2>Starter</h2>
> <div id="tab1-content" class="tabContent selected">
> <div class="head"></div>
> <div class="body">
> <div class="subject-header">Learning path: History</div>
> <h3>Key question</h3>
> <p>Did transport fuel the industrial revolution?</p>
> <h3>Learning Objective</h3>
> <ul>
> <li>To categorise points as for or against an argument</li>
> </ul>
> <p>
> <h3>What to do?</h3>
> <ul>
> <li>Watch the clip: <em>Transport fuelled the industrial
> revolution.</em></li>
> </ul>
> <p>The clips claims that transport fuelled the industrial
> revolution. Some historians argue that the industrial revolution only
> happened because of developments in transport.</p>
> <ul>
> <li>Read the statements below and decide which
> points are <em>for</em> and which points are <em>against</em> the argument
> that industry expanded in the 18th and 19th centuries because of developments
> in transport.</li>
> </ul>
>
> <ol type="a">
> <li>Industry expanded because of inventions and
> the discovery of steam power.</li>
> <li>Improvements in transport allowed goods to
> be sold all over the country and all over the world so there were more
> customers to develop industry for.</li>
> <li>Developments in transport allowed
> resources, such as coal from mines and cotton from America to come together
> to manufacture products.</li>
> <li>Transport only developed because industry
> needed it. It was slow to develop as money was spent on improving roads, then
> building canals and the replacing them with railways in order to keep up with
> industry.</li>
> </ol>
>
> <p>Now try to think of 2 more statements of your
> own.</p>
>
> </div>
> <div class="foot"></div>
> </div>
> <h2>Main activity</h2>
> <div id="tab2-content" class="tabContent">
> <div class="head"></div>
> <div class="body"><div class="subject-header">Learning path:
> History</div>
> <h3>Learning Objective</h3>
> <ul>
> <li>To select evidence to support points</li>
> </ul>
> <h3>What to do?</h3>
> <!--<ul>
> <li>Watch the clip: <em>Windmill and water mill</em></li>
> </ul>-->
> <ul><li>Choose the 4 points that you think are most important -
> try to be balanced by having two <strong>for</strong> and two
> <strong>against</strong>.</li>
> <li>Write one in each of the point boxes of the
> paragraphs on the sheet <a href="lp_history_industry_transport_ws1.html"
> class="link-internal">Constructing a balanced argument</a>.</li></ul> <p>You
> might like to re write the points in your own words and use connectives to
> link the paragraphs.</p>
>
> <p>In history and in any argument, you need evidence
> to support your points.</p>
> <ul><li>Find evidence from these sources and from
> your own knowledge to support each of your points:</li></ul>
> <ol>
> <li><a
> href="../servlet/link?template=vid&macro=setResource&resourceID=2044"
> class="link-internal">At a toll gate</a></li>
> <li><a
> href="../servlet/link?macro=setResource&template=vid&resourceID=2046"
> class="link-internal">Canals</a></li>
> <li><a
> href="../servlet/link?macro=setResource&template=vid&resourceID=2043"
> class="link-internal">Growing cities: traffic</a></li>
> <li><a
> href="../servlet/link?macro=setResource&template=vid&resourceID=2047"
> class="link-internal">Impact of the railway</a> </li>
> <li><a
> href="../servlet/link?macro=setResource&template=vid&resourceID=2048"
> class="link-internal">Sailing ships</a> </li>
> <li><a
> href="../servlet/link?macro=setResource&template=vid&resourceID=2050"
> class="link-internal">Liverpool: Capital of Culture</a> </li>
> </ol>
> <p>Try to be specific in your evidence - use named
> examples of places or people. Use dates if you can.</p>
> </div>
> <div class="foot"></div>
> </div>
> <h2>Plenary</h2>
> <div id="tab3-content" class="tabContent">
> <div class="head"></div>
> <div class="body"><div class="subject-header">Learning path:
> History</div>
> <h3>Learning Objective</h3>
> <ul>
> <li>To judge which of the arguments is most valid</li>
> </ul>
> <h3>What to do?</h3>
> <!-- <ul>
> <li>Watch the clip: <em>Food of the rich</em></li>
> </ul>-->
> <p>In order to be a good historian, and get good marks in
> exams, you need to show your evaluation skills and make a judgement. Having
> been through the evidence which point do you think is most important? Why? Is
> there more evidence? Is the evidence more convincing?</p>
> <ul><li>In the final box on your worksheet write a
> conclusion explaining whether on balance the evidence is enough to convince
> you that transport fuelled the industrial revolution.</li></ul>
> </div>
> <div class="foot"></div>
> </div>
> <h2>Extension</h2>
> <div id="tab4-content" class="tabContent">
> <div class="head"></div>
> <div class="body"><div class="subject-header">Learning path:
> History</div>
> <h3>What to do?</h3>
> <p>Watch the clip <em>Stress in a ski resort</em></p>
> <p>New industries, such as tourism, can now be said
> to be fuelled by transport improvements.</p>
> <ul><li>Search Clipbank, using the Related clip lists as well
> as the search function, to find examples from around the world of how
> transport has helped industry.</li></ul>
> </div>
> <div class="foot"></div>
> </div>
>
>
> 2-) Here is the text after stripping out the HTML tags
> Starter
>
>
>
> Learning path: History
> Key question
> Did transport fuel the industrial revolution?
> Learning Objective
>
> To categorise points as for or against an argument
>
>
> What to do?
>
> Watch the clip: Transport fuelled the industrial
> revolution.
>
> The clips claims that transport fuelled the industrial
> revolution. Some historians argue that the industrial revolution only
> happened because of developments in transport.
>
> Read the statements below and decide which
> points are for and which points are against the argument that industry
> expanded in the 18th and 19th centuries because of developments in transport.
>
>
>
> Industry expanded because of inventions and
> the discovery of steam power.
> Improvements in transport allowed goods to be
> sold all over the country and all over the world so there were more customers
> to develop industry for.
> Developments in transport allowed resources,
> such as coal from mines and cotton from America to come together to
> manufacture products.
> Transport only developed because industry
> needed it. It was slow to develop as money was spent on improving roads, then
> building canals and the replacing them with railways in order to keep up with
> industry.
>
>
> Now try to think of 2 more statements of your own.
>
>
>
>
> Main activity
>
>
> Learning path: History
> Learning Objective
>
> To select evidence to support points
>
> What to do?
>
> Choose the 4 points that you think are most important - try
> to be balanced by having two for and two against .
> Write one in each of the point boxes of the
> paragraphs on the sheet Constructing a balanced argument . You might like
> to re write the points in your own words and use connectives to link the
> paragraphs.
>
> In history and in any argument, you need evidence to
> support your points.
> Find evidence from these sources and from your own
> knowledge to support each of your points:
>
> At a toll gate
> Canals
> Growing cities: traffic
> Impact of the railway
> Sailing ships
> Liverpool: Capital of Culture
>
> Try to be specific in your evidence - use named
> examples of places or people. Use dates if you can.
>
>
>
> Plenary
>
>
> Learning path: History
> Learning Objective
>
> To judge which of the arguments is most valid
>
> What to do?
>
> In order to be a good historian, and get good marks in exams,
> you need to show your evaluation skills and make a judgement. Having been
> through the evidence which point do you think is most important? Why? Is
> there more evidence? Is the evidence more convincing?
> In the final box on your worksheet write a
> conclusion explaining whether on balance the evidence is enough to convince
> you that transport fuelled the industrial revolution.
>
>
>
> Extension
>
>
> Learning path: History
> What to do?
> Watch the clip Stress in a ski resort
> New industries, such as tourism, can now be said to
> be fuelled by transport improvements.
> Search Clipbank, using the Related clip lists as well as the
> search function, to find examples from around the world of how transport has
> helped industry.
>
>
>
>
> 3-) Here is the exception I get
> org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token div exceeds length of provided text sized 4114
>     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:228)
>     at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:158)
>     at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:462)
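
One possible reading of the exception above: the offending token is "div", an HTML tag name that does not occur in the stripped text. That suggests the token stream handed to the highlighter may have been produced from the original HTML while the text argument was the shorter, stripped version, so some token offsets point past the end of the text. The sketch below illustrates that kind of mismatch without using Lucene at all. The names Token, tokenize, stripTags and firstOutOfRange are hypothetical helpers invented for this example, and the bounds check is only assumed to resemble the one the highlighter performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetMismatchSketch {

    /** A term plus its character offsets into the text it was tokenized from. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    /** Hypothetical stand-in for an analyzer: one token per run of letters or digits. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(new Token(m.group(), m.start(), m.end()));
        }
        return tokens;
    }

    /** Naive tag stripper for illustration only: each tag becomes a single space. */
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    /** Returns the first token whose offsets fall outside the text, or null if none do. */
    static Token firstOutOfRange(List<Token> tokens, String text) {
        for (Token t : tokens) {
            if (t.end > text.length() || t.start > text.length()) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<h2>Starter</h2><div class=\"body\">"
                + "<p>Did transport fuel the industrial revolution?</p></div>";
        String stripped = stripTags(html);

        // Mismatch: offsets computed on the HTML, checked against the stripped text.
        Token bad = firstOutOfRange(tokenize(html), stripped);
        if (bad != null) {
            System.out.println("Token " + bad.term
                    + " exceeds length of provided text sized " + stripped.length());
        }

        // Consistent: offsets computed on the same text that is being checked.
        Token ok = firstOutOfRange(tokenize(stripped), stripped);
        System.out.println("Consistent input out of range: " + (ok != null));
    }
}
```

If this reading is right, the token stream and the text passed to the highlighter need to come from the same string, either both from the raw HTML or both from the stripped text.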