Well, unfortunately, this is a trap that users do hit.

By requiring the user to specify the limit when creating a
PostingsHighlighter, we would force them to think about it and realize
they are in fact setting a limit.
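To make the proposal concrete, here is a minimal sketch in plain Java with no Lucene dependency. `RequiredLimitHighlighter` is a hypothetical class, not Lucene's API. It has no no-arg constructor, so the limit has to appear at every call site. (The real PostingsHighlighter, as discussed in this thread, has a PostingsHighlighter(int) constructor but otherwise falls back to DEFAULT_MAX_LENGTH.)

```java
/**
 * Hypothetical sketch (not Lucene code) of the proposed API shape:
 * maxLength is a required constructor argument, so callers cannot
 * end up with a truncation limit they never chose.
 */
final class RequiredLimitHighlighter {
  private final int maxLength;

  // Deliberately no no-arg constructor: the limit must be stated explicitly.
  RequiredLimitHighlighter(int maxLength) {
    if (maxLength <= 0) {
      throw new IllegalArgumentException("maxLength must be positive: " + maxLength);
    }
    this.maxLength = maxLength;
  }

  /** The truncation limit (in characters) this hypothetical highlighter would apply. */
  int getMaxLength() {
    return maxLength;
  }

  public static void main(String[] args) {
    // 10,000 is the DEFAULT_MAX_LENGTH value reported in this thread;
    // here the caller has to write it down, so the limit is visible.
    RequiredLimitHighlighter h = new RequiredLimitHighlighter(10_000);
    System.out.println("highlighting the first " + h.getMaxLength() + " chars");
  }
}
```

The cost is the one Robert raises below: every caller, including the common case that is happy with the default, has to pass a number.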

Silent limits are dangerous because you don't know offhand what's
wrong, or why nothing is getting highlighted.
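A simplified model of the failure mode, again in plain Java rather than Lucene's implementation: matches that start at or beyond `maxLength` are ignored, and when none remain the caller gets a fallback summary (here just the first few characters, standing in for getEmptyHighlight()) with no signal that a limit was hit. `TruncationDemo` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (not Lucene code) of a length-limited highlighter.
 * Only matches starting before maxLength are considered; if none are found,
 * a plain prefix of the text is returned, which looks like a normal summary.
 */
final class TruncationDemo {

  /** Start offsets of term occurrences that begin before maxLength. */
  static List<Integer> matchesWithinLimit(String text, String term, int maxLength) {
    List<Integer> offsets = new ArrayList<>();
    int limit = Math.min(maxLength, text.length());
    int idx = text.indexOf(term);
    while (idx >= 0 && idx < limit) {
      offsets.add(idx);
      idx = text.indexOf(term, idx + 1);
    }
    return offsets;
  }

  /** Highlights the first in-range match, or silently falls back to a text prefix. */
  static String highlight(String text, String term, int maxLength, int fallbackChars) {
    List<Integer> hits = matchesWithinLimit(text, term, maxLength);
    if (hits.isEmpty()) {
      // Stand-in for getEmptyHighlight(): nothing tells the caller that
      // the term may exist past the limit.
      return text.substring(0, Math.min(fallbackChars, text.length()));
    }
    int start = hits.get(0);
    return "<b>" + text.substring(start, start + term.length()) + "</b>";
  }

  /** True when the term occurs in the text, but only past maxLength. */
  static boolean hitLimit(String text, String term, int maxLength) {
    return matchesWithinLimit(text, term, maxLength).isEmpty() && text.contains(term);
  }

  public static void main(String[] args) {
    // Hypothetical document whose only match sits past a 10,000-char limit.
    String doc = "x".repeat(20_000) + " needle";
    System.out.println(highlight(doc, "needle", 10_000, 20));           // fallback prefix, no highlight
    System.out.println(hitLimit(doc, "needle", 10_000));                // the limit was hit
    System.out.println(highlight(doc, "needle", 16 * 1024 * 1024, 20)); // highlighted match
  }
}
```

In this model, raising the limit (as Jon does below via the PostingsHighlighter(int) constructor) is what turns the silent fallback into a real highlight.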



Mike McCandless

http://blog.mikemccandless.com


On Tue, Oct 15, 2013 at 9:42 AM, Robert Muir <rcm...@gmail.com> wrote:
> I strongly disagree: there is no trap. It's a reasonable default for
> good summarization, and the behavior is no different from the other
> highlighters here.
>
> Typically people *do* care about performance, and it's important to have
> a clean, simple API too.
>
> In my opinion, increasing this limit is very esoteric: sentences that
> deep usually do not summarize the document well.
>
>
>
> On Tue, Oct 15, 2013 at 9:38 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>> Maybe we should make the max length a required argument to
>> PostingsHighlighter ctor?
>>
>> Because it's trappy now: you don't realize offhand that it's
>> silently enforcing a limit...
>>
>>
>> On Tue, Oct 15, 2013 at 9:31 AM, Robert Muir <rcm...@gmail.com> wrote:
>>> Thanks, Jon. I'll add some stuff to the javadocs here to try to make it
>>> more obvious.
>>>
>>> On Tue, Oct 15, 2013 at 5:54 AM, Jon Stewart
>>> <j...@lightboxtechnologies.com> wrote:
>>>> Awesome, that did it! I didn't realize that DEFAULT_MAX_LENGTH was
>>>> only 10,000. I've now upped it to 16MB (I'm not doing the usual thing
>>>> and performance is not a particular concern).
>>>>
>>>> Thanks,
>>>>
>>>> Jon
>>>>
>>>>
>>>> On Mon, Oct 14, 2013 at 9:58 PM, Robert Muir <rcm...@gmail.com> wrote:
>>>>> are your documents large?
>>>>>
>>>>> try the PostingsHighlighter(int) ctor with a value larger than
>>>>> DEFAULT_MAX_LENGTH.
>>>>>
>>>>> sounds like the matches are very deep into the document, so it's
>>>>> just hitting the default limit and returning the default
>>>>> summarization (getEmptyHighlight()).
>>>>>
>>>>> otherwise, please open a JIRA issue :)
>>>>>
>>>>> On Mon, Oct 14, 2013 at 9:32 PM, Jon Stewart
>>>>> <j...@lightboxtechnologies.com> wrote:
>>>>>> I upgraded to 4.5. Same results, unfortunately. Most docs in the
>>>>>> result set will have a Passage where numMatches() > 0, but some do
>>>>>> not. In these cases, the Passage array's length is greater than zero.
>>>>>>
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 14, 2013 at 5:24 PM, Robert Muir <rcm...@gmail.com> wrote:
>>>>>>> did you try the latest release? Some bugs have been fixed...
>>>>>>>
>>>>>>> On Mon, Oct 14, 2013 at 2:11 PM, Jon Stewart
>>>>>>> <j...@lightboxtechnologies.com> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I've observed that, when using PostingsHighlighter in Lucene 4.4,
>>>>>>>> some of the responsive documents in TopDocs have zero matches in
>>>>>>>> the associated array of Passage objects. I.e., in some calls to
>>>>>>>> PassageFormatter.format(), none of the Passage objects in the array
>>>>>>>> have matches. I've seen this with a simple one-word query, where the
>>>>>>>> word clearly exists in the Document's text for that field (and the
>>>>>>>> Document is included in the TopDocs result set).
>>>>>>>>
>>>>>>>> Any ideas?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Jon
>>>>>>>> --
>>>>>>>> Jon Stewart, Principal
>>>>>>>> (646) 719-0317 | j...@lightboxtechnologies.com | Arlington, VA
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>
