If you have a budget for this, Stellent provides tools for parsing 
multiple document types, along with a viewer that can display documents in 
their original formatting with your highlights applied. See 
http://www.stellent.com/en/products/outside_in/viewer_tech/index.htm

I don't work for Stellent and haven't used their tools, but I do know this is 
hard to do. They are the only vendor I'm aware of that tries to cover all 
document types, which is why I mention them. If anyone has similar 
recommendations, I would be interested to hear them.


----- Original Message ----
From: Mark Miller <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, 8 September, 2006 2:02:47 AM
Subject: Re: Highligher Example

Highlighting a PDF document, last time I looked (quite a while ago), 
involves supplying an xml file that describes offsets for highlighting. 
You can specify the file in the URL. You can also do simple highlighting 
by passing in a list of words to be highlighted, but this does not even 
catch minor differences, like singular to plural.
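
To make the word-list limitation concrete, here is a minimal sketch in plain 
Java. It is not the Lucene Highlighter API and not the PDF viewer mechanism; 
the class name WordListHighlighter and its highlight method are hypothetical, 
made up for illustration. It wraps exact (case-insensitive) matches in a 
prefix and suffix, so a plural form of a listed word is left untouched.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene.
public class WordListHighlighter {
    // A "word" here is simply a run of letters, digits, or underscores.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Wraps every word that exactly matches an entry in `terms`
    // (compared in lower case) with the given prefix and suffix.
    public static String highlight(String text, Set<String> terms,
                                   String prefix, String suffix) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous word and this one unchanged.
            out.append(text, last, m.start());
            String word = m.group();
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append(prefix).append(word).append(suffix);
            } else {
                out.append(word);
            }
            last = m.end();
        }
        // Copy whatever follows the final word.
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("acrobat"));
        // Exact match: "Acrobat" is wrapped.
        System.out.println(highlight("this is hello.pdf. It uses Acrobat",
                terms, "<highlight>", "</highlight>"));
        // Plural form: "acrobats" is not in the list, so nothing is wrapped.
        System.out.println(highlight("Two acrobats", terms,
                "<highlight>", "</highlight>"));
    }
}
```

Catching the plural would need the words to be normalized (for example, 
stemmed) the same way on both sides before comparing, which is the kind of 
work an analyzer does at index and query time.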

If someone knows more about using the Lucene highlighter to highlight 
PDFs, please speak up. I think I will have to get into this soon.

- Mark

Mag Gam wrote:
> Thanks for the quick response, Erik. I will be getting my LIA book back
> very soon; I left it behind at a destination :-(
>
> Let's assume there is a document called "hello.pdf" with the content
> "this is hello.pdf. It uses Acrobat"
>
> When I perform a search for "Acrobat", I want hello.pdf to show up,
> along with the snippet 'It uses <highlight>Acrobat</highlight>'
>
> something like that.
>
> tia
>
>
>
> On 9/7/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>>
>> There are test cases in the Highlighter codebase that exercise it and
>> show its use, as well as a few examples of it in the "Lucene in
>> Action" codebase.
>>
>> These examples output plain text with some prefix and suffix
>> surrounding the highlighted terms.  Highlighting text in a PDF is
>> possible, I'm pretty sure, but I don't think the same would be easily
>> possible with Microsoft document formats.  I'm not sure if you are
>> asking for these document types to be highlighted or just a plain
>> text representation of them, though.
>>
>>         Erik
>>
>> On Sep 7, 2006, at 6:37 PM, Mag Gam wrote:
>>
>> > Hey
>> >
>> > Anyone have a search result highlighter example?
>> >
>> > I have various documents (PDF, DOC, TXT, PPT), and I would like to show
>> > highlights, similar to how Google does it...
>> >
>> > tia
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>
