Yes, sounds great, Harry.

The function getAttachmentContent(Attachment) is called whenever setupTask is executed.

Feeding Lucene right after an attachment is ready would be additional functionality, and a good idea.

What I meant is to make the text conversion depend on the MIME type of the attachment instead of the filename extension. However, this is not really important for now.
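
To illustrate the MIME idea, here is a minimal sketch that guesses a type from the first bytes of an attachment rather than from its name. The class MimeSniffer and its method sniff are hypothetical names, not part of JSPWiki, and the list of signatures is deliberately tiny; treat it as an assumption-laden example, not a proposal for the final code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of JSPWiki: guesses a MIME type from
// the leading bytes of an attachment instead of its filename.
public class MimeSniffer {

    /** Returns a MIME type guessed from the leading bytes, or a generic fallback. */
    public static String sniff(byte[] head) {
        if (startsWith(head, "%PDF-")) {
            return "application/pdf";
        }
        if (startsWith(head, "<?xml")) {
            return "application/xml";
        }
        // ZIP local file header; .odt and .odp files are ZIP containers.
        if (startsWith(head, "PK\u0003\u0004")) {
            return "application/zip";
        }
        return "application/octet-stream";
    }

    // True if data begins with the ASCII bytes of prefix.
    private static boolean startsWith(byte[] data, String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.US_ASCII);
        if (data.length < p.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, p.length), p);
    }
}
```

The guessed type could then be stored with the attachment at upload time and used to pick a text converter, so a wrongly named file would still be handled according to its content.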

I would like to start on this immediately; however, because of my workload in other areas, it will take a while. I will come back to it as soon as possible, because accumulated knowledge lives not only in wiki pages but in attachments as well.

Rolf

On 14.01.2011 20:30, Harry Metske wrote:
Making a filter that processes "non plain text" files like the ones you
mentioned sounds good.
If I understand it correctly, it should be called when adding an attachment,
it should process the file, creating searchable text, and hand that text off to
Lucene for indexing, right?
Please also consider a unit test for it.

Adding a few more file types for pure text files is a good quick win,
starting with .mm .htm .xhtml .java .c .cpp .php .asm .sh .properties .kml
.gpx .loc

Does anyone else have opinions or suggestions?

regards,
Harry

2011/1/13 Rolf Schumacher<[email protected]>

ok, Harry, thank you for the link.

My suggestions, please correct:

- hard-coding the file types does not seem to be a problem to me: everything
should be searched
- the list is too short; important types such as .doc, .odt, .pdf, .ppt,
.odp are missing
- am I right here?: if I can provide a filter that makes text out of these
files, it should not be that hard to add them
- we may be better off if each attachment has an attribute telling
its MIME type, as far as detectable at attachment time; that way we are not
as dependent on correct file extensions

- a quick suggestion: please add .mm as another XML type. The FreeMind
plugin is of great value.

kind regards


Rolf



On 11.01.2011 18:42, Harry Metske wrote:

Rolf,

see the source

https://github.com/apache/jspwiki/blob/jspwiki_2_8_5/src/com/ecyrd/jspwiki/search/LuceneSearchProvider.java#L328


as you can see, currently the filetypes are hardcoded to just 4 types.
We could make this a configurable option, patches are welcome.
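
A minimal sketch of what such a configurable option might look like, assuming the extensions arrive as one comma-separated string (for example from a wiki property). The class SearchableExtensions is a hypothetical name, not existing JSPWiki code, and the sketch uses Java 8 streams, which the 2.8 code base itself may not be able to rely on.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of JSPWiki: holds the file suffixes
// whose attachments should be indexed as plain text.
public class SearchableExtensions {

    private final Set<String> suffixes;

    /** Parses a comma-separated list such as ".txt, .xml, .mm". */
    public SearchableExtensions(String configured) {
        suffixes = Arrays.stream(configured.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    /** True if the file name ends with one of the configured suffixes (case-insensitive). */
    public boolean matches(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return suffixes.stream().anyMatch(lower::endsWith);
    }
}
```

With something like this, the hardcoded check could be replaced by a call such as new SearchableExtensions(".txt, .xml, .mm").matches(name), and adding a file type would only need a configuration change.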

You say "comments given to an Attachment"; I assume you mean the Change Notes
entered while uploading an attachment (or saving a normal Wiki Page).
That is a bit more work, I think.
I am a complete Lucene novice, but looking at the code it looks like we
could add another field (we already index the page author and page name) for the
Change Note.

regards,
Harry


2011/1/10 Rolf Schumacher<[email protected]>



I am using JSPWiki 2.8.4

Is it possible to extend the search to attachments of certain MIME types, e.g.
PDF?

Is it possible to extend a search to the comments given to an attachment?

kind regards

Rolf



