Currently, only the TEXT bundle is indexed.

See: 
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/search/DSIndexer.java#L1244

Ideally the indexer would also index plain text files directly from the 
ORIGINAL bundle, but it doesn't look like it works that way.

HOWEVER, it is worth noting that plain text files should have their text 
"extracted" by the HTMLFilter.  See the dspace.cfg default settings: 
https://github.com/DSpace/DSpace/blob/master/dspace/config/dspace.cfg#L410

(Notice that the HTMLFilter is configured to run for HTML format and 
Text format)

So, currently, Text files should be indexed, but only after the full 
text of the plain text file has been duplicated into the TEXT bundle. 
(Not ideal, but it should work.)
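
To make that flow concrete, here is a rough sketch of the behavior as I 
understand it. This is NOT the actual DSIndexer or HTMLFilter code: the 
class and method names below are made up for illustration, and an item 
is modeled as a plain map from bundle name to bitstream text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TextBundleIndexingSketch {

    // Hypothetical stand-in for an item: bundle name -> bitstream contents.
    public static Map<String, List<String>> newItem() {
        Map<String, List<String>> bundles = new LinkedHashMap<>();
        bundles.put("ORIGINAL", new ArrayList<>());
        bundles.put("TEXT", new ArrayList<>());
        return bundles;
    }

    // Models the media filter step (assumption: for a plain text file the
    // "extracted" text is just a copy of the original, stored in TEXT).
    public static void runTextFilter(Map<String, List<String>> item) {
        for (String original : item.get("ORIGINAL")) {
            item.get("TEXT").add(original);
        }
    }

    // Models the indexer: it reads the TEXT bundle only, never ORIGINAL.
    public static String fullTextForIndex(Map<String, List<String>> item) {
        return String.join("\n", item.get("TEXT"));
    }

    public static void main(String[] args) {
        Map<String, List<String>> item = newItem();
        item.get("ORIGINAL").add("hello plain text");

        // Before the filter runs, the indexer has nothing to read.
        System.out.println("before filter: [" + fullTextForIndex(item) + "]");

        runTextFilter(item);

        // After the filter runs, the duplicated text is what gets indexed.
        System.out.println("after filter:  [" + fullTextForIndex(item) + "]");
    }
}
```

The point of the sketch is only that a plain text file in ORIGINAL is 
invisible to the indexer until the filter has copied it into TEXT.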

Does that make sense?

- Tim

On 9/18/2012 8:50 AM, helix84 wrote:
> On Tue, Sep 18, 2012 at 3:46 PM, Mark H. Wood <mw...@iupui.edu> wrote:
>> I don't understand:  why would there be any need to extract plain text
>> from a bitstream that's already plain text?  Just index it.  The point
>> of text extraction is to create a plain-text bitstream for the indexer
>> to digest.
>
> Mark, does the indexer index text from plain text files in the ORIGINAL 
> bundle?
>
> Regards,
> ~~helix84
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
