I remember the discussion. It seems to be a real improvement, and I will try to 
include it in version 2.4.

Gert


On 12/10/2011, at 16.24, <aj...@virginia.edu> wrote:

> I've offered one straightforward possibility (one that was discussed briefly 
> in Austin) at:
> 
> https://jira.duraspace.org/browse/FCREPO-1010
> 
> Use Apache Tika for extraction:
> Apache Tika is a toolkit that can extract text and metadata from a wide 
> variety of mimetyped formats (including PDF, via PDFBox). Employing Tika as 
> an extraction engine in GSearch would immediately and greatly expand the 
> range of material over which GSearch could operate, and going forward, 
> GSearch would benefit from new and better-performing parsers created as 
> part of that effort.
> 
> 
> 
> ---
> A. Soroka
> Online Library Environment
> the University of Virginia Library
> 
> 
> 
> 
> On Oct 12, 2011, at 10:07 AM, Gert Schmeltz Pedersen wrote:
> 
>> This message is meant to open a discussion of the roadmap for GSearch. 
>> It started in a small group, but we invite participation from the wider 
>> group of fedora-developers. I am copying this message to the fedora-users list 
>> so that GSearch users are informed about the discussion, but to follow it 
>> further and to contribute, they have to subscribe to the fedora-developers 
>> list.
>> 
>> I will initiate the discussion with a status report. GSearch 2.2 has been the 
>> current release since December 2008. At OR2011 in Austin in June 2011 I 
>> presented a plan for development of GSearch, see 
>> https://conferences.tdl.org/or/OR2011/OR2011main/paper/view/416/127 . 
>> Following that, I have provided GSearch 2.3, and the official release is 
>> near. You can get the source at https://github.com/fcrepo/gsearch and 
>> fedoragsearch.war from the DTU prerelease site at 
>> http://www.cvt.dk/fedoragsearch/ and see the documentation page at 
>> http://miranth.cvt.dk/fedoragsearch/ .
>> 
>> The next step in the plan is to provide GSearch 2.4 by the end of the year. I 
>> will use the issue tracker at 
>> https://jira.duraspace.org/secure/IssueNavigator.jspa?mode=hide&requestId=10311
>> to track the work, and I invite your feedback and contributions. Potential 
>> committers may be enrolled; I have already had some responses to my invitation 
>> to potential committers at OR2011. Some of you may have heard at OR2011 that I 
>> will retire by the end of the year. However, I will continue part-time to 
>> support GSearch users on the fedora-users list and to develop for GSearch and 
>> Fedora in partnership with people who have an interest in that.
>> 
>> The post-2.4 roadmap discussion can take place both on this list and as new 
>> or modified issues in the issue tracker. I think that members of the initial 
>> small group will soon bring up issues.
>> 
>> Gert
>> ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure contains a
>> definitive record of customers, application performance, security
>> threats, fraudulent activity and more. Splunk takes this data and makes
>> sense of it. Business sense. IT sense. Common sense.
>> http://p.sf.net/sfu/splunk-d2d-oct
>> _______________________________________________
>> Fedora-commons-developers mailing list
>> Fedora-commons-developers@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
> 
> 

