Mark - Thank you.  It's in our maven repository.  Graham had mentioned there 
would be some work to get this going, but I didn't know what it involved.

Everything built and installed with some minor code changes, which was very 
nifty.  I still got an error in the word filter.

Hardy Pottinger had sent me a link to this notice:
http://code.google.com/p/text-mining/issues/detail?id=5

I didn't know what "rejar" meant, but I found this to work:
1.  Get the source for this version of the text-mining utils with the command
'svn checkout http://text-mining.googlecode.com/svn/trunk/ text-mining-read-only'
2.  In this tree, delete lib/poi-3.0.1-FINAL-20070705.jar and replace it with 
poi-3.6.jar
3.  Rebuild with the 'ant' command
4.  Copy build/bin/tm-extractors-1.0.jar to lib/dspace-tm-extractors-1.0.0.jar 
in my dspace deployment directory
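
For anyone who wants to repeat this, the four steps above can be sketched as a 
small shell function.  This is only a rough sketch: the function names and the 
two arguments (where your poi-3.6.jar lives and where dspace is deployed) are 
my own placeholders, and it assumes svn and ant are on the PATH.

```shell
#!/bin/sh
# Sketch of the "rejar" steps; names and arguments are placeholders.

SVN_URL="http://text-mining.googlecode.com/svn/trunk/"
SRC_DIR="text-mining-read-only"

# Print where the rebuilt jar goes inside a dspace deployment directory.
# $1 = dspace deployment directory (assumption: it has a lib/ subdirectory)
target_jar_path() {
    printf '%s/lib/dspace-tm-extractors-1.0.0.jar\n' "$1"
}

# $1 = path to a poi-3.6.jar you already have (placeholder)
# $2 = dspace deployment directory (placeholder)
# The body runs in a subshell so 'set -eu' does not leak into the caller.
rebuild_tm_extractors() (
    set -eu
    poi_jar=$1
    dspace_dir=$2

    # Step 1: get the source
    svn checkout "$SVN_URL" "$SRC_DIR"

    # Step 2: swap the bundled POI jar for poi-3.6.jar
    rm "$SRC_DIR/lib/poi-3.0.1-FINAL-20070705.jar"
    cp "$poi_jar" "$SRC_DIR/lib/poi-3.6.jar"

    # Step 3: rebuild
    (cd "$SRC_DIR" && ant)

    # Step 4: copy the rebuilt jar over the one dspace uses
    cp "$SRC_DIR/build/bin/tm-extractors-1.0.jar" "$(target_jar_path "$dspace_dir")"
)
```

It would be called as 'rebuild_tm_extractors /path/to/poi-3.6.jar /path/to/dspace', 
with both paths adjusted for your own machine.  You may want to back up the 
existing jar first, since step 4 overwrites it.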

Then filter-media works fine with the new PowerPoint filter and the WordFilter.

So, could we rebuild the dspace-tm-extractors-1.0.0.jar against poi-3.6 and put 
that in our maven repository? I suppose now would also be a good opportunity 
for me to learn about the unit testing framework and use it to make sure 
filtering still works as well as it did before the change!

Ryan Ackley, the developer of these tm-extractors, also worked on the POI 
project for a while.  Presumably he's very busy, but I'll contact him and ask 
whether POI now has the full capability of the tm-extractors.  If the POI 
extractors were rewritten by Ryan, maybe we don't even need the tm-extractors 
library.

It looks like the current WordFilter doesn't handle the new Microsoft Word XML 
formats - so that may be another small project for someone to take on soon.  


--keith


On Oct 7, 2010, at 3:35 AM, Mark Diggory wrote:

> As it's not in the maven central repository, we would need to release
> it ourselves under org.dspace.dependencies or see if someone else can
> push out a new version of tm-extractors for maven central.
> 
> To release into our repository, we just need to author a pom.xml file
> for the tm-extractors and package the jar... I set this up, but had
> some issues with sonatype failing to let me see the staged release on
> their side. I did release to the central repository.  Still waiting to
> see it show up here:
> 
> http://repo2.maven.org/maven2/org/dspace/dependencies/dspace-tm-extractors
> 
> Once available, give it a try and see if it fixes your issues.
> 
> Mark
> 
> On Wed, Oct 6, 2010 at 11:11 AM, Keith Gilbertson
> <keith.gilbert...@library.gatech.edu> wrote:
>> Thanks Graham and Tim.  I hadn't seen that.
>> 
>> On Oct 6, 2010, at 11:52 AM, Graham Triggs wrote:
>> 
>> That version of tm-extractors is quite old.
>> There is a newer version on the Google site
>> - http://code.google.com/p/text-mining/ - but it will take a bit of work
>> wrapping things up for general use.
>> It has dependencies on newer versions of POI than 0.4, and some distinct
>> improvements to its robustness.
>> G
>> 
>> On 6 October 2010 16:39, Tim Donohue <tdono...@duraspace.org> wrote:
>>> 
>>> Ugh -- sounds like you've entered dependency hell.
>>> 
>>> Though, I think the one shred of good news here is that it seems to only
>>> have a dependency conflict in one place in our codebase.
>>> 
>>> It looks like (at a glance) if our WordFilter can be re-written to no
>>> longer need the org.textmining project, you *might* be OK (i.e.
>>> hopefully it wouldn't snowball on you). But, that would require finding
>>> a Word document text extractor that is as good as (or better than) that
>>> 'org.textmining' one, and then hoping it doesn't cause another
>>> dependency conflict.  Not sure of any alternative Word text extractors,
>>> off the top of my head, but maybe others know of one?
>>> 
>>> - Tim
>> 
>> 
>> ------------------------------------------------------------------------------
>> Beautiful is writing same markup. Internet Explorer 9 supports
>> standards for HTML5, CSS3, SVG 1.1,  ECMAScript5, and DOM L2 & L3.
>> Spend less time writing and  rewriting code and more time creating great
>> experiences on the web. Be a part of the beta today.
>> http://p.sf.net/sfu/beautyoftheweb
>> _______________________________________________
>> DSpace-tech mailing list
>> DSpace-tech@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>> 
>> 
> 
> 
> 
> -- 
> Mark R. Diggory
> Head of U.S. Operations - @mire
> 
> http://www.atmire.com - Institutional Repository Solutions
> http://www.togather.eu - Before getting together, get t...@ther
