Re: [Dspace-tech] DSpace/Maven help request - update dependency version

2010-10-08 Thread Mark Diggory
If you push the POI version to 3.6 in your Maven configuration, do
you still get the error?

Mark


On Fri, Oct 8, 2010 at 9:47 AM, Keith Gilbertson
keith.gilbert...@library.gatech.edu wrote:
 Mark - Thank you.  It's in our Maven repository.  Graham had mentioned there
 would be some work to get this going, but I didn't know what it involved.
 Everything built and installed with some minor code changes, which was very
 nifty.  I still got an error in the WordFilter, though.

 Hardy Pottinger had sent me a link to this notice:
 http://code.google.com/p/text-mining/issues/detail?id=5

 I didn't know what "rejar" meant, but I found this to work:

 1.  Get the source for this version of the text-mining utils with the command
     'svn checkout http://text-mining.googlecode.com/svn/trunk/ text-mining-read-only'
 2.  From this tree, delete lib/poi-3.0.1-FINAL-20070705.jar and replace it
     with poi-3.6.jar
 3.  Rebuild with the 'ant' command
 4.  Copy build/bin/tm-extractors-1.0.jar to
     lib/dspace-tm-extractors-1.0.0.jar in my DSpace deployment directory

 Then filter-media works fine with the new PowerPoint filter and the
 WordFilter.

 So, could we rebuild dspace-tm-extractors-1.0.0.jar against poi-3.6 and
 put that in our Maven repository? I suppose now would also be a good
 opportunity for me to learn about the unit testing framework and use it to
 make sure filtering still works as well as it did before the change!

 Ryan Ackley, the developer of these tm-extractors, also worked on the POI
 project for a while.  Presumably he's very busy, but I'll contact him and
 ask whether POI now has the full capability of the tm-extractors.  If the
 POI extractors were rewritten by Ryan, maybe we don't even need the
 tm-extractors library.

 It looks like the current WordFilter doesn't handle the new Microsoft Word
 XML formats, so that may be another small project for someone to take on
 soon.

 --keith

-- 
Mark R. Diggory
Head of U.S. Operations - @mire

http://www.atmire.com - Institutional Repository Solutions
http://www.togather.eu - Before getting together, get t...@ther


Re: [Dspace-tech] DSpace/Maven help request - update dependency version

2010-10-08 Thread Keith Gilbertson

On Oct 8, 2010, at 3:18 PM, Mark Diggory wrote:

 If you push the POI version to 3.6 in your Maven configuration, do
 you still get the error?
 
 Mark

Yes.  I'd made that change in the Maven configuration, but I still had to
rebuild the tm-extractors against poi-3.6.  Only then did the error go away.

There's a document here:
http://java.sun.com/docs/books/jls/second_edition/html/binaryComp.doc.html

It has many details about binary compatibility.  Apparently the POI developers
broke a recommendation buried in this document, so the tm-extractors library
actually has to be rebuilt against the newer version of POI in order to work
with it.  It's not possible to build the tm-extractors with poi 3.0.1 and
then use them successfully with poi 3.6.

So, the maven configuration needs to be updated to 3.6 AND the 
dspace-tm-extractors need to be built against poi 3.6. 
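
As a rough sketch of those two changes side by side, the parent pom might end up with entries along these lines.  This assumes the versions are kept in a dependencyManagement section of dspace-src/pom.xml, and the 1.0.1 version number for a rebuilt dspace-tm-extractors is purely hypothetical; no such release is named in this thread.

```xml
<!-- Sketch only: fragment of dspace-src/pom.xml, assuming versions are
     managed in a <dependencyManagement> section. -->
<dependencyManagement>
  <dependencies>
    <!-- Change 1: the POI version used by the build -->
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.6</version>
    </dependency>
    <!-- Change 2: a tm-extractors jar compiled against poi 3.6.
         The version below is a hypothetical placeholder. -->
    <dependency>
      <groupId>org.dspace.dependencies</groupId>
      <artifactId>dspace-tm-extractors</artifactId>
      <version>1.0.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Neither entry is enough on its own: the first only changes which POI jar is on the classpath, while the second supplies extractor classes whose compiled calls match that jar.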




--
Beautiful is writing same markup. Internet Explorer 9 supports
standards for HTML5, CSS3, SVG 1.1,  ECMAScript5, and DOM L2  L3.
Spend less time writing and  rewriting code and more time creating great
experiences on the web. Be a part of the beta today.
http://p.sf.net/sfu/beautyoftheweb
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] DSpace/Maven help request - update dependency version

2010-10-07 Thread Mark Diggory
As it's not in the Maven Central repository, we would need to release
it ourselves under org.dspace.dependencies or see if someone else can
push out a new version of tm-extractors for Maven Central.

To release into our repository, we just need to author a pom.xml file
for the tm-extractors and package the jar... I set this up, but had
some issues with Sonatype failing to let me see the staged release on
their side. I did release to the central repository.  Still waiting to
see it show up here:

http://repo2.maven.org/maven2/org/dspace/dependencies/dspace-tm-extractors

Once it's available, give it a try and see if it fixes your issues.
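
For readers who have not written one, a wrapper pom.xml for a repackaged jar can be quite small.  The following is only a sketch of what such a file might look like, not the pom that was actually released: the groupId and artifactId come from the URL above, while the version number, the name, and the declared POI dependency are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a wrapper pom.xml for a repackaged tm-extractors jar. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.dspace.dependencies</groupId>
  <artifactId>dspace-tm-extractors</artifactId>
  <!-- Hypothetical version number, for illustration only -->
  <version>1.0.1</version>
  <packaging>jar</packaging>
  <name>DSpace repackaging of tm-extractors</name>

  <dependencies>
    <!-- Assumed: declare the POI release the jar was compiled against,
         so that projects using this artifact pull in a matching POI. -->
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.6</version>
    </dependency>
  </dependencies>
</project>
```

Declaring the POI dependency in the wrapper pom is a design choice: it lets Maven resolve a compatible POI transitively instead of relying on every consuming project to pick the right one.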

Mark

-- 
Mark R. Diggory
Head of U.S. Operations - @mire

http://www.atmire.com - Institutional Repository Solutions
http://www.togather.eu - Before getting together, get t...@ther



Re: [Dspace-tech] DSpace/Maven help request - update dependency version

2010-10-06 Thread Keith Gilbertson
Thanks, Tim.  That helped me to understand.  I put the version numbers of the 
dependency in the parent pom.xml ('dspace-src/pom.xml') and left the version 
numbers out of 'dspace-src/dspace-api/pom.xml'.  
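
For illustration, the dspace-api side of that arrangement would look roughly like the fragment below: the module names the artifacts it needs and leaves the version to the parent.  This is a sketch that assumes the parent pom supplies the versions through Maven's dependencyManagement mechanism; the three artifacts are the ones named in the original question.

```xml
<!-- Sketch only: fragment of dspace-src/dspace-api/pom.xml.
     No <version> elements here; they are assumed to be inherited from
     the managed dependencies in the parent pom (dspace-src/pom.xml). -->
<dependencies>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
  </dependency>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
  </dependency>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
  </dependency>
</dependencies>
```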

So then I found another thing I didn't look at closely enough.  The WordFilter
doesn't use POI directly, but the org.textmining project that it uses depends
on that old version of POI.  To confuse things more, the old versions of POI
had groupId 'poi', and the new versions have groupId 'org.apache.poi'.

I can convince Maven to forget about the old version of the POI library by
making this exclusion change in the parent pom:

   <dependency>
      <groupId>org.textmining</groupId>
      <artifactId>tm-extractors</artifactId>
      <version>0.4</version>
      <exclusions>
         <exclusion>
            <groupId>poi</groupId>
            <artifactId>poi</artifactId>
         </exclusion>
      </exclusions>
   </dependency>

Then only the new version, org.apache.poi/poi/3.6, is included in the project.
Unfortunately, the org.textmining extractors really do need the old version of
POI.  The PowerPointFilter works, but I've broken the WordFilter:

Exception: org.apache.poi.poifs.filesystem.POIFSFileSystem.getRoot()Lorg/apache/poi/poifs/filesystem/DirectoryEntry;
java.lang.NoSuchMethodError: org.apache.poi.poifs.filesystem.POIFSFileSystem.getRoot()Lorg/apache/poi/poifs/filesystem/DirectoryEntry;
    at org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java:51)
    at org.dspace.app.mediafilter.WordFilter.getDestinationStream(WordFilter.java:95)

I have two programs that share the same classpath, but need different versions 
of the same library.

I could rewrite the WordFilter so that it no longer uses the org.textmining 
package which needs the old library, but I keep thinking that the more I try to 
fix stuff, the more I'm likely to break:

http://www.nypost.com/p/news/local/brooklyn/rat_bastards_f5onjzgcqxm0fu3RFz3ySL




Re: [Dspace-tech] DSpace/Maven help request - update dependency version

2010-10-06 Thread Tim Donohue
Ugh -- sounds like you've entered dependency hell.

Though, I think the one shred of good news here is that it seems to only 
have a dependency conflict in one place in our codebase.

It looks like (at a glance) if our WordFilter can be re-written to no 
longer need the org.textmining project, you *might* be OK (i.e. 
hopefully it wouldn't snowball on you). But, that would require finding 
a Word document text extractor that is as good as (or better than) that 
'org.textmining' one, and then hoping it doesn't cause another 
dependency conflict.  Not sure of any alternative Word text extractors, 
off the top of my head, but maybe others know of one?

- Tim



Re: [Dspace-tech] DSpace/Maven help request - update dependency version

2010-10-06 Thread Graham Triggs
That version of tm-extractors is quite old.

There is a newer version on the Google site -
http://code.google.com/p/text-mining/ - but it will take a bit of work
wrapping things up for general use.

It depends on newer versions of POI than tm-extractors 0.4 does, and has some
distinct improvements to its robustness.

G


Re: [Dspace-tech] DSpace/Maven help request - update dependency version

2010-10-06 Thread Keith Gilbertson
Thanks Graham and Tim.  I hadn't seen that.




[Dspace-tech] DSpace/Maven help request - update dependency version

2010-10-05 Thread Keith Gilbertson
Hi,

I've been experimenting with a Media Filter for text extraction from PowerPoint 
files.  It's based on the Apache POI libraries, as was suggested by others in a 
previous thread.  

It uses the poi, poi-scratchpad, and poi-ooxml artifacts, in version 3.6, the 
latest release version from Apache.  I haven't done much with Maven, and am not 
sure how to tell it which libraries I need.

This bit was already in the dspace-api/pom.xml file:

-      <dependency>
-         <groupId>poi</groupId>
-         <artifactId>poi</artifactId>
-      </dependency>


I removed it, because I wanted the latest version of the libraries.  Then, I
added these dependencies to the bottom of the file:

+      <dependency>
+         <groupId>org.apache.poi</groupId>
+         <artifactId>poi</artifactId>
+         <version>3.6</version>
+      </dependency>
+      <dependency>
+         <groupId>org.apache.poi</groupId>
+         <artifactId>poi-scratchpad</artifactId>
+         <version>3.6</version>
+      </dependency>
+      <dependency>
+         <groupId>org.apache.poi</groupId>
+         <artifactId>poi-ooxml</artifactId>
+         <version>3.6</version>
+      </dependency>

Somehow Maven magically found the correct versions of the dependencies, and 
everything built fine.  When I deployed DSpace and looked in the lib directory, 
there were two versions of the main poi library there:

poi-2.5.1-final-20040804.jar
poi-3.6.jar
poi-ooxml-3.6.jar
poi-ooxml-schemas-3.6.jar
poi-scratchpad-3.6.jar

I couldn't figure out why the poi-2.5.1 version was still there, or find 
anything that actually used it.  So, in the interest of doing some quick 
testing, I just deleted it.

Can someone give a hand on how to do this properly?  I'm trying to tell the 
build process to find and use only version 3.6 of poi. 
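
One possible way to make the build itself refuse the old library, rather than deleting the jar after deployment, is the Maven Enforcer plugin's bannedDependencies rule.  Nobody in this thread mentions using it, so treat the fragment below as a sketch under that assumption; the execution id is made up, and the plugin version is left out on purpose.

```xml
<!-- Sketch only: fails the build if the old POI artifact (groupId 'poi',
     artifactId 'poi') appears among the project's dependencies. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-enforcer-plugin</artifactId>
      <!-- add a <version> suited to your Maven release -->
      <executions>
        <execution>
          <!-- hypothetical id, any unique name works -->
          <id>ban-old-poi</id>
          <goals>
            <goal>enforce</goal>
          </goals>
          <configuration>
            <rules>
              <bannedDependencies>
                <excludes>
                  <!-- pattern is groupId:artifactId -->
                  <exclude>poi:poi</exclude>
                </excludes>
              </bannedDependencies>
            </rules>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

A rule like this does not remove the old jar by itself; it only turns a silent duplicate into a build failure that points at the dependency bringing it in.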

Thank you!
--keith





Re: [Dspace-tech] DSpace/Maven help request - update dependency version

2010-10-05 Thread Tim Donohue
Hi Keith,

Simply put, it's because you were accidentally looking in the wrong
pom.xml :)  There are many of them sprinkled through the DSpace codebase,
and they all inherit many of their settings from one main pom.xml.

So, you noticed that the 'dspace-api/pom.xml' file included a dependency
for poi.   But, if you look closely, that dependency doesn't list a
<version>.  This is because, for DSpace, we manage all the versions of
dependencies in one parent pom.xml (which is loaded via the <parent> tag
within the dspace-api/pom.xml).

Now, take a look at the [dspace-src]/pom.xml. This is the main Parent
pom.xml for DSpace (with an artifactId of 'dspace-parent'):

http://scm.dspace.org/svn/repo/dspace/trunk/pom.xml

This is the pom.xml which actually lists the versions of every
dependency used by the various APIs of DSpace.  If you search in this
pom.xml, you'll find this entry:

<dependency>
   <groupId>poi</groupId>
   <artifactId>poi</artifactId>
   <version>2.5.1-final-20040804</version>
</dependency>

That's where the 2.5.1 version is sneaking in.  If you make your
necessary changes to this pom.xml, everything should act as you expect
it to. So, just undo your changes in 'dspace-src/dspace-api/pom.xml',
and instead make those changes to 'dspace-src/pom.xml'.
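
As a sketch of what that change to the parent pom might look like: the old poi entry is replaced by the three org.apache.poi artifacts from the original question, each pinned to 3.6.  The enclosing dependencyManagement element is an assumption about where the parent keeps its versions, so adjust it to match the actual file.

```xml
<!-- Sketch only: fragment of dspace-src/pom.xml (the 'dspace-parent' pom),
     assuming versions are managed in <dependencyManagement>. -->
<dependencyManagement>
  <dependencies>
    <!-- These replace the old poi:poi:2.5.1-final-20040804 entry -->
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-scratchpad</artifactId>
      <version>3.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.6</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

With the versions declared once here, each module that needs POI lists only the groupId and artifactId.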

I hope that helps!

- Tim
